This article is based on the thread: microsoft.public.sqlserver.programming Monday, October 22, 2007 "select rows that contain both arg1 = 'x' and arg1 = 'y'" http://tinyurl.com/386lbx >I need to select only rows where the name contains x and y. I do not want >to get rows that only contain x or contain x, z, or contain y, z. I only >want rows where the name has an x an a y (they can have a z but they must >have an x and a y). Forget QA or SSMS. Just draw a picture. You use the term row. So draw a row. row{aName as sName, acode as code} Now you want people who have code x and y. These are two rows. So draw them. Use 'burns' for the person. row{'burns' as sName, 'x' as code} row{'burns' as sName, 'y' as code} Now what do you call two rows together? How about calling it a table ☺ . Draw it. table { row{'burns' as sName, 'x' as code} row{'burns' as sName, 'y' as code} } Now if this table, made just for 'burns', is in your #tmp1 table then 'burns' is a guy you want. In other words, if both rows (one for x, one for y) are in #tmp1 then you have a hit. Super simple ☺ . Now in QA or SSMS do this. declare @x int, @y int set @x=2 set @y=3 if @x<=@y print 'Yes' else print 'No' No explanation necessary. Super simple. Now take the same idea of comparing two integers and extend it to comparing two tables. if table { row{'burns' as sName, 'x' as code} row{'burns' as sName, 'y' as code} } <= #tmp1 print 'Yes' else 'No' In other words, does each row for 'burns' occur in #tmp1? 'Burns' can have codes in #tmp1 in addition to x and y (ie. 2<=3) So 'burns' has to have at least a row for x and a row for y in #tmp1. If in #tmp1 'burns' has only an 'x' or only a 'y' no matter what other codes he has that's no good (2<=1). In the case above you will see 'Yes' printed since the comparison is true. This whole scenario is referred to as relational division in database terminology. But these simple ideas are obscured by sql because you can't draw a picture of a row, nor a table nor does sql understand comparing tables like integers. So instead you're left with grouping and counting, joins, intersects, existential queries and whatnot all trying to express a simple idea yet at the same time obscuring it. Now in a query you want to substitute all the unique names from #tmp1 into our little table so for each person we can test the comparison with #tmp1. What would such a query look like? select select distinct sName as aPerson from #tmp1 where -- draw a table with two rows for each aPerson -- the 1st column has a value aPerson and -- the column is named 'sName'. The 2nd column is -- called 'code'. The column names and datatypes -- are the same as in #tmp1. table { row{aPerson as sName, 'x' as code}, row{aPerson as sName, 'y' as code} } -- Compare the tables. <= -- Form a table from #tmp1 of rows belonging to the -- aPerson above. (Tmp1 where sName=aPerson); Now this won't quite work in sql no matter where you execute it ☺ . But what would a query really look like that will work with #tmp1. Here it is. And it really is almost self-explanatory. And it works in the D4 language! :) (Tmp1 is a table same as #tmp1 stored in an Sql Server 2005 database). select Tmp1 {sName aPerson} where table { row{aPerson sName, 'x' code}, row{aPerson sName, 'y' code} } <= (Tmp1 where sName=aPerson); aPerson ------- burns jones smith Now this is what MS should be doing. Sql is a language of choice for some things. But it certainly is not the choice language for others.
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Monday, October 22, 2007
Sql - Really simple division
Sunday, October 21, 2007
Sql Server - The Undefined Trigger
Ask the question: does Sql Server have a row level trigger. Answer: it depends. From the point of view of application development how in the world can the answer possibly be ambiguous? Well lets back up and look at the Update statement. From Sql Server 2005 bol: Using UPDATE with the FROM Clause 'The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic'. Ok what is really the rational for even keeping this 'proprietary' syntax? Since it is inherently non-deterministic why even offer it to developers most of whom do not understand but the simplest of sql. After all Sql Server is explicitly relinquishing responsibility for the integrity of the Update and making the user responsible. Shouldn't the idea of a database be the other way around? Ok now lets move to the trigger. From bol under the topic: Multirow Considerations for DML Triggers: 'When you write the code for a DML trigger, consider that the statement that causes the trigger to fire can be a single statement that affects multiple rows of data, instead of a single row.' Does this sound familiar? Of course it does. The same non-determinism of the FROM clause in Update now reappears in the insert table of the trigger. But of course it is not presented as a 'trigger is inherently undefined'. No, now the undefined nature of the trigger is called 'a Multirow consideration'. And again the server reliquishes responsibility of integrity and puts it in the hands of the user. Well lets get real. There is no such thing as a multirow trigger. From an applications standpoint the largest scope of a trigger is a row. The so called multirow trigger is an example of what happens when users don't scrutinize what nonsense is thrown their way. Instead of getting rid of a stupid idea in Update using a FROM it was instead extended to a trigger. It was simply an easy and expedient thing to do. But it was shameful and lazy too. Lack of integrity is based on lack of character. But it is equally disappointing that more users do not complain and demand the integrity that a database should offer. Sql Server can do much, much better. But, like E.T., users must phone home.
Wednesday, October 17, 2007
Sql - Constantly struggling
The subject of this article is called (by others) database constants. So lets be sure we're on the same page with what a constant is. It is just a value. X = Substring(Y,1,5); X and Y are variables, 1 and 5 are values. X = Y * 5; X and Y are variables, 5 is a constant. It is a number whose value is 5. A table in D4 is just as much a variable as X and Y. A table in sql is like the value 5, a constant. The difference between a constant and a variable is also the difference between D4, a relational database, and sql! The following is part of an exchange I had with someone on the issue of database constants. The idea of database constants was raised in the article: A New (and Hopefully Better) Approach to Constants http://www.sqlservercentral.com/articles/T-SQL/61244/ Here I go ☺ : Sql Server lets users define a 'table variable'. Now you really don't believe it's a table 'variable' do you?:) Of course not. They're just playing with words. There is no such thing in sql server as a table variable. But because they call it that most users really believe it. It's like believing in the tooth fairy:) Now we're going to play the same game with database 'constants':) There is no such thing as a database constant. There are only values (which can't be updated or assigned other values) and variables (which can be updated and assigned other values). But in sql there is no real talk of these basic things. And that's why I talk about Dataphor ☺. There is no difference between what this guy is calling a 'constant' and a table or 'resultset' in sql. They are all (constant) values and not variables. I'm trying to show what sql would be like if there really were table 'variables'. One way was with the CTE article, 'Do you know what the Sql CTE really is?' But people seem to be having a hard time wrapping their head around it:) This sql stuff just gets crazier. Now people are going to believe that you can pass a table as a parameter to a stored procedure. That is nonsense. One reason people believe it is they have nothing to compare/constrast it with. So I show what it really means to pass a table to a sp: http://beyondsql.blogspot.com/2007/10/sql-whats-really-with-with.html http://beyondsql.blogspot.com/2007/06/dataphor-13-passing-table-as-parameter.html http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html etc.etc. But to do these kind of things requires a foundation different than sql. Hence the name of my site ☺ . The so called 'constants' are basically a table without a key, ie a row. So I do: create table Y { A:Integer, B:String, key{} }; insert row{1 A,'Micheal' B} into Y; select Y[].B 'Michael' select Y[].A 1 where the '[]' is called a row extracter. Or I can define a 'list': var LList:=list(String){'Michael','Steve','Jimmy'}; select LList[0]; Michael Or I can define a table (ie a row) with lists. create table Z { Strings:list(String), //The column holds a list of strings. Integers:list(Integer),//The column holds a list of integers. key{} }; insert row{{'Micheal','Steve','Jimmy'} Strings,{2,10,22,40} Integers} into Z; select Z[].Strings[2]; Jimmy select Z[].Integers[1]; 10 where the [2] and [1] refer to the ordinal position of the item (value) in the list. But sql doesn't have a table 'type' or a list type so it couldn't understand these things. The only thing sql can do is try to simulate these types. And it winds up in a mess of gooblygook that few understand and robs the user of the integrity that the database should provide by supporting these things directly. All this stuff is based on values, variables and types. The only types that exist is sql are numbers and strings. Again there is no type for tables, row or list. And therefore no variables for these things because without a type you can't define a variable. Make sense ☺ . var X:= Orders //X is a variable of type table, specifically the type //of the Orders table (the column names and there data types). This is the huge step forward from sql. In sql the result of a query is no different than the 'Constant' that guy was trying to talk about ☺ . It is also the difference between a CTE and a variable that holds the result of the 'CTE'. You can't reuse a CTE in sql because it's a 'constant', a value. You can only reuse a variable.
Dataphor - Merging strings together
This problem was raised in the thread: microsoft.public.sqlserver.programming Monday, October 15, 2007 'Parse and Merge two fields' http://tinyurl.com/2eosa3 The idea is to take two pipe ('|') delimited strings and created one string where each item in the first string is matched with the item in the second string having the same ordinal position. The two columns are People and Position. So given: People:=' Sam | Jane | Gene' Position:= 'Accounting | Finance | Marketing' we want: People_Position:= 'Sam|Accounting|Jane|Finance|Gene|Accounting' Of course in sql this is usually a mess since sql does not support a list type and any of the operations that can transform a delimited string to a list and visa versa. In D4 we simply split the delimited string to a list of items and then create a table from the list. The table will contain the ordinal position as an integer value of each item in the list. So a row has the item (string) value and sequence value of that item in the string as columns. So we just concatenate the columns of each string together (separated by a pipe) and then concatenate these strings over the rows with the Concat operator ordering the concatenated string by the sequence value (over rows). Note that there is no concept of just unique item in a list. A list can have any number of duplicate items. When the list is transformed to a table the sequence number guarantees that same value items will be in the table. In other words the sequence is a key of the table. select ToTable('A|A|B|B|C'.Split({'|'})); value sequence ----- -------- A 0 A 1 B 2 B 3 C 4 And with column names of your choice: select ToTable('A|A|B|B|C'.Split({'|'}),'Str','Seq'); Str Seq --- --- A 0 A 1 B 2 B 3 C 4 (Browse this site for other articles on string operations for more info). Here is some sample data. The data is stored in a Sql Server 2005 database. All queries are written with the D4 relational language of Dataphor. create table JR { id:Integer, People:String tags{Storage.Length='100'}, Position:String tags{Storage.Length='100'}, key{id} }; insert table { row{1 id,'joe|sam|pete|mike' People,'Manager|Team Lead|Worker Ant|Worker Ant' Position}, row{2,'A| C| T ',' JobA|JobC |JobT '}, row{3,'A| C ',' JobA | JobB |JobC '}, //Ummatched strings (lists) row{4,'Bush | Cheney | Rice ' ,' Worser|Worst|Worse '}, row{5,'Z |Y |X ' ,' Backwards Z|Backwards Y |Backwards X '} } into JR; select JR; id People Position -- --------------------- --------------------------------------- 1 joe|sam|pete|mike Manager|Team Lead|Worker Ant|Worker Ant 2 A| C| T JobA|JobC |JobT 3 A| C JobA | JobB |JobC 4 Bush | Cheney | Rice Worser|Worst|Worse 5 Z |Y |X Backwards Z|Backwards Y |Backwards X Also note that we can access the table with a pass-thru query using t-sql: select SQLQuery("SELECT id,People,Position FROM JR"); id People Position -- --------------------- --------------------------------------- 1 joe|sam|pete|mike Manager|Team Lead|Worker Ant|Worker Ant 2 A| C| T JobA|JobC |JobCT 3 A| C JobA | JobB |JobC 4 Bush | Cheney | Rice Worser|Worst|Worse 5 Z |Y |X Backwards Z|Backwards Y |Backwards X Note that the only restriction used here is that both strings have the same number of items. select JR //The table JR in Sql Server 2005. //SQLQuery("SELECT id,People,Position FROM JR") Or we could use this t-sql query . where //The same 'where' statement as in sql. //Check that each string has the same number of items. People.Split({'|'}).Count()=Position.Split({'|'}).Count() with {IgnoreUnsupported = 'true'} //For each row in the table we 'add' a column (People_Position). //The People_Position column is the '|' delimited string of //People and Position item by item. add { //From a table whose rows represents each item in People and Position, //for each row form a string of the person and position separated by //a '|' and concatenate that string over the rows into a single string. Concat({People_Position,Del} from ( //Use Split to create a 'list' of items from the delimited string. //Then create a table from list with ToTable. Column 'Str1' //is the string value of the item from the list. Column 'Seq' //is the item number from the list. So now a table represents //the original input column People. In other words, the rows of //the table are the delimited items in the string from left to right. (ToTable(People.Split({'|'}),'Str1','Seq'){Str1.Trim() Str1,Seq} ) join //A natural join on Seq which is an integer from 0-># of items //in the string. Seq preserves the sequence from left to //to right of the items in the string. (ToTable(Position.Split({'|'}),'Str2','Seq'){Str2.Trim() Str2,Seq} ) {Seq,Str1,Str2,'|' Del,(Str1+'|'+Str2) People_Position} ) order by {Seq}) //The concatenation over rows is ordered by the //Seq column. This is the same order as the items //in the original input columns, People/Position. //Therefore the original order is preserved in //the result (People_Position). People_Position //This is name of the new column added to each row of JR. } {id,People_Position} //These are the only two columns we've chosen //to display from the query. order by {id}; //Order by the values if the id column. id People_Position -- --------------------------------------------------------- 1 joe|Manager|sam|Team Lead|pete|Worker Ant|mike|Worker Ant 2 A|JobA|C|JobC|T|JobT 4 Bush|Worser|Cheney|Worst|Rice|Worse 5 Z|Backwards Z|Y|Backwards Y|X|Backwards X Instead of eliminating rows where the item counts don't agree we can insert the string 'Unmatched' for those id values. select //Instead of table JR we access the table in sql server with a t-sql query. //In D4 the query does not return an sql 'resultset' but instead is treated //as a table 'variable' which is of a nature fundamentally different than sql. SQLQuery("SELECT id,People,Position FROM JR") //Store the restriction on count as a boolean value (T/F) in column TestCnt. add{People.Split({'|'}).Count()=Position.Split({'|'}).Count() TestCnt} with {IgnoreUnsupported = 'true'} add { //If the TestCnt is true concatenate the rows, else return a nil (null) value. //A case statement just like the sql case statement could also be used. if TestCnt then Concat({People_Position,Del} from ( (ToTable(People.Split({'|'}),'Str1','Seq'){Str1.Trim() Str1,Seq} ) join (ToTable(Position.Split({'|'}),'Str2','Seq'){Str2.Trim() Str2,Seq} ) {Seq,Str1,Str2,'|' Del,(Str1+'|'+Str2) People_Position} ) order by {Seq}) else nil as String People_Position } //Insert the unmatched string for a nil value of the People_Position column. {id,IfNil(People_Position,'*** Unmatched Strings ***') Field} order by {id}; id Field -- --------------------------------------------------------- 1 joe|Manager|sam|Team Lead|pete|Worker Ant|mike|Worker Ant 2 A|JobA|C|JobC|T|JobT 3 *** Unmatched Strings *** 4 Bush|Worser|Cheney|Worst|Rice|Worse 5 Z|Backwards Z|Y|Backwards Y|X|Backwards X D4 allows all the expressive power you need to easily test for any criteria. Here we add some additional rows to table JR that should be eliminated. insert table { //Digits in Postion. row{6 id,'joe|sam|pete|mike' People,'Manager|Team Lead|Worker Ant1|Worker Ant2' Position}, //Digits in People and Position row{7 id,'joe|sam|pete1|mike' People,'Manager|Team Lead|Worker Ant1|Worker Ant' Position} } into JR; select JR; id People Position -- --------------------- ----------------------------------------- 1 joe|sam|pete|mike Manager|Team Lead|Worker Ant|Worker Ant 2 A| C| T JobA|JobC |JobT 3 A| C JobA | JobB |JobC 4 Bush | Cheney | Rice Worser|Worst|Worse 5 Z |Y |X Backwards Z|Backwards Y |Backwards X 6 joe|sam|pete|mike Manager|Team Lead|Worker Ant1|Worker Ant2 7 joe|sam|pete1|mike Manager|Team Lead|Worker Ant1|Worker Ant We add some additional restrictions to the where statement to eliminate rows with strings that contain any non letters. select JR where ( ( People.Split({'|'}).Count()=Position.Split({'|'}).Count() ) //We also test that each item is only letters (no digits or other junk). //This can be done any numbers of ways. Here we just compare the count //of all items to the count of items eliminated by characters other than //letters. and ( People.Split({'|'}).Count() = Count(ToTable(People.Split({'|'}),'F1Str','F1Seq') //Test that items are made up of letters only. To eliminate //blanks contaiminating the test we remove blank character before //testing. where IsLetter(Replace(Trim(F1Str),' ',''))) ) and ( Position.Split({'|'}).Count() = Count(ToTable(Position.Split({'|'}),'F2Str','F2Seq') where IsLetter(Replace(Trim(F2Str),' ',''))) ) ) with {IgnoreUnsupported = 'true'} add { Concat({People_Position,Del} from ( (ToTable(People.Split({'|'}),'Str1','Seq'){Str1.Trim() Str1,Seq} ) join (ToTable(Position.Split({'|'}),'Str2','Seq'){Str2.Trim() Str2,Seq} ) {Seq,Str1,Str2,'|' Del,(Str1+'|'+Str2) People_Position} ) order by {Seq}) People_Position } {id,People_Position} order by {id}; id People_Position -- --------------------------------------------------------- 1 joe|Manager|sam|Team Lead|pete|Worker Ant|mike|Worker Ant 2 A|JobA|C|JobC|T|JobT 4 Bush|Worser|Cheney|Worst|Rice|Worse 5 Z|Backwards Z|Y|Backwards Y|X|Backwards X Thanks for stopping by ☺ .
Tuesday, October 09, 2007
Dataphor - Exploding hierarchical data
This article show examples of using the D4 explode operator. This operator is used for expressing hierarchical data. It is not a 'recursive' operator like the recursive sql CTE but more like the Oracle Connect By construct. The examples follow the ones used by Itzik Ben-Gan to illustrate the recursive CTE query in Sql Server 2005, specifically the 'Single-Parent Environment: Employees Organizational Chart' which can be found at: http://msdn2.microsoft.com/en-us/library/ms345144.aspx#docum_topic4 The explode examples are based on the Employees table used in the sql example. The table is stored in an Sql Server 2005 database. Several of the examples use the concept of a dense rank. As with most of my articles, the code is not necessarily 'clever' but straightforward, not necessarily the best 'performant' but intended to express the many different concepts and constructs in D4. select Employees empid mgrid empname salary ----- ----- -------- ---------- 1 0 Nancy $10,000.00 2 1 Andrew $5,000.00 3 1 Janet $5,000.00 4 1 Margaret $5,000.00 5 2 Steven $2,500.00 6 2 Michael $2,500.00 7 3 Robert $2,500.00 8 3 Laura $2,500.00 9 3 Ann $2,500.00 10 4 Ina $2,500.00 11 7 David $2,000.00 12 7 Ron $2,000.00 13 7 Dan $2,000.00 14 11 James $1,500.00 Get a tree for a specific manager. select ( Employees explode by mgrid = parent empid where mgrid=0 order by {empid} include level with {IgnoreUnsupported = 'true'} ) // Do a little string insert to format the tree. add {'' Temp, level-1 Totalspace} add { ( empname.Insert(0,Temp.PadLeft(Totalspace,'|')) ).Replace('|',' | ') Tree } {Tree,mgrid,empid,empname,sequence,level}; Tree mgrid empid empname sequence level ----------------- ----- ----- -------- -------- ----- Nancy 0 1 Nancy 1 1 | Andrew 1 2 Andrew 2 2 | | Steven 2 5 Steven 3 3 | | Michael 2 6 Michael 4 3 | Janet 1 3 Janet 5 2 | | Robert 3 7 Robert 6 3 | | | David 7 11 David 7 4 | | | | James 11 14 James 8 5 | | | Ron 7 12 Ron 9 4 | | | Dan 7 13 Dan 10 4 | | Laura 3 8 Laura 11 3 | | Ann 3 9 Ann 12 3 | Margaret 1 4 Margaret 13 2 | | Ina 4 10 Ina 14 3 Treat each employee as if they are a manager. This will return the subordinates of each employee regardless of whether or not they are a manager. select ( Employees explode by mgrid = parent empid where mgrid>=0 //Changing where to include all employees. order by {empid} include level with {IgnoreUnsupported = 'true'} ) add {'' Temp, level-1 Totalspace} add { ( empname.Insert(0,Temp.PadLeft(Totalspace,'|')) ).Replace('|',' | ') Tree } {Tree,mgrid,empid,empname,sequence,level} ; Tree mgrid empid empname sequence level ----------------- ----- ----- -------- -------- ----- Nancy 0 1 Nancy 1 1 | Andrew 1 2 Andrew 2 2 | | Steven 2 5 Steven 3 3 | | Michael 2 6 Michael 4 3 | Janet 1 3 Janet 5 2 | | Robert 3 7 Robert 6 3 | | | David 7 11 David 7 4 | | | | James 11 14 James 8 5 | | | Ron 7 12 Ron 9 4 | | | Dan 7 13 Dan 10 4 | | Laura 3 8 Laura 11 3 | | Ann 3 9 Ann 12 3 | Margaret 1 4 Margaret 13 2 | | Ina 4 10 Ina 14 3 Andrew 1 2 Andrew 15 1 | Steven 2 5 Steven 16 2 | Michael 2 6 Michael 17 2 Janet 1 3 Janet 18 1 | Robert 3 7 Robert 19 2 | | David 7 11 David 20 3 | | | James 11 14 James 21 4 | | Ron 7 12 Ron 22 3 | | Dan 7 13 Dan 23 3 | Laura 3 8 Laura 24 2 | Ann 3 9 Ann 25 2 Margaret 1 4 Margaret 26 1 | Ina 4 10 Ina 27 2 Steven 2 5 Steven 28 1 Michael 2 6 Michael 29 1 Robert 3 7 Robert 30 1 | David 7 11 David 31 2 | | James 11 14 James 32 3 | Ron 7 12 Ron 33 2 | Dan 7 13 Dan 34 2 Laura 3 8 Laura 35 1 Ann 3 9 Ann 36 1 Ina 4 10 Ina 37 1 David 7 11 David 38 1 | James 11 14 James 39 2 Ron 7 12 Ron 40 1 Dan 7 13 Dan 41 1 James 11 14 James 42 1 If we reverse 'by mgrid = parent empid' to 'by empid = parent mgrid' we get the tree of a particular employee to their top level manager. select ( Employees explode by empid = parent mgrid where empid=13 order by {empid} include level with {IgnoreUnsupported = 'true'} ) add {'' Temp, level-1 Totalspace} add { ( empname.Insert(0,Temp.PadLeft(Totalspace,'|')) ).Replace('|',' | ') Tree } {Tree,mgrid,empid,empname,sequence,level}; Tree mgrid empid empname sequence level -------------- ----- ----- ------- -------- ----- Dan 7 13 Dan 1 1 | Robert 3 7 Robert 2 2 | | Janet 1 3 Janet 3 3 | | | Nancy 0 1 Nancy 4 4 By changing the where predicate to >=1 we can get a report on all employees (note where empid>=3 eliminates the graph of the 1st 2 employees (Nancy and Andrew) but doesn't eliminate them from graphs of other employees who report to them). Here we create an operator that will give a graph in either direction of a specific employee and their top level manager. In other words, we can either start with the employee and go down to their top level manager or start with the employees top level manager and go down to the employee. The concept of the dense rank is used for binding all rows of each employee together. By getting the empname and empid for each dense rank we can target any employee by name or number. (Note we could, of course, use a view or any number of other constructs. I just felt in the mood to use an operator ☺ ). The operator takes two arguments. The first, aTreeTable, is a table of type Employees. The second, Start_At, is a string and indicates the direction of the graph. Using 'M' for manager starts with the manager. Using 'E' starts with the employee. create operator EmpTree(aTreeTable:typeof(Employees),Start_At:String): //The operator returns a virtual table with columns and their data types //defined by the typeof expression. typeof( Employees add{1 level,'S' empnamerank,1 empidrank, 'T' Tree_Graph, 1 Rank, 1 TreeOrder} ) begin result:=table of typeof(result){}; //Starting at top level mgr for each employee is a desc sort (default). //Starting at each employee to the top level mgr. is an asc sort. //Start_At=M(gr) is desc sort (default) //Start_At='E(mp) is asc sort. var LSort:='D'; //Default. if ( ((Start_At.Trim())[0]).Upper() ) = 'E' then LSort:='A' ; var T:= aTreeTable explode by empid = parent mgrid where empid>=1 order by {empid} include level with {IgnoreUnsupported = 'true'}; //Get a dense rank. This rank binds all rows for each employee //together. The idea is to increment a count for every level 1 since //a level 1 indicates the start of a new employee. var TR:= T add{case when level=1 then empname else nil end NameEmp} add { Count( T rename {sequence sequenceX} where (sequenceX<=sequence) and (level=1) ) Rank }; var SR:= TR join ( TR group by {Rank} //We want the emp name, number (empid) and reverse level for each //dense rank (empname). We want the reverse level so we can get //the tree representative from the employee to top level manager //AND the top level manager to the employee. add{Max(empid) NumEmpid,Max(NameEmp) Emp_Name,Max(level) Maxlevel} ) add { (empname.Insert(0,''.PadLeft( (if LSort='A' then (level-1) else (Maxlevel-level)), '|'))).Replace('|',' | ') Tree_Graph } rename {sequence seq}; result:= ToTable( ToList( cursor(SR order by { Rank, seq sort ((1 - (2*ToInteger((LSort = 'D'))))*(.left.value ?= .right.value)) asc } )//cursor )//ToList ) //ToTable {empid,mgrid,empname,salary,level,Emp_Name empnamerank, NumEmpid empidrank,Tree_Graph,Rank,sequence+1 TreeOrder}; end; Here we show the first five employees by using Rank in a where statement. The value of the Rank corresponds to the ascending order of empid. Because empid starts at 1 Rank happens to be equal to the empid. select EmpTree(Employees,'Mgr') //We use the operator as if it were a table. where Rank<=5 order by {TreeOrder}; empid mgrid empname salary level empnamerank empidrank Tree_Graph Rank TreeOrder ----- ----- -------- ---------- ----- ----------- --------- ------------ ---- --------- 1 0 Nancy $10,000.00 1 Nancy 1 Nancy 1 1 1 0 Nancy $10,000.00 2 Andrew 2 Nancy 2 2 2 1 Andrew $5,000.00 1 Andrew 2 | Andrew 2 3 1 0 Nancy $10,000.00 2 Janet 3 Nancy 3 4 3 1 Janet $5,000.00 1 Janet 3 | Janet 3 5 1 0 Nancy $10,000.00 2 Margaret 4 Nancy 4 6 4 1 Margaret $5,000.00 1 Margaret 4 | Margaret 4 7 1 0 Nancy $10,000.00 3 Steven 5 Nancy 5 8 2 1 Andrew $5,000.00 2 Steven 5 | Andrew 5 9 5 2 Steven $2,500.00 1 Steven 5 | | Steven 5 10 We can overload the EmpTree operator so as to provide a default value for the Start_At parameter, ie. the direction of the graph. We make the default 'M' so the graph starts with the top level manager of the employee. We simply supply the literal 'Mgr' for the sort direction for the same operator whose signiture includes the Start_At parameter. create operator EmpTree(aTreeTable:typeof(Employees)): typeof( Employees add{1 level,'S' empnamerank,1 empidrank, 'T' Tree_Graph, 1 Rank, 1 TreeOrder} ) begin result:= EmpTree(aTreeTable,'Mgr'); end; Here get the tree of employee 'James' starting at the highest level manager by using the overload signature of the EmpTree operator. select EmpTree(Employees) where empnamerank='James' {Tree_Graph,TreeOrder} order by {TreeOrder}; Tree_Graph TreeOrder ----------------- --------- Nancy 38 | Janet 39 | | Robert 40 | | | David 41 | | | | James 42 Here we start with employee 'James' up to his highest level manager. select EmpTree(Employees,' Emp ') where empnamerank='James' {Tree_Graph,TreeOrder} order by {TreeOrder}; Tree_Graph TreeOrder ----------------- --------- James 38 | David 39 | | Robert 40 | | | Janet 41 | | | | Nancy 42 Here we use the EmpTree operator to get the enumerated paths in both directions using the Concat (concatenation) operator. (More info here). select (EmpTree(Employees) add{'.' Del} adorn{key{empidrank,TreeOrder}}) group by {empidrank} add { Max(empnamerank) empname, Concat(empname,Del order by {empidrank,TreeOrder}) PathMgrtoEmp, Concat(empname,Del order by {empidrank,TreeOrder desc}) PathEmptoMgr } rename {empidrank empid} order by {empid}; empid empname PathMgrtoEmp PathEmptoMgr ----- -------- ------------------------------ ------------------------------ 1 Nancy Nancy Nancy 2 Andrew Nancy.Andrew Andrew.Nancy 3 Janet Nancy.Janet Janet.Nancy 4 Margaret Nancy.Margaret Margaret.Nancy 5 Steven Nancy.Andrew.Steven Steven.Andrew.Nancy 6 Michael Nancy.Andrew.Michael Michael.Andrew.Nancy 7 Robert Nancy.Janet.Robert Robert.Janet.Nancy 8 Laura Nancy.Janet.Laura Laura.Janet.Nancy 9 Ann Nancy.Janet.Ann Ann.Janet.Nancy 10 Ina Nancy.Margaret.Ina Ina.Margaret.Nancy 11 David Nancy.Janet.Robert.David David.Robert.Janet.Nancy 12 Ron Nancy.Janet.Robert.Ron Ron.Robert.Janet.Nancy 13 Dan Nancy.Janet.Robert.Dan Dan.Robert.Janet.Nancy 14 James Nancy.Janet.Robert.David.James James.David.Robert.Janet.Nancy Here we get the counts of employees directly or indirectly reporting to managers. select Employees explode by empid = parent mgrid where empid>=1 order by {empid} include level with {IgnoreUnsupported = 'true'} group by {mgrid} add{Count() MgrCnt} where mgrid>0 mgrid MgrCnt ----- ------ 1 13 2 2 3 7 4 1 7 4 11 1 Here are the employees who are the managers. For example there are 13 people who report to Nancy. This is represented indirectly by people who report to the three managers directly below her (Andrew, Janet, Margaret) and directly by the same managers reporting to her. Dan, David and Ron report to Robert. But there is an additional employee reporting to David (James) so Robert (mgrid 7) has 4 employees reporting to him. select ( Employees explode by empid = parent mgrid where empid>=1 order by {empid} include level with {IgnoreUnsupported = 'true'} group by {mgrid} add{Count() MgrCnt} ) join Employees where mgrid>0 join ( Employees {empid MgrEmpid,empname MgrName} ) by mgrid=MgrEmpid {mgrid,MgrName,empname,MgrCnt}; mgrid MgrName empname MgrCnt ----- -------- -------- ------ 1 Nancy Andrew 13 1 Nancy Janet 13 1 Nancy Margaret 13 2 Andrew Michael 2 2 Andrew Steven 2 3 Janet Ann 7 3 Janet Laura 7 3 Janet Robert 7 4 Margaret Ina 1 7 Robert Dan 4 7 Robert David 4 7 Robert Ron 4 11 David James 1 Here we get salaries of subordinates under managers. Those employees who are not managers are omitted. var T:= Employees explode by mgrid = parent empid where mgrid>=0 order by {empid} include level with {IgnoreUnsupported = 'true'}; //Get a dense rank. This rank binds all rows for each employee //together. The idea is to increment a count for every level 1 since //a level 1 indicates the start of a new employee. var TR:= T add{case when level=1 then empname else nil end NameEmp} add { Count( T rename {sequence sequenceX} where (sequenceX<=sequence) and (level=1) ) Rank }; var SR:= ( TR join ( TR group by {Rank} //We want the empid and name for each dense rank. add{Min(empid) NumEmpid,Max(NameEmp) Emp_Name} ) ) //We only want managers, those that have subordinates. where NumEmpid in ( Employees {mgrid} ) with {IgnoreUnsupported = 'true'} add { ( (empname+ case when level>1 then ' ('+ToString(salary)+')' else '' end).Insert(0,''.PadLeft((level-1),'|'))).Replace('|',' | ') Mgr_Sal_Tree }; select SR {Mgr_Sal_Tree,sequence} order by {sequence}; Mgr_Sal_Tree sequence ----------------------------- -------- Nancy 1 | Andrew ($5,000.00) 2 | | Steven ($2,500.00) 3 | | Michael ($2,500.00) 4 | Janet ($5,000.00) 5 | | Robert ($2,500.00) 6 | | | David ($2,000.00) 7 | | | | James ($1,500.00) 8 | | | Ron ($2,000.00) 9 | | | Dan ($2,000.00) 10 | | Laura ($2,500.00) 11 | | Ann ($2,500.00) 12 | Margaret ($5,000.00) 13 | | Ina ($2,500.00) 14 Andrew 15 | Steven ($2,500.00) 16 | Michael ($2,500.00) 17 Janet 18 | Robert ($2,500.00) 19 | | David ($2,000.00) 20 | | | James ($1,500.00) 21 | | Ron ($2,000.00) 22 | | Dan ($2,000.00) 23 | Laura ($2,500.00) 24 | Ann ($2,500.00) 25 Margaret 26 | Ina ($2,500.00) 27 Robert 30 | David ($2,000.00) 31 | | James ($1,500.00) 32 | Ron ($2,000.00) 33 | Dan ($2,000.00) 34 David 38 | James ($1,500.00) 39 Here is the same tree as above (of salaries for subordinates) using a table and a view created with a pass-thru query. We can create a table based on the result of explode. create table Emp_T from ( Employees explode by mgrid = parent empid where mgrid>=0 order by {empid} include level with {IgnoreUnsupported = 'true'} add{case when level=1 then empname else nil end NameEmp} adorn //We can include various meta-data about the columns of the table. { NameEmp nil static tags {Storage.Length = "10"}, empname static tags {Storage.Length = "10"} }//end adorn. ); Now we get the dense rank using a pass-thru query to Sql Server. The result of the sql query could be set to var TR which means that TR is a table variable in D4. The TR variable is therefore of the same nature no matter how it (a table variable) was derived. Or we could create a view based on the pass-thru query and use that. var TR:= but we're using a view instead of setting the pass-thru to var TR. create view TR SQLQuery("select A.*, (select Count(*) from Emp_T as B where (B.sequence<=A.sequence) and (B.level=1)) as Rank from Emp_T as A"); Use the TR view in a batch to get the salary tree. var SR:= ( TR join ( TR group by {Rank} //We want the empid and name for each dense rank. add{Min(empid) NumEmpid,Max(NameEmp) Emp_Name} ) ) //We only want managers, those that have subordinates. where NumEmpid in ( Employees {mgrid} ) with {IgnoreUnsupported = 'true'} add { ( (empname+ case when level>1 then ' ('+ToString(salary)+')' else '' end).Insert(0,''.PadLeft((level-1),'|'))).Replace('|',' | ') Mgr_Sal_Tree }; select SR {Mgr_Sal_Tree,sequence} order by {sequence}; Mgr_Sal_Tree sequence ------------------------ -------- Nancy 1 | Andrew (5000) 2 | | Steven (2500) 3 | | Michael (2500) 4 | Janet (5000) 5 | | Robert (2500) 6 | | | David (2000) 7 | | | | James (1500) 8 | | | Ron (2000) 9 | | | Dan (2000) 10 | | Laura (2500) 11 | | Ann (2500) 12 | Margaret (5000) 13 | | Ina (2500) 14 Andrew 15 | Steven (2500) 16 | Michael (2500) 17 Janet 18 | Robert (2500) 19 | | David (2000) 20 | | | James (1500) 21 | | Ron (2000) 22 | | Dan (2000) 23 | Laura (2500) 24 | Ann (2500) 25 Margaret 26 | Ina (2500) 27 Robert 30 | David (2000) 31 | | James (1500) 32 | Ron (2000) 33 | Dan (2000) 34 David 38 | James (1500) 39 Here we get the sum of salaries for subordinates under managers using the TR view. Eliminating level 1 in the query excludes the managers salary in the sum. Again the dense rank idea makes this an easy query. select ( TR join //This is a natural join based on Rank. ( TR group by {Rank} //We want the empid and name for each dense rank. add{Min(empid) NumEmpid,Max(NameEmp) Emp_Name} )//->The relation of view TR being group by Rank. )//->The relation from TR joined to (TR grouped by Rank) where level>1 //A 'where' applied to above relation. This relation //is now grouped by NumEmpid to get subordinate salaries. group by{NumEmpid} add{Max(Emp_Name) Emp_Name,Sum(salary) SumSalary} order by {NumEmpid}; NumEmpid Emp_Name SumSalary -------- -------- ---------- 1 Nancy 37500.0000 2 Andrew 5000.0000 3 Janet 15000.0000 4 Margaret 2500.0000 7 Robert 7500.0000 11 David 1500.0000
Monday, October 01, 2007
Sql - The Two Standards of the Sql CTE
Sql really has two standards. There is the ANSI sql standard and then there is the DOUBLE standard. This example uses Sql Server 2005 and specifically the CTE (common table expression). Sql Server sees nothing wrong with this query that duplicates a column name. Something that no database should allow. SELECT employeeid,customerid as employeeid FROM orders From Bol on the CTE column_name 'Duplicate names within a single CTE definition are not allowed.The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.' But this query produces an error. WITH Emps (Emp, Emp) AS ( SELECT employeeid,customerid as employeeid FROM orders ) SELECT * FROM Emps Error: 'The column 'Emp' was specified multiple times for 'Emps'. But try this. And it works! WITH Emps (Emp, Emp1) AS ( SELECT employeeid,customerid as employeeid FROM orders ) SELECT * FROM Emps So what does Bol really mean in the column_name description. It means that the stupid and error prone idea of creating duplicate column names is perfectly ok in the query definition. It's only the outer reference names (the ( column_name [ ,...n ] ) that are going to be checked for duplicates. So the MS statement: 'The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.' is a recognition that its okay to screw up the query_definition as as long as you come away with distinct names in the CTE. In other words, give distinct names for the 'same' columns created in the query_definition with the same name. If you don't duplicate names in the quey_definition you can dispense with the CTE column name list totally. So not only is the MS statement a statement of guilt (don't expect us to correct duplicate column names when it first occurs, we'll catch it the 2nd time it comes around), it's also a statement of a double standard. One standard for an independent SELECT statement and another standard for a CTE! Double standards allow the initial error to propagate. Not what you want in a database. This is what a database should do when duplicate columns are declared. This is what Dataphor will do when you try this. select Orders {EmployeeID,CustomerID EmployeeID}; Error: Duplicate object name "EmployeeID". Only if you 'love' sql could you tolerate its nonsense ☺ . For more on the sql CTE see: 'Do you know what the Sql CTE is?' http://beyondsql.blogspot.com/2007/10/sql-whats-really-with-with.html
Sql - Do you know what the Sql CTE is?
From Sql Server 2005 Bol for WITH CTE (common_table_expression): 'Specifies a temporary named result set, known as a common table expression (CTE). This is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement.' Now this works fine (using the NorthWind database in Sql Server 2005): WITH Emps (Emp, Cnt) AS ( SELECT employeeid,count(*) AS Cnt FROM orders GROUP BY employeeid ) SELECT Max(Cnt) FROM Emps But try to define a CTE independently. You can't. So this CTE craps out: WITH Emps (Emp, Cnt) AS ( SELECT employeeid,count(*) AS Cnt FROM orders GROUP BY employeeid ) Try to use the CTE in a batch with a 2nd SELECT. You can't 'reuse' the CTE. So this batch craps out with an error thrown on the use of the CTE on the 2nd SELECT. WITH Emps (Emp, Cnt) AS ( SELECT employeeid,count(*) AS Cnt FROM orders GROUP BY employeeid ) SELECT Max(Cnt) FROM Emps SELECT Min(Cnt) FROM Emps -- Invalid object name 'Emps'. Now what if a CTE in a batch could be defined independently and be used with multiple SELECT statements. What would the batch look like. Well it could look something like this using the D4 language of Dataphor (see http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html for some more background): //Define a CTE independently. And to define a CTE independently it must //be stored in a variable. We are using a pass-thru query from D4 to Sql Server //in t-sql to the NorthWind database. var EMPS.CTE:= SQLQuery("SELECT employeeid,count(*) AS Cnt FROM orders GROUP BY employeeid"); //Define a row using the CTE. var LRow:=row{Min(Cnt from EMPS.CTE) MinCnt,Max(Cnt from EMPS.CTE) MaxCnt};//A row. //SELECT the row. select LRow; /* MinCnt MaxCnt ------ ------ 42 156 */ //Use the CTE in a 2nd SELECT, SELECT a table select EMPS.CTE return 2 by {Cnt desc}; /* employeeid Cnt ---------- --- 4 156 3 127 */ What is really going on here. Now Sql has managed to immunize itself against computer science forever. But if we were to apply a compute science term to the sql CTE what term what we call it. We would call it a value. Because a value cannot exist independently (by itself :)) and cannot be reused. To reuse a value we have to declare every time we want to use it. With a variable we don't have to worry about any of these value limitations. We can simply reuse the variable as much as we want. The 'var' in 'var EMPS.CTE:=..' means define a variable that contains a value and the value is the CTE SELECT statement. So now you know the sql term 'result set' is really just a value :) And now you know the benefit of working with variables over values. Of course the situation with the sql CTE is not the only case where you have to repeat the damn thing to use it. This works fine in sql. SELECT orderid,customerid,employeeid FROM orders WHERE customerid in ('COMMI','TORTU','HUNGO') But can you store the list of ('COMMI','TORTU','HUNGO') in a variable? No. You have to repeat it every time you want to use it. It's because there is no variable available to assign the list to. Of course in D4 we can assign the list to a variable and use the variable just like the CTE. var LList:=list(String){'COMMI','TORTU','HUNGO'}; select Orders where CustomerID in LList {OrderID,CustomerID,EmployeeID}; Doesn't this make more sense than having to use xml in sql to shred the string or other crazy sql techniques to store the list items in a table?
Subscribe to:
Posts (Atom)