Beyond Sql: October 2007

Monday, October 22, 2007

Sql - Really simple division

This article is based on the thread:

microsoft.public.sqlserver.programming
Monday, October 22, 2007
"select rows that contain both arg1 = 'x' and arg1 = 'y'"
http://tinyurl.com/386lbx

>I need to select only rows where the name contains x and y.  I do not want
>to get rows that only contain x or contain x, z, or contain y, z.  I only
>want rows where the name has an x an a y (they can have a z but they must
>have an x and a y).

Forget QA or SSMS. Just draw a picture.
You use the term row. So draw a row.

row{aName as sName, acode as code}

Now you want people who have code x and y. These are two
rows. So draw them. Use 'burns' for the person.

row{'burns' as sName, 'x' as code}
row{'burns' as sName, 'y' as code}

Now what do you call two rows together? How about calling it
a table ☺ .
Draw it.

table
   {
    row{'burns' as sName, 'x' as code}
    row{'burns' as sName, 'y' as code}
   }
 
Now if this table, made just for 'burns', is in your
#tmp1 table then 'burns' is a guy you want. In other
words, if both rows (one for x, one for y) are in
#tmp1 then you have a hit. Super simple ☺ .

Now in QA or SSMS do this.

declare @x int, @y int
set @x=2
set @y=3
if @x<=@y
print 'Yes'
else
print 'No'

No explanation necessary. Super simple. Now take the same
idea of comparing two integers and extend it to comparing
two tables.

if
table
     {
      row{'burns' as sName, 'x' as code}
      row{'burns' as sName, 'y' as code}
     }
       <=
         #tmp1
           print 'Yes'
                else 'No'
              
In other words, does each row for 'burns' occur in #tmp1?  
'Burns' can have codes in #tmp1 in addition to x and y (ie. 2<=3)
So 'burns' has to have at least a row for x and a row for y
in #tmp1. If in #tmp1 'burns' has only an 'x' or only a 'y'
no matter what other codes he has that's no good (2<=1).
In the case above you will see 'Yes' printed since the comparison
is true.

This whole scenario is referred to as relational division in
database terminology. But these simple ideas are obscured
by sql because you can't draw a picture of a row, nor a
table nor does sql understand comparing tables like integers.
So instead you're left with grouping and counting, joins,
intersects, existential queries and whatnot all trying to
express a simple idea yet at the same time obscuring it.

Now in a query you want to substitute all the unique names
from #tmp1 into our little table so for each person we can
test the comparison with #tmp1. What would such a query look
like?

select
 select distinct sName as aPerson from #tmp1 
  where      -- draw a table with two rows for each aPerson
             -- the 1st column has a value aPerson and
             -- the column is named 'sName'. The 2nd column is
             -- called 'code'. The column names and datatypes
             -- are the same as in #tmp1.
        table
             {
              row{aPerson as sName, 'x' as code},
              row{aPerson as sName, 'y' as code}
             }
             --  Compare the tables.
              <=
             --  Form a table from #tmp1 of rows belonging to the
             --  aPerson above.
              (Tmp1 where sName=aPerson);

Now this won't quite work in sql no matter where you execute it ☺ .
But what would a query really look like that will work with
#tmp1. Here it is. And it really is almost self-explanatory.
And it works in the D4 language! :)
(Tmp1 is a table same as #tmp1 stored in an Sql Server 2005 database).

select
  Tmp1 {sName aPerson}
    where
         table
              {
               row{aPerson sName, 'x' code},
               row{aPerson sName, 'y' code}
              }
               <=
              (Tmp1 where sName=aPerson);
   
aPerson
-------
burns 
jones 
smith 

Now this is what MS should be doing. Sql is a language of choice
for some things. But it certainly is not the choice language
for others.

Sunday, October 21, 2007

Sql Server - The Undefined Trigger

Ask the question: does Sql Server have a row level trigger.
Answer: it depends. From the point of view of application
development how in the world can the answer possibly be
ambiguous? Well lets back up and look at the Update statement.
From Sql Server 2005 bol:
Using UPDATE with the FROM Clause
'The results of an UPDATE statement are undefined if the
 statement includes a FROM clause that is not specified in
 such a way that only one value is available for each column
 occurrence that is updated, that is if the UPDATE statement
 is not deterministic'.
Ok what is really the rational for even keeping this
'proprietary' syntax? Since it is inherently non-deterministic
why even offer it to developers most of whom do not understand
but the simplest of sql. After all Sql Server is explicitly
relinquishing responsibility for the integrity of the Update and
making the user responsible. Shouldn't the idea of a database be
the other way around? Ok now lets move to the trigger. From bol
under the topic: Multirow Considerations for DML Triggers:
'When you write the code for a DML trigger, consider that the
 statement that causes the trigger to fire can be a single statement
 that affects multiple rows of data, instead of a single row.'
Does this sound familiar? Of course it does. The same non-determinism
of the FROM clause in Update now reappears in the insert table
of the trigger. But of course it is not presented as a 'trigger
is inherently undefined'. No, now the undefined nature of the
trigger is called 'a Multirow consideration'. And again the server
reliquishes responsibility of integrity and puts it in the
hands of the user. Well lets get real. There is no such thing as
a multirow trigger. From an applications standpoint the largest
scope of a trigger is a row. The so called multirow trigger is an 
example of what happens when users don't scrutinize what nonsense
is thrown their way. Instead of getting rid of a stupid idea
in Update using a FROM it was instead extended to a trigger.
It was simply an easy and expedient thing to do. But it was
shameful and lazy too. Lack of integrity is based on lack of
character. But it is equally disappointing that more users do
not complain and demand the integrity that a database should offer.
Sql Server can do much, much better.  But, like E.T., users must
phone home.

Wednesday, October 17, 2007

Sql - Constantly struggling

The subject of this article is called (by others) database constants.
So lets be sure we're on the same page with what a constant is.
It is just a value.

X = Substring(Y,1,5);

X and Y are variables, 1 and 5 are values.

X = Y * 5;

X and Y are variables, 5 is a constant. It is a number whose value is 5.
A table in D4 is just as much a variable as X and Y. A table in 
sql is like the value 5, a constant. The difference between a constant
and a variable is also the difference between D4, a relational
database, and sql!

The following is part of an exchange I had with someone on the
issue of database constants. The idea of database constants
was raised in the article:

A New (and Hopefully Better) Approach to Constants
http://www.sqlservercentral.com/articles/T-SQL/61244/

Here I go ☺ :

Sql Server lets users define a 'table variable'. Now you really
don't believe it's a table 'variable' do you?:)  Of course not.
They're just playing with words. There is no such thing in sql
server as a table variable. But because they call it that most
users really believe it. It's like believing in the tooth fairy:)
Now we're going to play the same game with database 'constants':)
There is no such thing as a database constant. There are only
values (which can't be updated or assigned other values) and
variables (which can be updated and assigned other values).
But in sql there is no real talk of these basic things. And
that's why I talk about Dataphor ☺. There is no difference between
what this guy is calling a 'constant' and a table or 'resultset' in
sql. They are all (constant) values and not variables. I'm trying
to show what sql would be like if there really were table 'variables'.
One way was with the CTE article, 'Do you know what the Sql CTE really is?'
But people seem to be having a hard time wrapping their head
around it:) This sql stuff just gets crazier. Now people are
going to believe that you can pass a table as a parameter to a
stored procedure. That is nonsense. One reason people believe it
is they have nothing to compare/constrast it with. So I show what
it really means to pass a table to a sp:
http://beyondsql.blogspot.com/2007/10/sql-whats-really-with-with.html
http://beyondsql.blogspot.com/2007/06/dataphor-13-passing-table-as-parameter.html
http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html
http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html
etc.etc.

But to do these kind of things requires a foundation different than
sql. Hence the name of my site ☺ .
The so called 'constants' are basically a table without a key, ie a row.
So I do:
create table Y
{
 A:Integer,
 B:String,
 key{}
};

insert row{1 A,'Micheal' B} into Y;

select Y[].B
'Michael'
select Y[].A
1
where the '[]' is called a row extracter.
Or I can define a 'list':
var LList:=list(String){'Michael','Steve','Jimmy'};
select LList[0];
Michael

Or I can define a table (ie a row) with lists.
create table Z
{
 Strings:list(String),  //The column holds a list of strings.
 Integers:list(Integer),//The column holds a list of integers.
 key{}
};

insert row{{'Micheal','Steve','Jimmy'} Strings,{2,10,22,40} Integers}
into Z;

select Z[].Strings[2];
Jimmy
select Z[].Integers[1];
10
where the [2] and [1] refer to the ordinal position of the item (value)
in the list.
But sql doesn't have a table 'type' or a list type so it couldn't
understand these things. The only thing sql can do is try to simulate
these types. And it winds up in a mess of gooblygook that few understand
and robs the user of the integrity that the database should provide
by supporting these things directly. All this stuff is based on values,
variables and types. The only types that exist is sql are numbers and
strings. Again there is no type for tables, row or list. And therefore
no variables for these things because without a type you can't define
a variable. Make sense ☺ .
var X:= Orders //X is a variable of type table, specifically the type
               //of the Orders table (the column names and there data types).
This is the huge step forward from sql. In sql the result of a query
is no different than the 'Constant' that guy was trying to talk about ☺ .
It is also the difference between a CTE and a variable that holds the
result of the 'CTE'. You can't reuse a CTE in sql because it's a 'constant',
a value. You can only reuse a variable.

Dataphor - Merging strings together

This problem was raised in the thread:
microsoft.public.sqlserver.programming
Monday, October 15, 2007 
'Parse and Merge two fields'
http://tinyurl.com/2eosa3

The idea is to take two pipe ('|') delimited strings and created one
string where each item in the first string is matched with the item 
in the second string having the same ordinal position.
The two columns are People and Position. So given:

People:=' Sam | Jane | Gene'
Position:= 'Accounting | Finance | Marketing'

we want:

People_Position:= 'Sam|Accounting|Jane|Finance|Gene|Accounting'

Of course in sql this is usually a mess since sql does not
support a list type and any of the operations that can transform
a delimited string to a list and visa versa.
In D4 we simply split the delimited string to a list of items and
then create a table from the list. The table will contain the
ordinal position as an integer value of each item in the list. 
So a row has the item (string) value and sequence value of that item
in the string as columns. So we just concatenate the columns of each
string together (separated by a pipe) and then concatenate these 
strings over the rows with the Concat operator ordering the 
concatenated string by the sequence value (over rows). 

Note that there is no concept of just unique item in a list. A list
can have any number of duplicate items. When the list is transformed
to a table the sequence number guarantees that same value items will
be in the table. In other words the sequence is a key of the table.

select ToTable('A|A|B|B|C'.Split({'|'}));

value sequence 
----- -------- 
A     0        
A     1        
B     2        
B     3        
C     4      

And with column names of your choice:

select ToTable('A|A|B|B|C'.Split({'|'}),'Str','Seq');

Str Seq 
--- --- 
A   0   
A   1   
B   2   
B   3   
C   4   

(Browse this site for other articles on string operations for more info).

Here is some sample data. The data is stored in a Sql Server 2005 database.
All queries are written with the D4 relational language of Dataphor.

create table JR
{
 id:Integer,
 People:String tags{Storage.Length='100'},
 Position:String tags{Storage.Length='100'},
 key{id}
};
 
insert
table
{
row{1 id,'joe|sam|pete|mike' People,'Manager|Team Lead|Worker Ant|Worker Ant' Position},
row{2,'A| C|  T ',' JobA|JobC |JobT '},
row{3,'A| C ',' JobA | JobB |JobC '}, //Ummatched strings (lists)
row{4,'Bush | Cheney | Rice ' ,' Worser|Worst|Worse '},
row{5,'Z |Y |X ' ,' Backwards Z|Backwards Y |Backwards X '}
} into JR;

select JR;

id People                Position                                
-- --------------------- --------------------------------------- 
1  joe|sam|pete|mike     Manager|Team Lead|Worker Ant|Worker Ant 
2  A| C|  T               JobA|JobC |JobT                        
3  A| C                   JobA | JobB |JobC                      
4  Bush | Cheney | Rice   Worser|Worst|Worse                     
5  Z |Y |X                Backwards Z|Backwards Y |Backwards X

Also note that we can access the table with a pass-thru query using t-sql:

select SQLQuery("SELECT id,People,Position FROM JR"); 

id People                Position                                
-- --------------------- --------------------------------------- 
1  joe|sam|pete|mike     Manager|Team Lead|Worker Ant|Worker Ant 
2  A| C|  T               JobA|JobC |JobCT                       
3  A| C                   JobA | JobB |JobC                      
4  Bush | Cheney | Rice   Worser|Worst|Worse                     
5  Z |Y |X                Backwards Z|Backwards Y |Backwards X   

Note that the only restriction used here is that both strings have
the same number of items. 

select 
  JR //The table JR in Sql Server 2005. 
  //SQLQuery("SELECT id,People,Position FROM JR") Or we could use this t-sql query . 
     where //The same 'where' statement as in sql. 
           //Check that each string has the same number of items.      
          People.Split({'|'}).Count()=Position.Split({'|'}).Count()
                                with {IgnoreUnsupported = 'true'}
      //For each row in the table we 'add' a column (People_Position).
      //The People_Position column is the '|' delimited string of
      //People and Position item by item.
      add
      {
       //From a table whose rows represents each item in People and Position,
       //for each row form a string of the person and position separated by
       //a '|' and concatenate that string over the rows into a single string.
       Concat({People_Position,Del} 
        from
         (
          //Use Split to create a 'list' of items from the delimited string.
          //Then create a table from list with ToTable. Column 'Str1'
          //is the string value of the item from the list. Column 'Seq'
          //is the item number from the list. So now a table represents
          //the original input column People. In other words, the rows of
          //the table are the delimited items in the string from left to right.
          (ToTable(People.Split({'|'}),'Str1','Seq'){Str1.Trim() Str1,Seq} )       
           join  //A natural join on Seq which is an integer from 0-># of items
                 //in the string. Seq preserves the sequence from left to
                 //to right of the items in the string.
          (ToTable(Position.Split({'|'}),'Str2','Seq'){Str2.Trim() Str2,Seq} )
            {Seq,Str1,Str2,'|' Del,(Str1+'|'+Str2) People_Position}  
         )
                order by {Seq}) //The concatenation over rows is ordered by the
                                //Seq column. This is the same order as the items
                                //in the original input columns, People/Position.
                                //Therefore the original order is preserved in
                                //the result (People_Position).
        People_Position //This is name of the new column added to each row of JR.
       }
        {id,People_Position}  //These are the only two columns we've chosen
                              //to display from the query.
            order by {id};    //Order by the values if the id column.
 
id People_Position                                           
-- --------------------------------------------------------- 
1  joe|Manager|sam|Team Lead|pete|Worker Ant|mike|Worker Ant 
2  A|JobA|C|JobC|T|JobT                                      
4  Bush|Worser|Cheney|Worst|Rice|Worse                       
5  Z|Backwards Z|Y|Backwards Y|X|Backwards X                


Instead of eliminating rows where the item counts don't agree
we can insert the string 'Unmatched' for those id values.

select 
 //Instead of table JR we access the table in sql server with a t-sql query.
 //In D4 the query does not return an sql 'resultset' but instead is treated
 //as a table 'variable' which is of a nature fundamentally different than sql.
 SQLQuery("SELECT id,People,Position FROM JR") 
 //Store the restriction on count as a boolean value (T/F) in column TestCnt.
  add{People.Split({'|'}).Count()=Position.Split({'|'}).Count() TestCnt}
                                with {IgnoreUnsupported = 'true'}
   add
      {
 //If the TestCnt is true concatenate the rows, else return a nil (null) value.
 //A case statement just like the sql case statement could also be used.
  if TestCnt
      then
       Concat({People_Position,Del}
        from
         (
          (ToTable(People.Split({'|'}),'Str1','Seq'){Str1.Trim() Str1,Seq} )       
           join
          (ToTable(Position.Split({'|'}),'Str2','Seq'){Str2.Trim() Str2,Seq} )
            {Seq,Str1,Str2,'|' Del,(Str1+'|'+Str2) People_Position}  
         )
                order by {Seq})
                else
                  nil
                    as String
        People_Position
      }
 //Insert the unmatched string for a nil value of the People_Position column.     
        {id,IfNil(People_Position,'*** Unmatched Strings ***') Field} 
          order by {id};
          
id Field                                                     
-- --------------------------------------------------------- 
1  joe|Manager|sam|Team Lead|pete|Worker Ant|mike|Worker Ant 
2  A|JobA|C|JobC|T|JobT                                      
3  *** Unmatched Strings ***                                 
4  Bush|Worser|Cheney|Worst|Rice|Worse                       
5  Z|Backwards Z|Y|Backwards Y|X|Backwards X                 


D4 allows all the expressive power you need to easily test for
any criteria. Here we add some additional rows to table JR that
should be eliminated.

insert
table
{
//Digits in Postion.
row{6 id,'joe|sam|pete|mike' People,'Manager|Team Lead|Worker Ant1|Worker Ant2' Position},
//Digits in People and Position
row{7 id,'joe|sam|pete1|mike' People,'Manager|Team Lead|Worker Ant1|Worker Ant' Position}
} into JR;

select JR;

id People                Position                                  
-- --------------------- ----------------------------------------- 
1  joe|sam|pete|mike     Manager|Team Lead|Worker Ant|Worker Ant   
2  A| C|  T               JobA|JobC |JobT                          
3  A| C                   JobA | JobB |JobC                        
4  Bush | Cheney | Rice   Worser|Worst|Worse                       
5  Z |Y |X                Backwards Z|Backwards Y |Backwards X     
6  joe|sam|pete|mike     Manager|Team Lead|Worker Ant1|Worker Ant2 
7  joe|sam|pete1|mike    Manager|Team Lead|Worker Ant1|Worker Ant

We add some additional restrictions to the where statement to eliminate
rows with strings that contain any non letters.

select 
 JR
  where 
    (
     ( People.Split({'|'}).Count()=Position.Split({'|'}).Count() )
     //We also test that each item is only letters (no digits or other junk).
     //This can be done any numbers of ways. Here we just compare the count
     //of all items to the count of items eliminated by characters other than 
     //letters.
       and
       (
        People.Split({'|'}).Count()
        =
        Count(ToTable(People.Split({'|'}),'F1Str','F1Seq') 
        //Test that items are made up of letters only. To eliminate
        //blanks contaiminating the test we remove blank character before
        //testing.
          where IsLetter(Replace(Trim(F1Str),' ','')))
       )
       and
       (
        Position.Split({'|'}).Count()
        =
        Count(ToTable(Position.Split({'|'}),'F2Str','F2Seq') 
          where IsLetter(Replace(Trim(F2Str),' ','')))
       )   
     )  with {IgnoreUnsupported = 'true'}        
      add
      {
       Concat({People_Position,Del}
        from
         (
          (ToTable(People.Split({'|'}),'Str1','Seq'){Str1.Trim() Str1,Seq} )       
           join
          (ToTable(Position.Split({'|'}),'Str2','Seq'){Str2.Trim() Str2,Seq} )
            {Seq,Str1,Str2,'|' Del,(Str1+'|'+Str2) People_Position}  
         )
                order by {Seq})
        People_Position
       }
        {id,People_Position}
          order by {id};

id People_Position                                           
-- --------------------------------------------------------- 
1  joe|Manager|sam|Team Lead|pete|Worker Ant|mike|Worker Ant 
2  A|JobA|C|JobC|T|JobT                                      
4  Bush|Worser|Cheney|Worst|Rice|Worse                       
5  Z|Backwards Z|Y|Backwards Y|X|Backwards X                

Thanks for stopping by ☺ .

Tuesday, October 09, 2007

Dataphor - Exploding hierarchical data

This article show examples of using the D4 explode operator.
This operator is used for expressing hierarchical data. It is
not a 'recursive' operator like the recursive sql CTE but more
like the Oracle Connect By construct.

The examples follow the ones used by Itzik Ben-Gan to illustrate
the recursive CTE query in Sql Server 2005, specifically the
'Single-Parent Environment: Employees Organizational Chart'
which can be found at:
http://msdn2.microsoft.com/en-us/library/ms345144.aspx#docum_topic4

The explode examples are based on the Employees table used in the
sql example. The table is stored in an Sql Server 2005 database.
Several of the examples use the concept of a dense rank.

As with most of my articles, the code is not necessarily 'clever' but
straightforward, not necessarily the best 'performant' but intended
to express the many different concepts and constructs in D4.

select Employees

empid mgrid empname  salary    
----- ----- -------- ----------
1     0     Nancy    $10,000.00
2     1     Andrew   $5,000.00 
3     1     Janet    $5,000.00 
4     1     Margaret $5,000.00 
5     2     Steven   $2,500.00 
6     2     Michael  $2,500.00 
7     3     Robert   $2,500.00 
8     3     Laura    $2,500.00 
9     3     Ann      $2,500.00 
10    4     Ina      $2,500.00 
11    7     David    $2,000.00 
12    7     Ron      $2,000.00 
13    7     Dan      $2,000.00 
14    11    James    $1,500.00 


Get a tree for a specific manager.

select
 (
  Employees
    explode
   by mgrid = parent empid
   where  mgrid=0
     order by {empid}
       include level
   with {IgnoreUnsupported = 'true'}
  )
// Do a little string insert to format the tree.  
    add {'' Temp, level-1 Totalspace}
     add {
          (
           empname.Insert(0,Temp.PadLeft(Totalspace,'|'))
          ).Replace('|',' | ')
          Tree
         }
        {Tree,mgrid,empid,empname,sequence,level};
       
Tree              mgrid empid empname  sequence level
----------------- ----- ----- -------- -------- -----
Nancy             0     1     Nancy    1        1    
 | Andrew         1     2     Andrew   2        2    
 |  | Steven      2     5     Steven   3        3    
 |  | Michael     2     6     Michael  4        3    
 | Janet          1     3     Janet    5        2    
 |  | Robert      3     7     Robert   6        3    
 |  |  | David    7     11    David    7        4    
 |  |  |  | James 11    14    James    8        5    
 |  |  | Ron      7     12    Ron      9        4    
 |  |  | Dan      7     13    Dan      10       4    
 |  | Laura       3     8     Laura    11       3    
 |  | Ann         3     9     Ann      12       3    
 | Margaret       1     4     Margaret 13       2    
 |  | Ina         4     10    Ina      14       3    

Treat each employee as if they are a manager. This will return the
subordinates of each employee regardless of whether or not
they are a manager.

select
 (
  Employees
    explode
   by mgrid = parent empid
   where  mgrid>=0         //Changing where to include all employees.
     order by {empid}
       include level
   with {IgnoreUnsupported = 'true'}
  )
     add {'' Temp, level-1 Totalspace}
     add {
          (
           empname.Insert(0,Temp.PadLeft(Totalspace,'|'))
          ).Replace('|',' | ')
          Tree
         }
        {Tree,mgrid,empid,empname,sequence,level} ;
       
Tree              mgrid empid empname  sequence level
----------------- ----- ----- -------- -------- -----
Nancy             0     1     Nancy    1        1    
 | Andrew         1     2     Andrew   2        2    
 |  | Steven      2     5     Steven   3        3    
 |  | Michael     2     6     Michael  4        3    
 | Janet          1     3     Janet    5        2    
 |  | Robert      3     7     Robert   6        3    
 |  |  | David    7     11    David    7        4    
 |  |  |  | James 11    14    James    8        5    
 |  |  | Ron      7     12    Ron      9        4    
 |  |  | Dan      7     13    Dan      10       4    
 |  | Laura       3     8     Laura    11       3    
 |  | Ann         3     9     Ann      12       3    
 | Margaret       1     4     Margaret 13       2    
 |  | Ina         4     10    Ina      14       3    
Andrew            1     2     Andrew   15       1    
 | Steven         2     5     Steven   16       2    
 | Michael        2     6     Michael  17       2    
Janet             1     3     Janet    18       1    
 | Robert         3     7     Robert   19       2    
 |  | David       7     11    David    20       3    
 |  |  | James    11    14    James    21       4    
 |  | Ron         7     12    Ron      22       3    
 |  | Dan         7     13    Dan      23       3    
 | Laura          3     8     Laura    24       2    
 | Ann            3     9     Ann      25       2    
Margaret          1     4     Margaret 26       1    
 | Ina            4     10    Ina      27       2    
Steven            2     5     Steven   28       1    
Michael           2     6     Michael  29       1    
Robert            3     7     Robert   30       1    
 | David          7     11    David    31       2    
 |  | James       11    14    James    32       3    
 | Ron            7     12    Ron      33       2    
 | Dan            7     13    Dan      34       2    
Laura             3     8     Laura    35       1    
Ann               3     9     Ann      36       1    
Ina               4     10    Ina      37       1    
David             7     11    David    38       1    
 | James          11    14    James    39       2    
Ron               7     12    Ron      40       1    
Dan               7     13    Dan      41       1    
James             11    14    James    42       1            

If we reverse 'by mgrid = parent empid' to 'by empid = parent mgrid' we
get the tree of a particular employee to their top level manager.

select
 (
  Employees
    explode
      by empid = parent mgrid
         where  empid=13
   order by {empid}
       include level
   with {IgnoreUnsupported = 'true'}
  )
     add {'' Temp, level-1 Totalspace}
     add {
          (
           empname.Insert(0,Temp.PadLeft(Totalspace,'|'))
          ).Replace('|',' | ')
          Tree
         }
        {Tree,mgrid,empid,empname,sequence,level};
       
Tree           mgrid empid empname sequence level
-------------- ----- ----- ------- -------- -----
Dan            7     13    Dan     1        1    
 | Robert      3     7     Robert  2        2    
 |  | Janet    1     3     Janet   3        3    
 |  |  | Nancy 0     1     Nancy   4        4    
       
By changing the where predicate to >=1 we can get a report on all
employees (note where empid>=3 eliminates the graph of the 1st 2
employees (Nancy and Andrew) but doesn't eliminate them from graphs
of other employees who report to them).


Here we create an operator that will give a graph in either direction
of a specific employee and their top level manager. In other words,
we can either start with the employee and go down to their top level
manager or start with the employees top level manager and go down
to the employee. The concept of the dense rank is used for binding
all rows of each employee together. By getting the empname and empid
for each dense rank we can target any employee by name or number.   
(Note we could, of course, use a view or any number of other constructs.
I just felt in the mood to use an operator ☺ ).

The operator takes two arguments. The first, aTreeTable, is a table
of type Employees. The second, Start_At, is a string and indicates
the direction of the graph. Using 'M' for manager starts with the
manager. Using 'E' starts with the employee.

create operator EmpTree(aTreeTable:typeof(Employees),Start_At:String):
//The operator returns a virtual table with columns and their data types
//defined by the typeof expression.
typeof(
       Employees
       add{1 level,'S' empnamerank,1 empidrank, 'T' Tree_Graph, 1 Rank, 1 TreeOrder}
       )
begin
result:=table of typeof(result){};
//Starting at top level mgr for each employee is a desc sort (default).
//Starting at each employee to the top level mgr. is an asc sort.    
//Start_At=M(gr) is desc sort (default)
//Start_At='E(mp) is asc sort.
var LSort:='D'; //Default.
if ( ((Start_At.Trim())[0]).Upper() ) = 'E'
   then LSort:='A' ;
var T:=
aTreeTable
explode  
by empid = parent mgrid
    where  empid>=1
      order by {empid}
       include level
   with {IgnoreUnsupported = 'true'};
//Get a dense rank. This rank binds all rows for each employee
//together. The idea is to increment a count for every level 1 since
//a level 1 indicates the start of a new employee.
var TR:=
       T add{case when level=1 then empname else nil end NameEmp}
        add
           {
            Count(
                  T rename {sequence sequenceX}
                   where  (sequenceX<=sequence) and (level=1)
                  ) 
             Rank
            };
var SR:=
     TR               
      join
       (
        TR group by {Rank}
        //We want the emp name, number (empid) and reverse level for each
        //dense rank (empname). We want the reverse level so we can get
        //the tree representative from the employee to top level manager
        //AND the top level manager to the employee.
                add{Max(empid) NumEmpid,Max(NameEmp) Emp_Name,Max(level) Maxlevel}
        )                        
 add {
    (empname.Insert(0,''.PadLeft( (if LSort='A' then (level-1) else (Maxlevel-level)),
                                                         '|'))).Replace('|',' | ')
        Tree_Graph
     }
     rename {sequence seq};
result:=
 ToTable(
  ToList(
 cursor(SR
 order by
     {
      Rank,
      seq
      sort ((1 - (2*ToInteger((LSort = 'D'))))*(.left.value ?= .right.value)) asc
     }
    
       )//cursor
       )//ToList
      )  //ToTable
        {empid,mgrid,empname,salary,level,Emp_Name empnamerank,
          NumEmpid empidrank,Tree_Graph,Rank,sequence+1 TreeOrder};
end;         


Here we show the first five employees by using Rank in a where statement.
The value of the Rank corresponds to the ascending order of empid.
Because empid starts at 1 Rank happens to be equal to the empid.

select EmpTree(Employees,'Mgr') //We use the operator as if it were a table.
  where Rank<=5
      order by {TreeOrder};     

empid mgrid empname  salary     level empnamerank empidrank Tree_Graph   Rank TreeOrder
----- ----- -------- ---------- ----- ----------- --------- ------------ ---- ---------
1     0     Nancy    $10,000.00 1     Nancy       1         Nancy        1    1        
1     0     Nancy    $10,000.00 2     Andrew      2         Nancy        2    2        
2     1     Andrew   $5,000.00  1     Andrew      2          | Andrew    2    3        
1     0     Nancy    $10,000.00 2     Janet       3         Nancy        3    4        
3     1     Janet    $5,000.00  1     Janet       3          | Janet     3    5        
1     0     Nancy    $10,000.00 2     Margaret    4         Nancy        4    6        
4     1     Margaret $5,000.00  1     Margaret    4          | Margaret  4    7        
1     0     Nancy    $10,000.00 3     Steven      5         Nancy        5    8        
2     1     Andrew   $5,000.00  2     Steven      5          | Andrew    5    9        
5     2     Steven   $2,500.00  1     Steven      5          |  | Steven 5    10     


We can overload the EmpTree operator so as to provide a default value
for the Start_At parameter, ie. the direction of the graph. We make
the default 'M' so the graph starts with the top level manager of the
employee.

We simply supply the literal 'Mgr' for the sort direction for the
same operator whose signiture includes the Start_At parameter.

create operator EmpTree(aTreeTable:typeof(Employees)):
typeof(
       Employees
       add{1 level,'S' empnamerank,1 empidrank, 'T' Tree_Graph, 1 Rank, 1 TreeOrder}
       )
begin
result:= EmpTree(aTreeTable,'Mgr');
end;

     
Here get the tree of employee 'James' starting at the highest level manager
by using the overload signature of the EmpTree operator.

select EmpTree(Employees)
  where empnamerank='James'
  {Tree_Graph,TreeOrder}
      order by {TreeOrder};
     
Tree_Graph        TreeOrder
----------------- ---------
Nancy             38       
 | Janet          39       
 |  | Robert      40       
 |  |  | David    41       
 |  |  |  | James 42       


Here we start with employee 'James' up to his highest level manager.

select EmpTree(Employees,' Emp ')
  where empnamerank='James'
  {Tree_Graph,TreeOrder}
      order by {TreeOrder};
     
Tree_Graph        TreeOrder
----------------- ---------
James             38       
 | David          39       
 |  | Robert      40       
 |  |  | Janet    41       
 |  |  |  | Nancy 42        


Here we use the EmpTree operator to get the enumerated paths in both
directions using the Concat (concatenation) operator. (More info here).

select
 (EmpTree(Employees) add{'.' Del} adorn{key{empidrank,TreeOrder}})
   group by {empidrank}
    add 
          {
            Max(empnamerank) empname,
            Concat(empname,Del order by {empidrank,TreeOrder}) PathMgrtoEmp,
            Concat(empname,Del order by {empidrank,TreeOrder desc}) PathEmptoMgr
          }
     rename {empidrank empid}
      order by {empid};
     
empid empname  PathMgrtoEmp                   PathEmptoMgr                  
----- -------- ------------------------------ ------------------------------
1     Nancy    Nancy                          Nancy                         
2     Andrew   Nancy.Andrew                   Andrew.Nancy                  
3     Janet    Nancy.Janet                    Janet.Nancy                   
4     Margaret Nancy.Margaret                 Margaret.Nancy                
5     Steven   Nancy.Andrew.Steven            Steven.Andrew.Nancy           
6     Michael  Nancy.Andrew.Michael           Michael.Andrew.Nancy          
7     Robert   Nancy.Janet.Robert             Robert.Janet.Nancy            
8     Laura    Nancy.Janet.Laura              Laura.Janet.Nancy             
9     Ann      Nancy.Janet.Ann                Ann.Janet.Nancy               
10    Ina      Nancy.Margaret.Ina             Ina.Margaret.Nancy            
11    David    Nancy.Janet.Robert.David       David.Robert.Janet.Nancy      
12    Ron      Nancy.Janet.Robert.Ron         Ron.Robert.Janet.Nancy        
13    Dan      Nancy.Janet.Robert.Dan         Dan.Robert.Janet.Nancy        
14    James    Nancy.Janet.Robert.David.James James.David.Robert.Janet.Nancy
    

Here we get the counts of employees directly or indirectly reporting to managers.

select
  Employees
    explode
      by empid = parent mgrid
         where  empid>=1
   order by {empid}
       include level
   with {IgnoreUnsupported = 'true'}
     group by {mgrid} add{Count() MgrCnt}
      where mgrid>0

 mgrid MgrCnt
 ----- ------
 1     13    
 2     2     
 3     7     
 4     1     
 7     4     
11     1    


Here are the employees who are the managers. For example there are 13 people
who report to Nancy. This is represented indirectly by people who report
to the three managers directly below her (Andrew, Janet, Margaret) and
directly by the same managers reporting to her. Dan, David and Ron report
to Robert. But there is an additional employee reporting to David (James)
so Robert (mgrid 7) has 4 employees reporting to him.

select
 (
  Employees
    explode
      by empid = parent mgrid
         where  empid>=1
   order by {empid}
       include level
   with {IgnoreUnsupported = 'true'}
     group by {mgrid} add{Count() MgrCnt}
  )
   join Employees
     where mgrid>0
      join  ( Employees {empid MgrEmpid,empname MgrName} )
        by mgrid=MgrEmpid
      {mgrid,MgrName,empname,MgrCnt};
     
mgrid MgrName  empname  MgrCnt
----- -------- -------- ------
1     Nancy    Andrew   13    
1     Nancy    Janet    13    
1     Nancy    Margaret 13    
2     Andrew   Michael  2     
2     Andrew   Steven   2     
3     Janet    Ann      7     
3     Janet    Laura    7     
3     Janet    Robert   7     
4     Margaret Ina      1     
7     Robert   Dan      4     
7     Robert   David    4     
7     Robert   Ron      4     
11    David    James    1           


Here we get salaries of subordinates under managers. Those employees
who are not managers are omitted.

var T:=
Employees
explode  
by mgrid = parent empid
     where  mgrid>=0
      order by {empid}
       include level
   with {IgnoreUnsupported = 'true'};
//Get a dense rank. This rank binds all rows for each employee
//together. The idea is to increment a count for every level 1 since
//a level 1 indicates the start of a new employee.
var TR:=
       T add{case when level=1 then empname else nil end NameEmp}
        add
           {
            Count(
                  T rename {sequence sequenceX}
                   where  (sequenceX<=sequence) and (level=1)
                  ) 
             Rank
            };
var SR:=
   (
     TR               
      join
       (
        TR group by {Rank}
        //We want the empid and name for each dense rank.
          add{Min(empid) NumEmpid,Max(NameEmp) Emp_Name}
        )                        
    )
    //We only want managers, those that have subordinates.
     where
     NumEmpid
              in
                ( Employees {mgrid} ) with {IgnoreUnsupported = 'true'}
      add {
         ( (empname+ case when level>1 then ' ('+ToString(salary)+')'
             else '' end).Insert(0,''.PadLeft((level-1),'|'))).Replace('|',' | ')
              Mgr_Sal_Tree
          };

select SR {Mgr_Sal_Tree,sequence} order by {sequence};   

Mgr_Sal_Tree                  sequence
----------------------------- --------
Nancy                         1       
 | Andrew ($5,000.00)         2       
 |  | Steven ($2,500.00)      3       
 |  | Michael ($2,500.00)     4       
 | Janet ($5,000.00)          5       
 |  | Robert ($2,500.00)      6       
 |  |  | David ($2,000.00)    7       
 |  |  |  | James ($1,500.00) 8       
 |  |  | Ron ($2,000.00)      9       
 |  |  | Dan ($2,000.00)      10      
 |  | Laura ($2,500.00)       11      
 |  | Ann ($2,500.00)         12      
 | Margaret ($5,000.00)       13      
 |  | Ina ($2,500.00)         14      
Andrew                        15      
 | Steven ($2,500.00)         16      
 | Michael ($2,500.00)        17      
Janet                         18      
 | Robert ($2,500.00)         19      
 |  | David ($2,000.00)       20      
 |  |  | James ($1,500.00)    21      
 |  | Ron ($2,000.00)         22      
 |  | Dan ($2,000.00)         23      
 | Laura ($2,500.00)          24      
 | Ann ($2,500.00)            25      
Margaret                      26      
 | Ina ($2,500.00)            27      
Robert                        30      
 | David ($2,000.00)          31      
 |  | James ($1,500.00)       32      
 | Ron ($2,000.00)            33      
 | Dan ($2,000.00)            34      
David                         38      
 | James ($1,500.00)          39    

Here is the same tree as above (of salaries for subordinates) using a
table and a view created with a pass-thru query.

We can create a table based on the result of explode.

create table Emp_T
from
     (
       Employees
       explode  
       by mgrid = parent empid
       where  mgrid>=0
       order by {empid}
       include level
       with {IgnoreUnsupported = 'true'}
       add{case when level=1 then empname else nil end NameEmp}
        adorn  //We can include various meta-data about the columns of the table.
             {
              NameEmp nil static tags {Storage.Length = "10"},
              empname static tags {Storage.Length = "10"}
             }//end adorn.
     );  

Now we get the dense rank using a pass-thru query to Sql Server. The result
of the sql query could be set to var TR which means that TR is a table
variable in D4. The TR variable is therefore of the same nature
no matter how it (a table variable) was derived. Or we could create
a view based on the pass-thru query and use that.
var TR:= but we're using a view instead of setting the pass-thru to var TR.

create view TR
  SQLQuery("select A.*, 
      (select Count(*)
              from Emp_T as B
                 where (B.sequence<=A.sequence) and (B.level=1)) as Rank
          from Emp_T as A");

Use the TR view in a batch to get the salary tree.

var SR:=
   (
     TR               
      join
       (
        TR group by {Rank}
        //We want the empid and name for each dense rank.
                add{Min(empid) NumEmpid,Max(NameEmp) Emp_Name}
        )                        
    )
    //We only want managers, those that have subordinates.
     where
     NumEmpid
              in
                ( Employees {mgrid} ) with {IgnoreUnsupported = 'true'}
      add {
         ( (empname+ case when level>1 then ' ('+ToString(salary)+')'
             else '' end).Insert(0,''.PadLeft((level-1),'|'))).Replace('|',' | ')
              Mgr_Sal_Tree
          };
select SR {Mgr_Sal_Tree,sequence} order by {sequence};

Mgr_Sal_Tree             sequence
------------------------ --------
Nancy                    1       
 | Andrew (5000)         2       
 |  | Steven (2500)      3       
 |  | Michael (2500)     4       
 | Janet (5000)          5       
 |  | Robert (2500)      6       
 |  |  | David (2000)    7       
 |  |  |  | James (1500) 8       
 |  |  | Ron (2000)      9       
 |  |  | Dan (2000)      10      
 |  | Laura (2500)       11      
 |  | Ann (2500)         12      
 | Margaret (5000)       13      
 |  | Ina (2500)         14      
Andrew                   15      
 | Steven (2500)         16      
 | Michael (2500)        17      
Janet                    18      
 | Robert (2500)         19      
 |  | David (2000)       20      
 |  |  | James (1500)    21      
 |  | Ron (2000)         22      
 |  | Dan (2000)         23      
 | Laura (2500)          24      
 | Ann (2500)            25      
Margaret                 26      
 | Ina (2500)            27      
Robert                   30      
 | David (2000)          31      
 |  | James (1500)       32      
 | Ron (2000)            33      
 | Dan (2000)            34      
David                    38      
 | James (1500)          39


Here we get the sum of salaries for subordinates under managers using
the TR view. Eliminating level 1 in the query excludes the managers
salary in the sum. Again the dense rank idea makes this an easy query.

select
   (
    TR               
      join //This is a natural join based on Rank.
       (
        TR group by {Rank}
        //We want the empid and name for each dense rank.
            add{Min(empid) NumEmpid,Max(NameEmp) Emp_Name}
        )//->The relation of view TR being group by Rank.
    )//->The relation from TR joined to (TR grouped by Rank) 
          where level>1 //A 'where' applied to above relation. This relation
                        //is now grouped by NumEmpid to get subordinate salaries.
            group by{NumEmpid} add{Max(Emp_Name) Emp_Name,Sum(salary) SumSalary}
              order by {NumEmpid};

NumEmpid Emp_Name SumSalary 
-------- -------- ----------
1        Nancy    37500.0000
2        Andrew   5000.0000 
3        Janet    15000.0000
4        Margaret 2500.0000 
7        Robert   7500.0000 
11       David    1500.0000

Monday, October 01, 2007

Sql - The Two Standards of the Sql CTE

Sql really has two standards. There is the ANSI sql standard and then
there is the DOUBLE standard.

This example uses Sql Server 2005 and specifically the CTE (common table expression).

Sql Server sees nothing wrong with this query that duplicates a column name.
Something that no database should allow.

SELECT employeeid,customerid as employeeid
FROM orders

From Bol on the CTE column_name
'Duplicate names within a single CTE definition are not allowed.The list of
 column names is optional only if distinct names for all resulting  columns
 are supplied in the query definition.'

But this query produces an error.

WITH Emps  (Emp, Emp) AS
(
    SELECT employeeid,customerid as employeeid
    FROM orders
)
SELECT *
FROM Emps
Error: 'The column 'Emp' was specified multiple times for 'Emps'.

But try this. And it works!

WITH Emps  (Emp, Emp1) AS
(
    SELECT employeeid,customerid as employeeid
    FROM orders
)
SELECT *
FROM Emps

So what does Bol really mean in the column_name description. It means
that the stupid and error prone idea of creating duplicate column names
is perfectly ok in the query definition. It's only the outer reference
names (the  ( column_name [ ,...n ] ) that are going to be checked for
duplicates. So the MS statement:
'The list of column names is optional only if distinct names for all
 resulting columns are supplied in the query definition.'

is a recognition that its okay to screw up the query_definition as
as long as you come away with distinct names in the CTE. In other words,
give distinct names for the 'same' columns created in the query_definition
with the same name. If you don't duplicate names in the quey_definition
you can dispense with the CTE column name list totally. So not only is the
MS statement a statement of guilt (don't expect us to correct duplicate column
names when it first occurs, we'll catch it the 2nd time it comes around),
it's also a statement of a double standard. One standard for an independent
SELECT statement and another standard for a CTE! Double standards allow
the initial error to propagate. Not what you want in a database.

This is what a database should do when duplicate columns are declared.
This is what Dataphor will do when you try this.

select Orders {EmployeeID,CustomerID EmployeeID};
Error: Duplicate object name "EmployeeID".

Only if you 'love' sql could you tolerate its nonsense ☺ .

For more on the sql CTE see:
'Do you know what the Sql CTE is?'
http://beyondsql.blogspot.com/2007/10/sql-whats-really-with-with.html

Sql - Do you know what the Sql CTE is?

From Sql Server 2005 Bol for WITH CTE (common_table_expression):
'Specifies a temporary named result set, known as a common table expression (CTE).
 This is derived from a simple query and defined within the execution scope of
 a single SELECT, INSERT, UPDATE,
 or DELETE statement.'

Now this works fine (using the NorthWind database in Sql Server 2005):

WITH Emps (Emp, Cnt) AS
(
    SELECT employeeid,count(*) AS Cnt
    FROM orders
    GROUP BY employeeid
)
SELECT Max(Cnt)
FROM Emps

But try to define a CTE independently. You can't.
So this CTE craps out:

WITH Emps (Emp, Cnt) AS
(
    SELECT employeeid,count(*) AS Cnt
    FROM orders
    GROUP BY employeeid
)

Try to use the CTE in a batch with a 2nd SELECT. You can't 'reuse' the CTE.
So this batch craps out with an error thrown on the use of the CTE
on the 2nd SELECT.

WITH Emps (Emp, Cnt) AS
(
    SELECT employeeid,count(*) AS Cnt
    FROM orders
    GROUP BY employeeid
)
SELECT Max(Cnt)
FROM Emps

SELECT Min(Cnt)
FROM Emps       -- Invalid object name 'Emps'.

Now what if a CTE in a batch could be defined independently and
be used with multiple SELECT statements. What would the batch look
like. Well it could look something like this using the D4 language of Dataphor
(see http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html
 for some more background):

//Define a CTE independently. And to define a CTE independently it must
//be stored in a variable. We are using a pass-thru query from D4 to Sql Server
//in t-sql to the NorthWind database.
var EMPS.CTE:=
              SQLQuery("SELECT employeeid,count(*) AS Cnt
                          FROM orders
                            GROUP BY employeeid");
//Define a row using the CTE.
var LRow:=row{Min(Cnt from EMPS.CTE) MinCnt,Max(Cnt from EMPS.CTE) MaxCnt};//A row.  
//SELECT the row.
select LRow; 
/*
MinCnt MaxCnt
------ ------
42     156
*/
//Use the CTE in a 2nd SELECT, SELECT a table
select EMPS.CTE return 2 by {Cnt desc};
/*
employeeid Cnt
---------- ---
4          156
3          127
*/

What is really going on here. Now Sql has managed to immunize itself
against computer science forever. But if we were to apply a compute
science term to the sql CTE what term what we call it. We would call
it a value. Because a value cannot exist independently (by itself :))
and cannot be reused. To reuse a value we have to declare every time
we want to use it. With a variable we don't have to worry about
any of these value limitations. We can simply reuse the variable
as much as we want. The 'var' in 'var EMPS.CTE:=..' means define
a variable that contains a value and the value is the CTE SELECT
statement. So now you know the sql term 'result set' is really just
a value :) And now you know the benefit of working with variables
over values.

Of course the situation with the sql CTE is not the only case where
you have to repeat the damn thing to use it.

This works fine in sql.

SELECT orderid,customerid,employeeid
FROM orders
WHERE customerid in ('COMMI','TORTU','HUNGO')

But can you store the list of ('COMMI','TORTU','HUNGO') in a variable? No.
You have to repeat it every time you want to use it. It's because there is
no variable available to assign the list to. Of course in D4 we can
assign the list to a variable and use the variable just like the CTE.

var LList:=list(String){'COMMI','TORTU','HUNGO'};
select Orders
where CustomerID in LList
{OrderID,CustomerID,EmployeeID};

Doesn't this make more sense than having to use xml in sql to shred the
string or other crazy sql techniques to store the list items in a table?

Monday, October 22, 2007

Sql - Really simple division

Sunday, October 21, 2007

Sql Server - The Undefined Trigger

Wednesday, October 17, 2007

Sql - Constantly struggling

Dataphor - Merging strings together

Tuesday, October 09, 2007

Dataphor - Exploding hierarchical data

Monday, October 01, 2007

Sql - The Two Standards of the Sql CTE

Sql - Do you know what the Sql CTE is?

Beyond Sql

Blog Archive

Links

About Me