Dataphor SQL RAC (Relational Application Companion)


A site of hope for those looking for a true relational database system

Saturday, August 11, 2007

Dataphor - Creating a Super Function

The Concept of the Super Function

In the thread:
comp.databases.ms-sqlserver
Jul 25, 6:52 am
'Pass Table as a parameter to a function'
http://tinyurl.com/2j4dvy

Joe Celko makes the following comment on passing a table as a
parameter to a procedure:

>> Is it possible to pass a table as a parameter to a function.  <<

'Please read a book, any book, on data modeling, and RDBMS.  A table is
 an entity or a relationship.  That would mean you have a magical,
 super function that works on Squids, Automobiles, Britney Spears,
 Geographical locations or anything in the whole of creation.'

While his reply is pejorative the name 'super function' is perfectly
valid! It is actually a more catchy name for what is commonly referred
to as 'reusable code'. And such code rests on a fundamental concept:
being able to represent objects as 'variables'. In a database the
objects we are referring to are most notably tables. A table as a
variable allows us to pass it as a parameter to a function (procedure).
A procedure that takes a table as a parameter and returns a virtual
table (result) allows us to realize a procedure as a variable, a super
function. The super function allows an infinite number of representations
of the same logic (in the form of a table) for tables of the same type,
ie. tables of Squids, Automobiles and Britney Spears  This is not possible
in sql since what are talking about as a variable (tables,procedures)
can only be represented as literals in sql. In sql we can only reference
a table by its name since there is no type for a table. An sql
procedure/function dependent on a table can only express logic for
that particular table and hence, like the sql table, is a literal
representation of an object (table) not a variable that can represent
an infinite number of solutions.

An example of a super function in the D4 language of Dataphor.

A common requirement is find a dense rank given that a grouping column
can have repeating values. Procedure DenseRank takes a table as a parameter
with an integer column named PK which represents the tables primary key
and a column named Grp which represents a group column. The operator returns
a table with the primary key column (PK) and an integer column which is
the dense rank (DenseRank). The DenseRank procedures calls another
procedure, IDenseRank, which takes the same DenseRank table as a parameter.
Both procedures are overloaded with the group (Grp) column of the table
defined as a string. The DenseRank super function can therefore represent
the dense rank of any table in the database that has a single integer column
that is unique (not necessarily the primary key) and a column
(either integer or string) that represents a group column. The group column
may very well represent Squids, Automobiles or Britney Spears  It does
not matter as the super function is predicated on column type only. The
logic of the procedure (a dense rank) is independent of what the meaning
of the predicate (rows of the input table) is. This independence is
achieved by the concept of a what a table type is. In other words,
the dense rank super function is only possible by substituting the
significance of the type of a column(s) for the name of a column. In
sql all logical significance rests in the name of a column. The only
significance of the type of a column is computational. The significance
of type, its logical significance, is a major advancement of D4 over
sql and allows objects like super functions. The examples that follow
will make clear what these concepts mean in practice.

(If you are just beginning D4 it is more important to understand the
 the general concepts of database objects being variables and the
 importance of type than specific programming techniques. If you have
 programming questions about the procedure(s) please feel free to post
 them in comments.)

Operator (function) DenseRank returns a table (virtual) with a unique
integer column (PK) and an integer column with the dense rank (DenseRank).

create operator DenseRank(aTable:table{PK:Integer,Grp:Integer}):
                         table{PK:Integer,DenseRank:Integer}
begin
result:=
        table of typeof (result){};                        
result:=
         aTable {PK}
            times
             IDenseRank(aTable)
               where PK between PKMin and PKMax
                 {PK,DenseRank};
end;               

This operator returns a single row for each dense rank. It also has the
starting (min) and ending (max) primary key values of the rank. The ranks
are consecutively numbered in case there is a query that involves a
meaningful ordering of the dense ranks (in the direction of the primary key).

create operator IDenseRank(aTable:table{PK:Integer,Grp:Integer}):
                      table{PKMin:Integer,PKMax:Integer,DenseRank:Integer}
begin
result:=table of typeof (result){}; 
result:=                  
   ToTable(ToList(cursor(
       (
        ToTable(ToList(cursor(
       //Have to get PK consecutively numbered (PKSeq).
          ToTable(ToList(cursor(aTable order by {PK} ))) {PK,Grp,sequence+1 PKSeq}
            order by {Grp,PKSeq}  ))) 
              add{PKSeq-(sequence+1) Drank}
                group by {Grp,Drank} add{Min(PK) PKMin,Max(PK) PKMax}
       )     
                  order by {PKMin}  ))) 
                    {PKMin,PKMax,sequence+1 DenseRank} ; 
end;        

Here we overload the DenseRank and IDenseRank procedures to accept a group
column (Grp) of type string. The overload allows us to essentially ignore
the type of the group column (as long as it's an integer or a string type).
The system will determine which overload of the procedures to use based on
the table (and its group column type) passed.

create operator DenseRank(aTable:table{PK:Integer,Grp:String}):
                         table{PK:Integer,DenseRank:Integer}
begin
result:=
        table of typeof (result){};                        
result:=
         aTable {PK}
            times
             IDenseRank(aTable)
               where PK between PKMin and PKMax
                 {PK,DenseRank};
end;                
 
create operator IDenseRank(aTable:table{PK:Integer,Grp:String}):
                      table{PKMin:Integer,PKMax:Integer,DenseRank:Integer}
begin
result:=table of typeof (result){}; 
result:=                  
   ToTable(ToList(cursor(
       (
        ToTable(ToList(cursor(
         ToTable(ToList(cursor(aTable order by {PK} ))) {PK,Grp,sequence+1 PKSeq}
            order by {Grp,PKSeq}  ))) 
              add{PKSeq-(sequence+1) Drank}
                group by {Grp,Drank} add{Min(PK) PKMin,Max(PK) PKMax}
       )     
                  order by {PKMin}  ))) 
                    {PKMin,PKMax,sequence+1 DenseRank} ; 
end;        

(D4 will store two signatures for each procedure accounting for an integer and
 a string type for Grp. The table returned (PK and DenseRank) is exactly the
 same regardless of Grp type.)

Examples of using the DenseRank super function (the use of the term function
or procedure or operator is a matter of choice, they all mean the same thing
and here they all imply super ☺ )

(All tables use MS Sql Server 2005 as the storage device.)

Consider table Squids:

create table Squids
{
 SquidPK:Integer, //Primary key.
 Squid:Integer,   //Indicates a squid group that can repeat.
 Length:Integer,  //Length of the squid in squidly centimeters.
 Weight:Integer,  //Weight of the squid in squidly globs.
 key{SquidPK}
};
insert
table
{
row{1 SquidPK,1 Squid,5 Length,10 Weight},
row{5,1,4,6},
row{7,2,3,8},
row{9,2,7,3},
row{11,3,5,8},
row{12,3,6,7},
row{15,3,6,8},
row{17,1,3,5},
row{19,1,6,7},
row{21,2,8,5},
row{23,2,6,8},
row{25,2,6,8}
} into Squids;

select Squids;

SquidPK Squid Length Weight
------- ----- ------ ------
1       1     5      10    
5       1     4      6     
7       2     3      8     
9       2     7      3     
11      3     5      8     
12      3     6      7     
15      3     6      8     
17      1     3      5     
19      1     6      7     
21      2     8      5     
23      2     6      8     
25      2     6      8     

To obtain the dense rank we need:
 1. compatibility (type) of columns SquidPK and Squid with the columns of
    the generic table parameter aTable (PK,Grp) which we have.
 2. the elimination of all columns except the primary key and group.
 3. header agreement (column name equivalency) between the table passed
    and the generic names of the columns in the function.
   
We can accomplish 2 and 3 using remove and rename and obtain the ranks with:

select DenseRank(Squids remove{Length,Weight} rename {SquidPK PK,Squid Grp});

We can even combine the remove and replace into one statement using specify { }.

select DenseRank(Squids{SquidPK PK,Squid Grp});

PK DenseRank
-- ---------
1  1        
5  1        
7  2        
9  2        
11 3        
12 3        
15 3        
17 4        
19 4        
21 5        
23 5        
25 5        

What we have is the variable DenseRank representing the dense ranks of
Squids. But the variable is still generic in the sense that PK is not
meaningful for table Squids. So we simply reverse the process used in
calling the function, we rename the PK column to the primary key column
in Squids (SquidPK):

select DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank}; 

SquidPK DenseRank
------- ---------
1       1        
5       1        
7       2        
9       2        
11      3        
12      3        
15      3        
17      4        
19      4        
21      5        
23      5        
25      5        

By joining the Squids table to variable DenseRank we can represent
all the data of Squids along with its dense rank:

select
 (DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank})
//A natural join using the primary key (SquidPK) from tables DenseRank and Squids.
  join
   Squids
    {SquidPK,Squid,DenseRank,Length,Weight};

SquidPK Squid DenseRank Length Weight
------- ----- --------- ------ ------
1       1     1         5      10    
5       1     1         4      6     
7       2     2         3      8     
9       2     2         7      3     
11      3     3         5      8     
12      3     3         6      7     
15      3     3         6      8     
17      1     4         3      5     
19      1     4         6      7     
21      2     5         8      5     
23      2     5         6      8     
25      2     5         6      8  

Grouping by Squid and the dense rank (DenseRank) allows aggregate functions
to summarize the data:

select
 (
  (DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank})
  join
   Squids
//We rename DenseRank to SquidRank in the specify statement.
    {SquidPK,Squid,DenseRank SquidRank,Length,Weight}
 )  
     group by {Squid,SquidRank}
       add{Count() Cnt,Avg(Length) AvgLen,Avg(Weight) AvgWt}
         redefine {AvgLen:=Round(AvgLen,1),AvgWt:=Round(AvgWt,1)};

Squid SquidRank Cnt AvgLen AvgWt
----- --------- --- ------ -----
1     1         2   4.5    8    
1     4         2   4.5    6    
2     2         2   5      5.5  
2     5         3   6.7    7    
3     3         3   5.7    7.7  
   
We can do cars as well as squid:

create table Cars
{
  ID :Integer,
  Car:String,
  Price:Money,
  key{ID}
};
insert
table
{
 row{ 1 ID,'Impala' Car,$18000 Price},
 row{ 2,'Impala',$19000},
 row{ 3,'Impala',$17000},
 row{ 5,'Mustang',$26000},
 row{ 7,'Mustang',$25000},
 row{ 9,'Impala',$16000},
 row{11,'Impala',$14000},
 row{13,'Mustang',$22000},
 row{17,'Mustang',$21000},
 row{19,'Mustang',$28000},
 row{23,'Mustang',$25000},
 row{29,'Impala',$20000},
 row{31,'Impala',$21000},
 row{37,'Impala',$16000}
} into Cars;

select DenseRank(Cars{ID PK,Car Grp}) {PK ID,DenseRank};
 
ID DenseRank
-- ---------
1  1        
2  1        
3  1        
5  2        
7  2        
9  3        
11 3        
13 4        
17 4        
19 4        
23 4        
29 5        
31 5        
37 5      

select
  (DenseRank(Cars{ID PK,Car Grp}) {PK ID,DenseRank})
    join
     Cars
      {ID,DenseRank,Car,Price};
 
ID DenseRank Car     Price     
-- --------- ------- ----------
1  1         Impala  $18,000.00
2  1         Impala  $19,000.00
3  1         Impala  $17,000.00
5  2         Mustang $26,000.00
7  2         Mustang $25,000.00
9  3         Impala  $16,000.00
11 3         Impala  $14,000.00
13 4         Mustang $22,000.00
17 4         Mustang $21,000.00
19 4         Mustang $28,000.00
23 4         Mustang $25,000.00
29 5         Impala  $20,000.00
31 5         Impala  $21,000.00
37 5         Impala  $16,000.00

(Note that you can't just rename columns in sql. When you rename a column
 in sql the table is recreated with the column name. This is because the
 column name is inseparable from the table name. This is a consequence of
 sql tables being literal or just a file as opposed to being a variable.)

Like any function DenseRank is only concerned with the parameter (table)
being passed. The PK column need not be a column stored in the database
as belonging to a table(s) but can be a virtual column that is generated
from an expression. The fact that the parameter is based on an expression
does not in any way invalidate the meaningfulness of the dense rank function.

So instead of Britney Spears consider the Orders table in the Northwind database.
Suppose we want a dense rank over customers. We can use an expression that
orders the table by CustomerID and within customers by OrderID. The following
expression uses the generated sequence number (renamed to PK) for the ordering
as a unique key from which we can get a dense rank over customers (note that
the DenseRank function can return a dense rank regardless of whether or not
the group column (Grp) repeats or not.)

select
  (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                                     with {IgnoreUnsupported = 'true'})))
          {sequence+1 PK,OrderID,CustomerID,EmployeeID,ShipCountry});        


PK OrderID CustomerID EmployeeID ShipCountry
-- ------- ---------- ---------- -----------
1  10643   ALFKI      6          Germany    
2  10692   ALFKI      4          Germany    
3  10702   ALFKI      4          Germany    
4  10835   ALFKI      1          Germany    
5  10952   ALFKI      1          Germany    
6  11011   ALFKI      3          Germany    
7  10308   ANATR      7          Mexico     
8  10625   ANATR      3          Mexico     
9  10759   ANATR      3          Mexico     
10 10926   ANATR      4          Mexico     
11 10365   ANTON      3          Mexico     
12 10507   ANTON      7          Mexico     

Based on the expression we can get the dense rank for customers (we only
return the first 12 rows.)

select
  DenseRank(
            (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                                      with {IgnoreUnsupported = 'true'})))
              {sequence+1 PK,CustomerID Grp})
           ) ;
          
PK DenseRank
-- ---------
1  1        
2  1        
3  1        
4  1        
5  1        
6  1        
7  2        
8  2        
9  2        
10 2        
11 3        
12 3               
          
And just like squid and cars we can combine the DensRank function (variable)
with the expression (variable) to work with it in any form we want (ie a memory
variable, a view or perhaps another function).

create view CustomerRanks 
//Note that procedures can be used in views.
//We could remove the PK column from the result if we wanted.
  DenseRank(
           (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                                     with {IgnoreUnsupported = 'true'})))
             {sequence+1 PK,CustomerID Grp})
           )
            join  //The join is a natural join on PK.   
              (
               ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                                        with {IgnoreUnsupported = 'true'}))
              )
                {sequence+1 PK,OrderID,CustomerID,EmployeeID,ShipCountry}); 

PK DenseRank OrderID CustomerID EmployeeID ShipCountry
-- --------- ------- ---------- ---------- -----------
1  1         10643   ALFKI      6          Germany    
2  1         10692   ALFKI      4          Germany    
3  1         10702   ALFKI      4          Germany    
4  1         10835   ALFKI      1          Germany    
5  1         10952   ALFKI      1          Germany    
6  1         11011   ALFKI      3          Germany    
7  2         10308   ANATR      7          Mexico     
8  2         10625   ANATR      3          Mexico     
9  2         10759   ANATR      3          Mexico     
10 2         10926   ANATR      4          Mexico     
11 3         10365   ANTON      3          Mexico     
12 3         10507   ANTON      7          Mexico     

Summary

Hopefully you can see that Joe Celko's idea of 'a magical super function
that works on Squids, Automobiles, Britney Spears, Geographical locations
or anything in the whole of creation' is not magical but very real! What
is meant by magic is simply the concept of a table as a variable and
the underlying importance of columns and their types. There is nothing
magical going on at all. The idea of a variable holding any integer or
string is just extended in the D4 language to tables. Sql users have been
trying to simulate this concept for a long time, perhaps unwittingly. The
sql use of dynamic sql is an attempt to use variable sql, the attempt
to go beyond the simple static (file) structure to use tables as variables.

Finally, in D4 when using tables as parameters to functions all such
functions can be super functions. It is but one way to turn a database
into a super database. And can you guess what type of man they call
such programmers ☺

steve
Learning D4 is like learning two systems, D4 and relearning sql all over
again. D4 will bring a new clarity to sql and will make clear when the
use of each system is most appropriate. It is not a one or the other
contest, but a learning of when to use which where. 
 

No comments:

About Me

My photo
Phoenix, Arizona, United States