Hello Michel, it's been a while :) I do appreciate you taking the time to explore this subject! I hope talented and intellectually mature people like yourself take a serious interest in this matter.

> So, keep your shields up, :-)

You have my word, Kirk :)

>> I decided to take a couple of days in order to get some emotional distance
>> toward what you wrote in your blog, and while I did just that, not having
>> time to read everything there, I am still left with an incomplete picture.

Let me give you a short overview of the entire subject. A small group of very talented people at Alphora, gifted both conceptually and technically, implemented the relational model known as Tutorial D, completely specified in the book 'Databases, Types, and The Relational Model: The Third Manifesto', 3rd edition, by C. J. Date and Hugh Darwen: http://www.thethirdmanifesto.com/

This relational (non-procedural database access) language sits within a Pascal-like (declarative/procedural) language. The language in toto is referred to as 'D4' by Alphora. The relational language is conceptually (radically) different from sql in two fundamental ways. First, it rests on the concepts of 'type' and 'variable', whereas sql lacks the concepts of 'type' and 'variable' as they relate to a table. In the absence of type there is only a 'file name', much as there is only the value of the integer 1 in the absence of the ability to assign the integer 1 to a variable. Second, the design of the non-procedural language is also radically different from sql. There are so many differences from sql (conceptually, in the design of the language, keys, use of meta-data, constraints, views, duplicates, procedures, etc.) that it is not possible for me to go into them all here :) Suffice it to say we are talking qualitative differences, not quantitative ones.

The 'relational' idea of the database is itself in the service of the overall goal of the whole system, and that is application development. But a very special method of AD: from procedural to declarative.
In other words, from the traditional procedural method(s) of AD to a methodology based on 'inference'. It is as if 'performance', the goal of an sql optimizer with its emphasis on the physical implementation of sql, were now secondary to the 'logical' model, with the emphasis on the compiler as a logical inference mechanism. In D4 there are no query 'hints'; there are logical 'hints': index hints are replaced with 'key' hints. The user works with the compiler on a logical level (clarifying which index to use is replaced with asking what key(s) and reference(s) a resultant table has).

(As a quick aside, all joins can only be equi-joins. This ensures an unambiguous key(s) for the resulting table. Note that the concept of requiring every result/table, regardless of how it is derived, to have a key(s) is entirely absent from sql. And yes, other predicates can be used to form tables with other relational operators; see the D4 'times' operator.)

The ui in Dataphor is capable of making very detailed windows into the database exclusively based on inference (views, which go way beyond sql views, references, and meta-data are key components on which inferences are based). Of course you can modify the gui to your heart's content :) I'm hoping that the integration between the ui and tables in Access may lead some to explore the sophisticated inference ideas in Dataphor. I illustrate these ideas with a mini-application that can be downloaded:
http://beyondsql.blogspot.com/2007/07/dataphor-example-of-rapid-application.html

>> Your points are not about J. C., so even if you want to notify the reader
>> that you take his objections into consideration, you surely can and have to
>> mention it, but not as an introduction to the point you develop. I refer to
>> the article where you ... try...
>> to introduce the fact that a table can be a parameter, and you start the
>> discussion by bringing up one of the poorest answers Joe Celko may ever have
>> given in his entire life as a mathematician (I say that but take into account
>> that I have a lot of respect for that guy, for the good stuff he produced).

I respect Joe too. I hope you will forgive me if I take some liberties :) Joe implied the logical implausibility of the idea of a 'super function' in sql. Of course he is correct, in sql. I used that same idea to show how logically it fits in a relational database. I used his thought to 'contrast' sql and D4 on the basis of tables as variables. I think/hope it worked out :)

As a practical matter it is not easy to communicate many of these ideas. Joe is a known and respected authority. Perhaps the reader will stick around and see how things worked out :) Also bear in mind these concepts read in a less than compelling way to most. It is hard to make them register. That is why I try to show an actual example that derives from concepts that must be understood for the user to make sense of the material. I think Joe understands; I have had many exchanges with him :) For example see:
microsoft.public.sqlserver.programming, Tuesday, August 21, 2007, 'Primary key selection'
http://tinyurl.com/39dgcp
(My point here is that sql is immature based on its concept of a key. The key is elevated in D4 to where the index is in sql. Note that 'all' types, including tables, have a logical 'addressing' mechanism; for tables it is the key(s). This concept and its usage are completely alien to sql users.)

I've also used ideas expressed by Itzik Ben-Gan to contrast sql and D4. See:
http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html
So I'm at least in fast company, yourself included :-)

I use MS sql server in much the same way. Some concepts make sense in sql (some don't :) but take on a whole new meaning in D4. Again I use the contrast to make my point.
In this vein I've used dynamic sql and 'lists' in sql server, to name a few. See:
'Sql - History repeats itself simulating lists'
http://beyondsql.blogspot.com/2007/07/sql-history-repeats-itself-simulating.html

>> In my opinion, I must say that for optimization of a query plan, I still find
>> that knowing the table, its physical structure, its indexes, its stats, what
>> is required (what is SELECTed), etc. can make the optimizer find a better plan
>> than when having a ... general ... table. Not knowing if you SELECT DISTINCT
>> a primary key or a secondary field, as an example, can make some difference in
>> the lag the end user is likely to experience at runtime. Sure, there are
>> cases where the optimized plan will be very fast to obtain (so why not
>> get it at runtime), or where it is irrelevant, but I cannot say that
>> having a table as parameter is on my priority list at all.

The general issue of 'performance' will forever rear its ugly head :( :) At this point I have little incentive to get into the details of optimizing D4 code, regardless of whether it's queries, non-procedural code, or whatever. The basic concepts are hard enough to get across. I fear introducing details of how certain D4 constructs gain a performance advantage is premature. I would ask that you look at this article for some perspective on the notion of performance in AD/D4:
http://beyondsql.blogspot.com/2007/06/dataphor-inclusive-sql-vs-exclusive-d4.html
Eventually I will address performance head on.

Let me introduce some additional info. At this point in its development Dataphor does not possess its own mature 'native' storage (for development yes, for enterprise work no). Dataphor utilizes a 'device' for storage. A device can be one of a number of enterprise sql dbs, i.e. DB2, Oracle, SAS, Sql Server. I use Sql Server as a device, so the device can simply be the data repository. At the most elementary level, all DDL commands in the syntax of D4 will throw the data into an sql server database.
All DML commands in D4 will, when appropriate, access data in sql server going through a client language interface. The D4 compiler will determine the details of the interaction with the device. This process is by no means trivial, but at the very least it can be transparent to the user. Still, channels of communication are available for user intervention. The most common one is what most Access users are familiar with: the pass-through query. The user can throw any valid t-sql (queries, procedure calls, ddl) at the server. For t-sql queries and server-side procedures that produce a result (table), the result will be treated just as if it were derived via native D4 syntax. Of course there is nothing preventing a user from using dynamic sql. So we come around full circle: if one becomes too stressed in the relational world of D4, there is always the option to go back to the world that time forgot! :-) :-)

>> Your writing style is (still) complex. I know mine is, also, but in this
>> context, my style is not relevant, I think.

You have me at a distinct disadvantage. But bear in mind sql has been around for almost 40 years, and talking about it succinctly still remains a burden few can carry. Communicating effectively in the newborn relational terms, i.e. types, variables and relvars, is a work in progress. I will have to take my lumps. But I will try to make effective use of feedback, which I wholeheartedly encourage.

>> I haven't seen a 'tour guide'. I am not even sure I started at the right
>> place! Sure, if the purpose is still to start a discussion, that can be
>> expected since we don't know where it will go, but there is already a lot of
>> material (not necessarily self-contained, I am afraid) and I am still not sure
>> about 'where' in the process of making an application you want to focus. You
>> are not spreading your thoughts about the user interface, are you? Ok, it
>> is NOT SQL, it is maybe Access, but WHAT is it?
>> I am a little bit lost, and have the impression that I could put my hands on
>> great (I hope) pictures of a movie, but am still not sure if the movie is a
>> documentary, or something else.

I understand what you are expressing. Start at the beginning, but just where is the beginning? To get a jump start, to actually see what D4 scripts are and the steps that constitute a working example/application, see:
http://beyondsql.blogspot.com/2007/07/dataphor-example-of-rapid-application.html
This is a fully functioning application that includes a ui. But bear in mind one does not need the 'ui' to explore the system. One can learn quite a bit about modelling with sql server without the need for a ui. Creating a single D4 operator/procedure or query will start a user on his way. Once you start up Dataphor you can tinker with creating tables, queries or whatever.

I would recommend the serious user:
- Go to www.alphora.com and download Dataphor. Install the beast and start poring over the documentation to get a feel for this thing (the documentation is also available online).
- If you have any version of sql server you can quite easily use it as your data store. You can connect to a server database and have access to all tables in the database immediately. For example, connect to the Northwind or Pubs database. (If you need a hand connecting, contact me via my blog.)
- Start getting familiar with the Dataphoria gui. You're going to be doing your development with this ui. It's straightforward, just like query analyzer.
- You can easily create new tables in D4 that will reside on sql server. Start browsing the relational operators in help. Try a few queries. You're on your way.

My blog: aside from the mini-app example, the articles consist of pep talks, explanations and examples (along with their underlying concepts). The examples can all be copied and pasted into Dataphor and run. They work :) Think of the examples as snapshots.
They show different slices of constructs and concepts. (Dataphor comes with many samples too; be sure to check them when you install Dataphor.) Joe Celko has said that at some point when learning sql a light in your head will hopefully appear. Same thing in D4: at some point it will all come together if you bear with it.

>> It seems you mention an interest for the framework (dot.Net) which is an
>> anti-SQL thing (even if they added some elements of SQL in version 3.0),
>> imho. Gone the sub-queries, even gone the joins, just a simple
>> select-from-where, and if you need 'joins' or 'subqueries', you use the
>> framework objects that refer to these 'simple' statements, and use C#, or
>> whatever, not SQL, to carry the 'joins', but without the memory explosion
>> that SQL-joins do. On the other hand, from the first impression I got, it
>> seems you add complexity to the SQL language. So, it would be like rooting
>> in the wrong soil, no? Since the framework is about simple, simple SQL
>> statements, that is (and C# developers are ready to pay the price by writing
>> complex C# procedural statements, because they know C#, but not SQL, and
>> don't want to invest in a more complex SQL-like language). I may have the
>> wrong idea of what you saw, though.

Dataphor is built with C#, a typed language. Conceptually that makes a lot of sense, since D4 is itself a typed system. The concept of type is one of the foundations of the 'relational' model. In other words it is logical that D4, with its reliance on type, be built with a framework that is typed. The fact that D4 is built with a .net language in no way implies that D4 has anything to do with the LINQ/DLINQ project, which is what you seem to be referring to: the idea that MS can 'hide' sql from .net developers. Whatever games MS is playing with sql from .net have nothing to do with D4. But it also means that, just like the channels of communication from D4 to the storage (device), i.e. sql server, there are channels from D4 to .net.
So users can write assemblies and interact with the .net framework in other ways. And like pass-through queries, this communication with .net does not come with any hidden price to be paid :) It can be used to expand the functionality of D4, for example user-defined types defined with a .net language (note that user-defined types can be defined quite nicely from within D4). Aside from these channels of communication, the .net framework is 'transparent' to the user. Again, MS's communication from .net to sql server is totally independent of D4 and its relational language.

Some parting thoughts. I have not said anything specific about the relational (query) language. As an expert sql developer you will find it quite different, which is to be expected. You will also find that some query concepts in sql carry over. You will find there are many new constructs, you will find many sql constructs greatly expanded in functionality (constraints, views, procedures, to name just a few), and you will discover new ways of doing things in comparison to sql. Most of the historical objections to sql (both conceptual and in the design of the language) are corrected in D4. Mastery of D4 will take an effort; this is not a trivial exercise. But I think you will find it is a great picture :) If you have any questions or just want to chat please feel free to contact me (through the blog is fine). People like yourself will only make this system better.

Lastly, as I state in several articles, D4 does not necessarily eclipse sql. Sql has its place, but it is misplaced as a foundation for application development. For that, D4 is best suited.

best,
steve
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
Wednesday, August 29, 2007
Dataphor - Reply to Michel
Saturday, August 11, 2007
Dataphor - Creating a Super Function
The Concept of the Super Function

In the thread:
comp.databases.ms-sqlserver, Jul 25, 6:52 am, 'Pass Table as a parameter to a function'
http://tinyurl.com/2j4dvy
Joe Celko makes the following comment on passing a table as a parameter to a procedure:

>> Is it possible to pass a table as a parameter to a function. <<

'Please read a book, any book, on data modeling, and RDBMS. A table is an entity or a relationship. That would mean you have a magical, super function that works on Squids, Automobiles, Britney Spears, Geographical locations or anything in the whole of creation.'

While his reply is pejorative, the name 'super function' is perfectly valid! It is actually a catchier name for what is commonly referred to as 'reusable code'. And such code rests on a fundamental concept: being able to represent objects as 'variables'. In a database the objects we are referring to are most notably tables. A table as a variable allows us to pass it as a parameter to a function (procedure). A procedure that takes a table as a parameter and returns a virtual table (result) allows us to realize a procedure as a variable, a super function. The super function allows an infinite number of representations of the same logic (in the form of a table) for tables of the same type, i.e. tables of Squids, Automobiles and Britney Spears ☺

This is not possible in sql, since what we are talking about as variables (tables, procedures) can only be represented as literals in sql. In sql we can only reference a table by its name, since there is no type for a table. An sql procedure/function dependent on a table can only express logic for that particular table and hence, like the sql table, is a literal representation of an object (table), not a variable that can represent an infinite number of solutions.

An example of a super function in the D4 language of Dataphor: a common requirement is to find a dense rank given that a grouping column can have repeating values.
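For readers who think more easily in code, here is a minimal sketch (in Python, my own illustration rather than anything from Dataphor) of what a 'super function' amounts to: a function whose parameter is a table value, indifferent to whether the rows describe squids or automobiles. The table encoding and all names below are assumptions of the sketch.

```python
# A 'super function' sketch: the parameter is a table (here a list of
# dicts), and the logic is predicated only on the column it is told
# about, never on what the rows mean. All names here are hypothetical.
def distinct_groups(table, grp_col):
    """Count the distinct values in the group column of any table."""
    return len({row[grp_col] for row in table})

squids = [{"SquidPK": 1, "Squid": 1},
          {"SquidPK": 5, "Squid": 1},
          {"SquidPK": 7, "Squid": 2}]
cars = [{"ID": 1, "Car": "Impala"},
        {"ID": 5, "Car": "Mustang"}]

print(distinct_groups(squids, "Squid"))  # 2
print(distinct_groups(cars, "Car"))      # 2
```

In sql the same logic would have to be written once per table, because a table can only be named, not passed.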
Procedure DenseRank takes as a parameter a table with an integer column named PK, which represents the table's primary key, and a column named Grp, which represents a group column. The operator returns a table with the primary key column (PK) and an integer column which is the dense rank (DenseRank). The DenseRank procedure calls another procedure, IDenseRank, which takes the same table as a parameter. Both procedures are overloaded with the group (Grp) column of the table defined as a string.

The DenseRank super function can therefore represent the dense rank of any table in the database that has a single integer column that is unique (not necessarily the primary key) and a column (either integer or string) that represents a group column. The group column may very well represent Squids, Automobiles or Britney Spears ☺ It does not matter, as the super function is predicated on column type only. The logic of the procedure (a dense rank) is independent of what the meaning of the predicate (the rows of the input table) is. This independence is achieved by the concept of what a table type is. In other words, the dense rank super function is only possible by substituting the significance of the type of a column(s) for the name of a column. In sql all logical significance rests in the name of a column; the only significance of the type of a column is computational. The significance of type, its logical significance, is a major advancement of D4 over sql and allows objects like super functions. The examples that follow will make clear what these concepts mean in practice.

(If you are just beginning D4 it is more important to understand the general concepts of database objects being variables and the importance of type than the specific programming techniques. If you have programming questions about the procedure(s) please feel free to post them in comments.)
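Outside D4, the rank the two operators compute can be sketched in a few lines of Python (my own illustration; the function name and table encoding are assumptions, not Dataphor code). It ranks consecutive runs of the group column in the order of the unique integer column:

```python
def dense_rank(rows):
    """rows: (pk, grp) pairs with unique integer pk.
    Returns (pk, rank) pairs where the rank increases each time the
    group value changes while scanning in pk order, so a group that
    reappears later starts a new rank (a new 'run')."""
    ranked, rank, prev = [], 0, object()   # sentinel: no previous group
    for pk, grp in sorted(rows):           # order by PK
        if grp != prev:                    # a new run of the group column
            rank += 1
            prev = grp
        ranked.append((pk, rank))
    return ranked

# The Squids (PK, Grp) pairs used later in the article.
squids = [(1, 1), (5, 1), (7, 2), (9, 2), (11, 3), (12, 3),
          (15, 3), (17, 1), (19, 1), (21, 2), (23, 2), (25, 2)]
print(dense_rank(squids))
# ranks come out 1,1,2,2,3,3,3,4,4,5,5,5 -- the values the D4 examples show
```

Because Python compares the group values generically, this single function covers both the integer and string cases that D4 handles with two explicit overloads.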
Operator (function) DenseRank returns a table (virtual) with a unique integer column (PK) and an integer column with the dense rank (DenseRank).

create operator DenseRank(aTable:table{PK:Integer,Grp:Integer}):
  table{PK:Integer,DenseRank:Integer}
begin
  result:= table of typeof (result){};
  result:= aTable {PK}
             times
           IDenseRank(aTable)
             where PK between PKMin and PKMax
           {PK,DenseRank};
end;

This operator returns a single row for each dense rank. It also has the starting (min) and ending (max) primary key values of the rank. The ranks are consecutively numbered in case there is a query that involves a meaningful ordering of the dense ranks (in the direction of the primary key).

create operator IDenseRank(aTable:table{PK:Integer,Grp:Integer}):
  table{PKMin:Integer,PKMax:Integer,DenseRank:Integer}
begin
  result:=table of typeof (result){};
  result:=
    ToTable(ToList(cursor(
      (
        ToTable(ToList(cursor(
          //Have to get PK consecutively numbered (PKSeq).
          ToTable(ToList(cursor(aTable order by {PK})))
            {PK,Grp,sequence+1 PKSeq}
            order by {Grp,PKSeq}
        )))
          add{PKSeq-(sequence+1) Drank}
          group by {Grp,Drank} add{Min(PK) PKMin,Max(PK) PKMax}
      ) order by {PKMin}
    )))
      {PKMin,PKMax,sequence+1 DenseRank};
end;

Here we overload the DenseRank and IDenseRank procedures to accept a group column (Grp) of type string. The overload allows us to essentially ignore the type of the group column (as long as it's an integer or a string type). The system will determine which overload of the procedures to use based on the table (and its group column type) passed.
create operator DenseRank(aTable:table{PK:Integer,Grp:String}):
  table{PK:Integer,DenseRank:Integer}
begin
  result:= table of typeof (result){};
  result:= aTable {PK}
             times
           IDenseRank(aTable)
             where PK between PKMin and PKMax
           {PK,DenseRank};
end;

create operator IDenseRank(aTable:table{PK:Integer,Grp:String}):
  table{PKMin:Integer,PKMax:Integer,DenseRank:Integer}
begin
  result:=table of typeof (result){};
  result:=
    ToTable(ToList(cursor(
      (
        ToTable(ToList(cursor(
          ToTable(ToList(cursor(aTable order by {PK})))
            {PK,Grp,sequence+1 PKSeq}
            order by {Grp,PKSeq}
        )))
          add{PKSeq-(sequence+1) Drank}
          group by {Grp,Drank} add{Min(PK) PKMin,Max(PK) PKMax}
      ) order by {PKMin}
    )))
      {PKMin,PKMax,sequence+1 DenseRank};
end;

(D4 will store two signatures for each procedure, accounting for an integer and a string type for Grp. The table returned (PK and DenseRank) is exactly the same regardless of the Grp type.)

Examples of using the DenseRank super function. (The use of the term function, procedure or operator is a matter of choice; they all mean the same thing, and here they all imply super ☺) (All tables use MS Sql Server 2005 as the storage device.)

Consider table Squids:

create table Squids
{
  SquidPK:Integer, //Primary key.
  Squid:Integer,   //Indicates a squid group that can repeat.
  Length:Integer,  //Length of the squid in squidly centimeters.
  Weight:Integer,  //Weight of the squid in squidly globs.
  key{SquidPK}
};

insert table
{
  row{1 SquidPK,1 Squid,5 Length,10 Weight},
  row{5,1,4,6},
  row{7,2,3,8},
  row{9,2,7,3},
  row{11,3,5,8},
  row{12,3,6,7},
  row{15,3,6,8},
  row{17,1,3,5},
  row{19,1,6,7},
  row{21,2,8,5},
  row{23,2,6,8},
  row{25,2,6,8}
} into Squids;

select Squids;

SquidPK Squid Length Weight
------- ----- ------ ------
1       1     5      10
5       1     4      6
7       2     3      8
9       2     7      3
11      3     5      8
12      3     6      7
15      3     6      8
17      1     3      5
19      1     6      7
21      2     8      5
23      2     6      8
25      2     6      8

To obtain the dense rank we need:
1. compatibility (type) of columns SquidPK and Squid with the columns of the generic table parameter aTable (PK, Grp), which we have;
2.
the elimination of all columns except the primary key and group;
3. header agreement (column name equivalency) between the table passed and the generic names of the columns in the function.

We can accomplish 2 and 3 using remove and rename, and obtain the ranks with:

select DenseRank(Squids remove{Length,Weight} rename {SquidPK PK,Squid Grp});

We can even combine the remove and rename into one statement using specify { }:

select DenseRank(Squids{SquidPK PK,Squid Grp});

PK DenseRank
-- ---------
1  1
5  1
7  2
9  2
11 3
12 3
15 3
17 4
19 4
21 5
23 5
25 5

What we have is the variable DenseRank representing the dense ranks of Squids. But the variable is still generic in the sense that PK is not meaningful for table Squids. So we simply reverse the process used in calling the function: we rename the PK column to the primary key column of Squids (SquidPK):

select DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank};

SquidPK DenseRank
------- ---------
1       1
5       1
7       2
9       2
11      3
12      3
15      3
17      4
19      4
21      5
23      5
25      5

By joining the Squids table to the variable DenseRank we can represent all the data of Squids along with its dense rank:

select (DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank})
       //A natural join using the primary key (SquidPK) from tables DenseRank and Squids.
       join Squids
       {SquidPK,Squid,DenseRank,Length,Weight};

SquidPK Squid DenseRank Length Weight
------- ----- --------- ------ ------
1       1     1         5      10
5       1     1         4      6
7       2     2         3      8
9       2     2         7      3
11      3     3         5      8
12      3     3         6      7
15      3     3         6      8
17      1     4         3      5
19      1     4         6      7
21      2     5         8      5
23      2     5         6      8
25      2     5         6      8

Grouping by Squid and the dense rank (DenseRank) allows aggregate functions to summarize the data:

select
(
  (DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank})
  join Squids
  //We rename DenseRank to SquidRank in the specify statement.
  {SquidPK,Squid,DenseRank SquidRank,Length,Weight}
)
  group by {Squid,SquidRank} add{Count() Cnt,Avg(Length) AvgLen,Avg(Weight) AvgWt}
  redefine {AvgLen:=Round(AvgLen,1),AvgWt:=Round(AvgWt,1)};

Squid SquidRank Cnt AvgLen AvgWt
----- --------- --- ------ -----
1     1         2   4.5    8
1     4         2   4.5    6
2     2         2   5      5.5
2     5         3   6.7    7
3     3         3   5.7    7.7

We can do cars as well as squids:

create table Cars
{
  ID:Integer,
  Car:String,
  Price:Money,
  key{ID}
};

insert table
{
  row{ 1 ID,'Impala' Car,$18000 Price},
  row{ 2,'Impala',$19000},
  row{ 3,'Impala',$17000},
  row{ 5,'Mustang',$26000},
  row{ 7,'Mustang',$25000},
  row{ 9,'Impala',$16000},
  row{11,'Impala',$14000},
  row{13,'Mustang',$22000},
  row{17,'Mustang',$21000},
  row{19,'Mustang',$28000},
  row{23,'Mustang',$25000},
  row{29,'Impala',$20000},
  row{31,'Impala',$21000},
  row{37,'Impala',$16000}
} into Cars;

select DenseRank(Cars{ID PK,Car Grp}) {PK ID,DenseRank};

ID DenseRank
-- ---------
1  1
2  1
3  1
5  2
7  2
9  3
11 3
13 4
17 4
19 4
23 4
29 5
31 5
37 5

select (DenseRank(Cars{ID PK,Car Grp}) {PK ID,DenseRank})
       join Cars
       {ID,DenseRank,Car,Price};

ID DenseRank Car     Price
-- --------- ------- ----------
1  1         Impala  $18,000.00
2  1         Impala  $19,000.00
3  1         Impala  $17,000.00
5  2         Mustang $26,000.00
7  2         Mustang $25,000.00
9  3         Impala  $16,000.00
11 3         Impala  $14,000.00
13 4         Mustang $22,000.00
17 4         Mustang $21,000.00
19 4         Mustang $28,000.00
23 4         Mustang $25,000.00
29 5         Impala  $20,000.00
31 5         Impala  $21,000.00
37 5         Impala  $16,000.00

(Note that you can't just rename columns in sql. When you rename a column in sql, the table is recreated with the new column name. This is because the column name is inseparable from the table name. This is a consequence of sql tables being literals, or just files, as opposed to being variables.)

Like any function, DenseRank is only concerned with the parameter (table) being passed. The PK column need not be a column stored in the database as belonging to a table(s) but can be a virtual column that is generated from an expression.
The fact that the parameter is based on an expression does not in any way invalidate the meaningfulness of the dense rank function. So, instead of Britney Spears, consider the Orders table in the Northwind database. Suppose we want a dense rank over customers. We can use an expression that orders the table by CustomerID and, within customers, by OrderID. The following expression uses the generated sequence number (renamed to PK) for the ordering as a unique key from which we can get a dense rank over customers. (Note that the DenseRank function can return a dense rank regardless of whether or not the group column (Grp) repeats.)

select (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                              with {IgnoreUnsupported = 'true'})))
          {sequence+1 PK,OrderID,CustomerID,EmployeeID,ShipCountry});

PK OrderID CustomerID EmployeeID ShipCountry
-- ------- ---------- ---------- -----------
1  10643   ALFKI      6          Germany
2  10692   ALFKI      4          Germany
3  10702   ALFKI      4          Germany
4  10835   ALFKI      1          Germany
5  10952   ALFKI      1          Germany
6  11011   ALFKI      3          Germany
7  10308   ANATR      7          Mexico
8  10625   ANATR      3          Mexico
9  10759   ANATR      3          Mexico
10 10926   ANATR      4          Mexico
11 10365   ANTON      3          Mexico
12 10507   ANTON      7          Mexico

Based on the expression we can get the dense rank for customers (we only return the first 12 rows):

select DenseRank(
         (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                                with {IgnoreUnsupported = 'true'})))
            {sequence+1 PK,CustomerID Grp})
       );

PK DenseRank
-- ---------
1  1
2  1
3  1
4  1
5  1
6  1
7  2
8  2
9  2
10 2
11 3
12 3

And just like squids and cars, we can combine the DenseRank function (variable) with the expression (variable) to work with it in any form we want (i.e. a memory variable, a view or perhaps another function).

create view CustomerRanks
  //Note that procedures can be used in views.
  //We could remove the PK column from the result if we wanted.
  DenseRank(
    (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                           with {IgnoreUnsupported = 'true'})))
       {sequence+1 PK,CustomerID Grp})
  )
  join //The join is a natural join on PK.
  (
    ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                          with {IgnoreUnsupported = 'true'})))
      {sequence+1 PK,OrderID,CustomerID,EmployeeID,ShipCountry}
  );

PK DenseRank OrderID CustomerID EmployeeID ShipCountry
-- --------- ------- ---------- ---------- -----------
1  1         10643   ALFKI      6          Germany
2  1         10692   ALFKI      4          Germany
3  1         10702   ALFKI      4          Germany
4  1         10835   ALFKI      1          Germany
5  1         10952   ALFKI      1          Germany
6  1         11011   ALFKI      3          Germany
7  2         10308   ANATR      7          Mexico
8  2         10625   ANATR      3          Mexico
9  2         10759   ANATR      3          Mexico
10 2         10926   ANATR      4          Mexico
11 3         10365   ANTON      3          Mexico
12 3         10507   ANTON      7          Mexico

Summary

Hopefully you can see that Joe Celko's idea of 'a magical super function that works on Squids, Automobiles, Britney Spears, Geographical locations or anything in the whole of creation' is not magical but very real! What is meant by magic is simply the concept of a table as a variable and the underlying importance of columns and their types. There is nothing magical going on at all. The idea of a variable holding any integer or string is just extended in the D4 language to tables. Sql users have been trying to simulate this concept for a long time, perhaps unwittingly. The use of dynamic sql is an attempt at variable sql, an attempt to go beyond the simple static (file) structure and use tables as variables. Finally, in D4, when tables are used as parameters to functions, all such functions can be super functions. It is but one way to turn a database into a super database. And can you guess what type of man they call such programmers ☺

steve
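A postscript for readers who want to check the ranks without installing Dataphor: the rename/rank/join pattern used in the Cars example can be replayed in a short, self-contained Python sketch (my own analogue; every name in it is mine, not D4's).

```python
def dense_rank(rows):
    """Rank consecutive runs of the group value, scanning in key order."""
    out, rank, prev = [], 0, object()
    for pk, grp in sorted(rows):
        if grp != prev:
            rank, prev = rank + 1, grp
        out.append((pk, rank))
    return out

# The first rows of the article's Cars table: (ID, Car, Price).
cars = [(1, "Impala", 18000), (2, "Impala", 19000), (3, "Impala", 17000),
        (5, "Mustang", 26000), (7, "Mustang", 25000), (9, "Impala", 16000)]

# 'Cars{ID PK,Car Grp}': project down to the key and group columns.
ranks = dict(dense_rank([(cid, car) for cid, car, _ in cars]))

# '... {PK ID,DenseRank} join Cars': rename back and natural-join on the key.
joined = [(cid, ranks[cid], car, price) for cid, car, price in cars]
for row in joined:
    print(row)   # (1, 1, 'Impala', 18000) through (9, 3, 'Impala', 16000)
```

These six rows reproduce the first six ranks of the Cars output above (1, 1, 1, 2, 2, 3).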
Learning D4 is like learning two systems: D4 itself, and sql all over again. D4 will bring a new clarity to sql and will make clear when the use of each system is most appropriate. It is not a one-or-the-other contest, but a matter of learning when to use which, where.