Hello Michel, it's been a while :) I do appreciate you taking the time to explore this subject! I hope talented and intellectually mature people like yourself take a serious interest in this matter.

> So, keep your shields up, :-)

You have my word, Kirk :)

>> I decided to take a couple of days in order to get some emotional distance
>> toward what you wrote in your blog, and while I did just that, not having
>> time to read everything there, I am still left with an incomplete picture.

Let me give you a short overview of the entire subject. A small group of very talented people at Alphora, gifted both conceptually and technically, implemented the relational model known as Tutorial D, completely specified in the book 'Databases, Types, and The Relational Model: The Third Manifesto', 3rd edition, by C. J. Date and Hugh Darwen: http://www.thethirdmanifesto.com/

This relational (non-procedural database access) language sits within a Pascal-like (declarative/procedural) language. The language in toto is referred to as 'D4' by Alphora. The relational language is conceptually (radically) different from sql in two fundamental ways. First, it rests on the concepts of 'type' and 'variable', whereas sql lacks the concepts of 'type' and 'variable' as they relate to a table. In the absence of type there is only a 'file name', much as there is only the value of the integer 1 in the absence of the ability to assign the integer 1 to a variable. Second, the design of the non-procedural language is also radically different from sql. There are so many differences from sql (conceptually, in the design of the language, keys, use of meta-data, constraints, views, duplicates, procedures, etc.) that it is not possible for me to go into them all here :) Suffice it to say we are talking qualitative differences, not quantitative ones.

The 'relational' idea of the database is itself in the service of the overall goal of the whole system, and that is application development. But a very special method of AD: from procedural to declarative.
In other words, from the traditional procedural method(s) of AD to a methodology based on 'inference'. It is as if 'performance', the goal of an sql optimizer with its emphasis on the physical implementation of sql, were now secondary to the 'logical' model, with the emphasis on the compiler as a logical inference mechanism. In D4 there are no query 'hints'; there are logical 'hints': index hints are replaced with 'key' hints. The user works with the compiler on a logical level (clarifying which index to use is replaced with asking what key(s) and reference(s) a resultant table has).

(As a quick aside, all joins can only be equi-joins. This ensures an unambiguous key(s) for the resulting table. Note that the concept of requiring every result/table, regardless of how it is derived, to have a key(s) is entirely absent from sql. And yes, other predicates can be used to form tables with other relational operators; see the D4 'times' operator.)

The ui in Dataphor is capable of making very detailed windows into the database exclusively based on inference (views, which go way beyond sql views, references, and meta-data are key components on which inferences are based). Of course you can modify the gui to your heart's content :) I'm hoping that the integration between the ui and tables in Access may lead some to explore the sophisticated inference ideas in Dataphor. I illustrate these ideas with a mini-application that can be downloaded:
http://beyondsql.blogspot.com/2007/07/dataphor-example-of-rapid-application.html

>> Your points are not about J. C., so even if you want to notify the reader
>> that you take his objections into consideration, you surely can and have to
>> mention it, but not as an introduction to the point you develop. I refer to
>> the article where you ... try...
>> to introduce the fact that a table can be a parameter, and you start the
>> discussion by bringing up one of the poorest answers Joe Celko may ever have
>> given in his entire life as a mathematician (I say that but take into account
>> that I have a lot of respect for that guy, for the good stuff he produced).

I respect Joe too. I hope you will forgive me if I take some liberties :) Joe implied the logical implausibility of the idea of a 'super function' in sql. Of course he is correct, in sql. I used that same idea to show how logically it fits in a relational database. I used his thought to 'contrast' sql and D4 on the basis of tables as variables. I think/hope it worked out :)

As a practical matter it is not easy to communicate many of these ideas. Joe is a known and respected authority. Perhaps the reader will stick around and see how things worked out :) Also bear in mind these concepts read in a less than compelling way to most. It is hard to make them register. That is why I try to show an actual example that derives from concepts that must be understood for the user to make sense of the material. I think Joe understands; I have had many exchanges with him :) For example see:
microsoft.public.sqlserver.programming, Tuesday, August 21, 2007, 'Primary key selection'
http://tinyurl.com/39dgcp
(My point here is that sql is immature based on its concept of a key. The key is elevated in D4 to where the index is in sql. Note that 'all' types, including tables, have a logical 'addressing' mechanism; for tables it is the key(s). This concept and its usage are completely alien to sql users.)

I've also used ideas expressed by Itzik Ben-Gan to contrast sql and D4. See:
http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html
So I'm at least in fast company, yourself included :-)

I use MS sql server in much the same way. Some concepts make sense in sql (some don't :) but take on a whole new meaning in D4. Again I use the contrast to make my point.
In this vein I've used dynamic sql and 'lists' in sql server, to name a few. See:
'Sql - History repeats itself simulating lists'
http://beyondsql.blogspot.com/2007/07/sql-history-repeats-itself-simulating.html

>> In my opinion, I must say that for optimization of a query plan, I still find
>> that knowing the table, its physical structure, its indexes, its stats, what
>> is required (what is SELECTed), etc. can make the optimizer find a better plan
>> than when having a ... general ... table. Not knowing if you SELECT DISTINCT
>> a primary key or a secondary field, as an example, can make some difference in
>> the lag the end user is likely to experience at runtime. Sure, there are
>> cases where the optimized plan will be very fast to obtain (so why not
>> get it at runtime), or where it is irrelevant, but I cannot say that
>> having a table as parameter is on my priority list at all.

The general issue of 'performance' will forever rear its ugly head :( :) At this point I have little incentive to get into the details of optimizing D4 code, regardless of whether it's queries, non-procedural code, or whatever. The basic concepts are hard enough to get across. I fear introducing details of how certain D4 constructs gain a performance advantage is premature. I would ask that you look at this article for some perspective on the notion of performance in AD/D4:
http://beyondsql.blogspot.com/2007/06/dataphor-inclusive-sql-vs-exclusive-d4.html
Eventually I will address performance head on.

Let me introduce some additional info. At this point in its development Dataphor does not possess its own mature 'native' storage (for development yes, for enterprise work no). Dataphor utilizes a 'device' for storage. A device can be one of a number of enterprise sql dbs, i.e. DB2, Oracle, SAS, Sql Server. I use Sql Server as a device, so the device can simply be the data repository. At the most elementary level, all DDL commands in the syntax of D4 will throw the data into an sql server database.
All DML commands in D4 will, when appropriate, access data in sql server going through a client language interface. The D4 compiler will determine the details of the interaction with the device. This process is by no means trivial, but at the very least it can be transparent to the user. Still, channels of communication are available for user intervention. The most common one is what most Access users are familiar with: the pass-through query. The user can throw any valid t-sql (queries, procedure calls, ddl) at the server. For t-sql queries and server-side procedures that produce a result (table), the result will be treated just as if it were derived via native D4 syntax. Of course there is nothing preventing a user from using dynamic sql. So we come around full circle: if one becomes too stressed in the relational world of D4, there is always the option to go back to the world that time forgot! :-) :-)

>> Your writing style is (still) complex. I know mine is, also, but in this
>> context, my style is not relevant, I think.

You have me at a distinct disadvantage. But bear in mind sql has been around for almost 40 years, and talking about it succinctly still remains a burden few can carry. Communicating effectively in the newborn relational terms, i.e. types, variables and relvars, is a work in progress. I will have to take my lumps. But I will try to make effective use of feedback, which I wholeheartedly encourage.

>> I haven't seen a 'tour guide'. I am not even sure I started at the right
>> place! Sure, if the purpose is still to start a discussion, that can be
>> expected since we don't know where it will go, but there is already a lot of
>> material (not necessarily self-contained, I am afraid) and I am still not sure
>> about 'where' in the process of making an application you want to focus. You
>> are not spreading your thoughts about the user interface, are you? Ok, it
>> is NOT SQL, it is maybe Access, but WHAT is it?
>> I am a little bit lost, and have the impression that I could put my hands on
>> great (I hope) pictures of a movie, but am still not sure if the movie is a
>> documentary, or something else.

I understand what you are expressing. Start at the beginning, but just where is the beginning? To get a jump start, to actually see what D4 scripts are and the steps that constitute a working example/application, see:
http://beyondsql.blogspot.com/2007/07/dataphor-example-of-rapid-application.html
This is a fully functioning application that includes a ui. But bear in mind one does not need the 'ui' to explore the system. One can learn quite a bit about modelling with sql server without the need for a ui. Creating a single D4 operator/procedure or query will start a user on his way. Once you start up Dataphor you can tinker with creating tables, queries or whatever.

I would recommend the serious user:
- Go to www.alphora.com and download Dataphor. Install the beast and start poring over the documentation to get a feel for this thing (the documentation is also available online).
- If you have any version of sql server you can quite easily use it as your data store. You can connect to a server database and have access to all tables in the database immediately. For example, connect to the Northwind or Pubs database. (If you need a hand connecting, contact me via my blog.)
- Start getting familiar with the Dataphoria gui. You're going to be doing your development with this ui. It's straightforward, just like query analyzer.
- You can easily create new tables in D4 that will reside on sql server. Start browsing the relational operators in help. Try a few queries. You're on your way.

My blog: aside from the mini-app example, the articles consist of pep talks, explanations and examples (along with their underlying concepts). The examples can all be copied and pasted into Dataphor and run. They work :) Think of the examples as snapshots.
They show different slices of constructs and concepts. (Dataphor comes with many samples too; be sure to check them when you install Dataphor.) Joe Celko has said that at some point when learning sql a light in your head will hopefully appear. Same thing in D4: at some point it will all come together if you bear with it.

>> It seems you mention an interest for the framework (dot.Net) which is an
>> anti-SQL thing (even if they added some elements of SQL in version 3.0),
>> imho. Gone the sub-queries, even gone the joins, just a simple
>> select-from-where, and if you need 'joins' or 'subqueries', you use the
>> framework objects that refer to these 'simple' statements, and use C#, or
>> whatever, not SQL, to carry the 'joins', but without the memory explosion
>> that SQL-joins do. On the other hand, from the first impression I got, it
>> seems you add complexity to the SQL language. So, it would be like rooting
>> in the wrong soil, no? Since the framework is about simple, simple SQL
>> statements, that is (and C# developers are ready to pay the price by writing
>> complex C# procedural statements, because they know C#, but not SQL, and
>> don't want to invest in a more complex SQL-like language). I may have the
>> wrong idea of what you saw, though.

Dataphor is built with C#, a typed language. Conceptually that makes a lot of sense, since D4 is itself a typed system. The concept of type is one of the foundations of the 'relational' model. In other words it is logical that D4, with its reliance on type, be built with a framework that is typed. The fact that D4 is built with a .net language in no way implies that D4 has anything to do with the LINQ/DLINQ project, which is what you seem to be referring to: the idea that MS can 'hide' sql from .net developers. Whatever games MS is playing with sql from .net have nothing to do with D4. But it also means that, just like the channels of communication from D4 to the storage (device), i.e. sql server, there are channels from D4 to .net.
So users can write assemblies and interact with the .net framework in other ways. And like pass-through queries, this communication with .net does not come with any hidden price to be paid :) It can be used to expand the functionality of D4, for example user-defined types defined with a .net language (note that user-defined types can be defined quite nicely from within D4). Aside from these channels of communication, the .net framework is 'transparent' to the user. Again, MS's communication from .net to sql server is totally independent of D4 and its relational language.

Some parting thoughts. I have not said anything specific about the relational (query) language. As an expert sql developer you will find it quite different, which is to be expected. You will also find that some query concepts in sql carry over. You will find there are many new constructs, you will find many sql constructs greatly expanded in functionality (constraints, views, procedures, to name just a few), and you will discover new ways of doing things in comparison to sql. Most of the historical objections to sql (both conceptual and in the design of the language) are corrected in D4. Mastery of D4 will take an effort; this is not a trivial exercise. But I think you will find it is a great picture :) If you have any questions or just want to chat please feel free to contact me (through the blog is fine). People like yourself will only make this system better.

Lastly, as I state in several articles, D4 does not necessarily eclipse sql. Sql has its place, but it is misplaced as a foundation for application development. For that, D4 is best suited.

best,
steve
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
Wednesday, August 29, 2007
Dataphor - Reply to Michel
Saturday, August 11, 2007
Dataphor - Creating a Super Function
The Concept of the Super Function

In the thread:
comp.databases.ms-sqlserver, Jul 25, 6:52 am, 'Pass Table as a parameter to a function'
http://tinyurl.com/2j4dvy
Joe Celko makes the following comment on passing a table as a parameter to a procedure:

>> Is it possible to pass a table as a parameter to a function. <<

'Please read a book, any book, on data modeling, and RDBMS. A table is an entity or a relationship. That would mean you have a magical, super function that works on Squids, Automobiles, Britney Spears, Geographical locations or anything in the whole of creation.'

While his reply is pejorative, the name 'super function' is perfectly valid! It is actually a catchier name for what is commonly referred to as 'reusable code'. And such code rests on a fundamental concept: being able to represent objects as 'variables'. In a database the objects we are referring to are most notably tables. A table as a variable allows us to pass it as a parameter to a function (procedure). A procedure that takes a table as a parameter and returns a virtual table (result) allows us to realize a procedure as a variable, a super function. The super function allows an infinite number of representations of the same logic (in the form of a table) for tables of the same type, i.e. tables of Squids, Automobiles and Britney Spears ☺

This is not possible in sql, since what we are talking about as variables (tables, procedures) can only be represented as literals in sql. In sql we can only reference a table by its name, since there is no type for a table. An sql procedure/function dependent on a table can only express logic for that particular table and hence, like the sql table, is a literal representation of an object (table), not a variable that can represent an infinite number of solutions.

An example of a super function in the D4 language of Dataphor: a common requirement is to find a dense rank given that a grouping column can have repeating values.
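For readers who think more easily in code, here is a minimal sketch (in Python, my own illustration rather than anything from Dataphor) of what a 'super function' amounts to: a function whose parameter is a table value, indifferent to whether the rows describe squids or automobiles. The table encoding and all names below are assumptions of the sketch.

```python
# A 'super function' sketch: the parameter is a table (here a list of
# dicts), and the logic is predicated only on the column it is told
# about, never on what the rows mean. All names here are hypothetical.
def distinct_groups(table, grp_col):
    """Count the distinct values in the group column of any table."""
    return len({row[grp_col] for row in table})

squids = [{"SquidPK": 1, "Squid": 1},
          {"SquidPK": 5, "Squid": 1},
          {"SquidPK": 7, "Squid": 2}]
cars = [{"ID": 1, "Car": "Impala"},
        {"ID": 5, "Car": "Mustang"}]

print(distinct_groups(squids, "Squid"))  # 2
print(distinct_groups(cars, "Car"))      # 2
```

In sql the same logic would have to be written once per table, because a table can only be named, not passed.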
Procedure DenseRank takes as a parameter a table with an integer column named PK, which represents the table's primary key, and a column named Grp, which represents a group column. The operator returns a table with the primary key column (PK) and an integer column which is the dense rank (DenseRank). The DenseRank procedure calls another procedure, IDenseRank, which takes the same table as a parameter. Both procedures are overloaded with the group (Grp) column of the table defined as a string.

The DenseRank super function can therefore represent the dense rank of any table in the database that has a single integer column that is unique (not necessarily the primary key) and a column (either integer or string) that represents a group column. The group column may very well represent Squids, Automobiles or Britney Spears ☺ It does not matter, as the super function is predicated on column type only. The logic of the procedure (a dense rank) is independent of what the meaning of the predicate (the rows of the input table) is. This independence is achieved by the concept of what a table type is. In other words, the dense rank super function is only possible by substituting the significance of the type of a column(s) for the name of a column. In sql all logical significance rests in the name of a column; the only significance of the type of a column is computational. The significance of type, its logical significance, is a major advancement of D4 over sql and allows objects like super functions. The examples that follow will make clear what these concepts mean in practice.

(If you are just beginning D4 it is more important to understand the general concepts of database objects being variables and the importance of type than the specific programming techniques. If you have programming questions about the procedure(s) please feel free to post them in comments.)
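Outside D4, the rank the two operators compute can be sketched in a few lines of Python (my own illustration; the function name and table encoding are assumptions, not Dataphor code). It ranks consecutive runs of the group column in the order of the unique integer column:

```python
def dense_rank(rows):
    """rows: (pk, grp) pairs with unique integer pk.
    Returns (pk, rank) pairs where the rank increases each time the
    group value changes while scanning in pk order, so a group that
    reappears later starts a new rank (a new 'run')."""
    ranked, rank, prev = [], 0, object()   # sentinel: no previous group
    for pk, grp in sorted(rows):           # order by PK
        if grp != prev:                    # a new run of the group column
            rank += 1
            prev = grp
        ranked.append((pk, rank))
    return ranked

# The Squids (PK, Grp) pairs used later in the article.
squids = [(1, 1), (5, 1), (7, 2), (9, 2), (11, 3), (12, 3),
          (15, 3), (17, 1), (19, 1), (21, 2), (23, 2), (25, 2)]
print(dense_rank(squids))
# ranks come out 1,1,2,2,3,3,3,4,4,5,5,5 -- the values the D4 examples show
```

Because Python compares the group values generically, this single function covers both the integer and string cases that D4 handles with two explicit overloads.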
Operator (function) DenseRank returns a table (virtual) with a unique integer column (PK) and an integer column with the dense rank (DenseRank).

create operator DenseRank(aTable:table{PK:Integer,Grp:Integer}):
  table{PK:Integer,DenseRank:Integer}
begin
  result:= table of typeof (result){};
  result:= aTable {PK}
             times
           IDenseRank(aTable)
             where PK between PKMin and PKMax
           {PK,DenseRank};
end;

This operator returns a single row for each dense rank. It also has the starting (min) and ending (max) primary key values of the rank. The ranks are consecutively numbered in case there is a query that involves a meaningful ordering of the dense ranks (in the direction of the primary key).

create operator IDenseRank(aTable:table{PK:Integer,Grp:Integer}):
  table{PKMin:Integer,PKMax:Integer,DenseRank:Integer}
begin
  result:=table of typeof (result){};
  result:=
    ToTable(ToList(cursor(
      (
        ToTable(ToList(cursor(
          //Have to get PK consecutively numbered (PKSeq).
          ToTable(ToList(cursor(aTable order by {PK})))
            {PK,Grp,sequence+1 PKSeq}
            order by {Grp,PKSeq}
        )))
          add{PKSeq-(sequence+1) Drank}
          group by {Grp,Drank} add{Min(PK) PKMin,Max(PK) PKMax}
      ) order by {PKMin}
    )))
      {PKMin,PKMax,sequence+1 DenseRank};
end;

Here we overload the DenseRank and IDenseRank procedures to accept a group column (Grp) of type string. The overload allows us to essentially ignore the type of the group column (as long as it's an integer or a string type). The system will determine which overload of the procedures to use based on the table (and its group column type) passed.
create operator DenseRank(aTable:table{PK:Integer,Grp:String}):
  table{PK:Integer,DenseRank:Integer}
begin
  result:= table of typeof (result){};
  result:= aTable {PK}
             times
           IDenseRank(aTable)
             where PK between PKMin and PKMax
           {PK,DenseRank};
end;

create operator IDenseRank(aTable:table{PK:Integer,Grp:String}):
  table{PKMin:Integer,PKMax:Integer,DenseRank:Integer}
begin
  result:=table of typeof (result){};
  result:=
    ToTable(ToList(cursor(
      (
        ToTable(ToList(cursor(
          ToTable(ToList(cursor(aTable order by {PK})))
            {PK,Grp,sequence+1 PKSeq}
            order by {Grp,PKSeq}
        )))
          add{PKSeq-(sequence+1) Drank}
          group by {Grp,Drank} add{Min(PK) PKMin,Max(PK) PKMax}
      ) order by {PKMin}
    )))
      {PKMin,PKMax,sequence+1 DenseRank};
end;

(D4 will store two signatures for each procedure, accounting for an integer and a string type for Grp. The table returned (PK and DenseRank) is exactly the same regardless of the Grp type.)

Examples of using the DenseRank super function. (The use of the term function, procedure or operator is a matter of choice; they all mean the same thing, and here they all imply super ☺) (All tables use MS Sql Server 2005 as the storage device.)

Consider table Squids:

create table Squids
{
  SquidPK:Integer, //Primary key.
  Squid:Integer,   //Indicates a squid group that can repeat.
  Length:Integer,  //Length of the squid in squidly centimeters.
  Weight:Integer,  //Weight of the squid in squidly globs.
  key{SquidPK}
};

insert table
{
  row{1 SquidPK,1 Squid,5 Length,10 Weight},
  row{5,1,4,6},
  row{7,2,3,8},
  row{9,2,7,3},
  row{11,3,5,8},
  row{12,3,6,7},
  row{15,3,6,8},
  row{17,1,3,5},
  row{19,1,6,7},
  row{21,2,8,5},
  row{23,2,6,8},
  row{25,2,6,8}
} into Squids;

select Squids;

SquidPK Squid Length Weight
------- ----- ------ ------
1       1     5      10
5       1     4      6
7       2     3      8
9       2     7      3
11      3     5      8
12      3     6      7
15      3     6      8
17      1     3      5
19      1     6      7
21      2     8      5
23      2     6      8
25      2     6      8

To obtain the dense rank we need:
1. compatibility (type) of columns SquidPK and Squid with the columns of the generic table parameter aTable (PK, Grp), which we have;
2.
the elimination of all columns except the primary key and group;
3. header agreement (column name equivalency) between the table passed and the generic names of the columns in the function.

We can accomplish 2 and 3 using remove and rename, and obtain the ranks with:

select DenseRank(Squids remove{Length,Weight} rename {SquidPK PK,Squid Grp});

We can even combine the remove and rename into one statement using specify { }:

select DenseRank(Squids{SquidPK PK,Squid Grp});

PK DenseRank
-- ---------
1  1
5  1
7  2
9  2
11 3
12 3
15 3
17 4
19 4
21 5
23 5
25 5

What we have is the variable DenseRank representing the dense ranks of Squids. But the variable is still generic in the sense that PK is not meaningful for table Squids. So we simply reverse the process used in calling the function: we rename the PK column to the primary key column of Squids (SquidPK):

select DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank};

SquidPK DenseRank
------- ---------
1       1
5       1
7       2
9       2
11      3
12      3
15      3
17      4
19      4
21      5
23      5
25      5

By joining the Squids table to the variable DenseRank we can represent all the data of Squids along with its dense rank:

select (DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank})
       //A natural join using the primary key (SquidPK) from tables DenseRank and Squids.
       join Squids
       {SquidPK,Squid,DenseRank,Length,Weight};

SquidPK Squid DenseRank Length Weight
------- ----- --------- ------ ------
1       1     1         5      10
5       1     1         4      6
7       2     2         3      8
9       2     2         7      3
11      3     3         5      8
12      3     3         6      7
15      3     3         6      8
17      1     4         3      5
19      1     4         6      7
21      2     5         8      5
23      2     5         6      8
25      2     5         6      8

Grouping by Squid and the dense rank (DenseRank) allows aggregate functions to summarize the data:

select
(
  (DenseRank(Squids{SquidPK PK,Squid Grp}) {PK SquidPK,DenseRank})
  join Squids
  //We rename DenseRank to SquidRank in the specify statement.
  {SquidPK,Squid,DenseRank SquidRank,Length,Weight}
)
  group by {Squid,SquidRank} add{Count() Cnt,Avg(Length) AvgLen,Avg(Weight) AvgWt}
  redefine {AvgLen:=Round(AvgLen,1),AvgWt:=Round(AvgWt,1)};

Squid SquidRank Cnt AvgLen AvgWt
----- --------- --- ------ -----
1     1         2   4.5    8
1     4         2   4.5    6
2     2         2   5      5.5
2     5         3   6.7    7
3     3         3   5.7    7.7

We can do cars as well as squids:

create table Cars
{
  ID:Integer,
  Car:String,
  Price:Money,
  key{ID}
};

insert table
{
  row{ 1 ID,'Impala' Car,$18000 Price},
  row{ 2,'Impala',$19000},
  row{ 3,'Impala',$17000},
  row{ 5,'Mustang',$26000},
  row{ 7,'Mustang',$25000},
  row{ 9,'Impala',$16000},
  row{11,'Impala',$14000},
  row{13,'Mustang',$22000},
  row{17,'Mustang',$21000},
  row{19,'Mustang',$28000},
  row{23,'Mustang',$25000},
  row{29,'Impala',$20000},
  row{31,'Impala',$21000},
  row{37,'Impala',$16000}
} into Cars;

select DenseRank(Cars{ID PK,Car Grp}) {PK ID,DenseRank};

ID DenseRank
-- ---------
1  1
2  1
3  1
5  2
7  2
9  3
11 3
13 4
17 4
19 4
23 4
29 5
31 5
37 5

select (DenseRank(Cars{ID PK,Car Grp}) {PK ID,DenseRank})
       join Cars
       {ID,DenseRank,Car,Price};

ID DenseRank Car     Price
-- --------- ------- ----------
1  1         Impala  $18,000.00
2  1         Impala  $19,000.00
3  1         Impala  $17,000.00
5  2         Mustang $26,000.00
7  2         Mustang $25,000.00
9  3         Impala  $16,000.00
11 3         Impala  $14,000.00
13 4         Mustang $22,000.00
17 4         Mustang $21,000.00
19 4         Mustang $28,000.00
23 4         Mustang $25,000.00
29 5         Impala  $20,000.00
31 5         Impala  $21,000.00
37 5         Impala  $16,000.00

(Note that you can't just rename columns in sql. When you rename a column in sql, the table is recreated with the new column name. This is because the column name is inseparable from the table name. This is a consequence of sql tables being literals, or just files, as opposed to being variables.)

Like any function, DenseRank is only concerned with the parameter (table) being passed. The PK column need not be a column stored in the database as belonging to a table(s) but can be a virtual column that is generated from an expression.
The fact that the parameter is based on an expression does not in any way invalidate the meaningfulness of the dense rank function. So, instead of Britney Spears, consider the Orders table in the Northwind database. Suppose we want a dense rank over customers. We can use an expression that orders the table by CustomerID and, within customers, by OrderID. The following expression uses the generated sequence number (renamed to PK) for the ordering as a unique key from which we can get a dense rank over customers. (Note that the DenseRank function can return a dense rank regardless of whether or not the group column (Grp) repeats.)

select (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                              with {IgnoreUnsupported = 'true'})))
          {sequence+1 PK,OrderID,CustomerID,EmployeeID,ShipCountry});

PK OrderID CustomerID EmployeeID ShipCountry
-- ------- ---------- ---------- -----------
1  10643   ALFKI      6          Germany
2  10692   ALFKI      4          Germany
3  10702   ALFKI      4          Germany
4  10835   ALFKI      1          Germany
5  10952   ALFKI      1          Germany
6  11011   ALFKI      3          Germany
7  10308   ANATR      7          Mexico
8  10625   ANATR      3          Mexico
9  10759   ANATR      3          Mexico
10 10926   ANATR      4          Mexico
11 10365   ANTON      3          Mexico
12 10507   ANTON      7          Mexico

Based on the expression we can get the dense rank for customers (we only return the first 12 rows):

select DenseRank(
         (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                                with {IgnoreUnsupported = 'true'})))
            {sequence+1 PK,CustomerID Grp})
       );

PK DenseRank
-- ---------
1  1
2  1
3  1
4  1
5  1
6  1
7  2
8  2
9  2
10 2
11 3
12 3

And just like squids and cars, we can combine the DenseRank function (variable) with the expression (variable) to work with it in any form we want (i.e. a memory variable, a view or perhaps another function).

create view CustomerRanks
  //Note that procedures can be used in views.
  //We could remove the PK column from the result if we wanted.
  DenseRank(
    (ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                           with {IgnoreUnsupported = 'true'})))
       {sequence+1 PK,CustomerID Grp})
  )
  join //The join is a natural join on PK.
  (
    ToTable(ToList(cursor(Orders return 12 by {CustomerID,OrderID}
                          with {IgnoreUnsupported = 'true'})))
      {sequence+1 PK,OrderID,CustomerID,EmployeeID,ShipCountry}
  );

PK DenseRank OrderID CustomerID EmployeeID ShipCountry
-- --------- ------- ---------- ---------- -----------
1  1         10643   ALFKI      6          Germany
2  1         10692   ALFKI      4          Germany
3  1         10702   ALFKI      4          Germany
4  1         10835   ALFKI      1          Germany
5  1         10952   ALFKI      1          Germany
6  1         11011   ALFKI      3          Germany
7  2         10308   ANATR      7          Mexico
8  2         10625   ANATR      3          Mexico
9  2         10759   ANATR      3          Mexico
10 2         10926   ANATR      4          Mexico
11 3         10365   ANTON      3          Mexico
12 3         10507   ANTON      7          Mexico

Summary

Hopefully you can see that Joe Celko's idea of 'a magical super function that works on Squids, Automobiles, Britney Spears, Geographical locations or anything in the whole of creation' is not magical but very real! What is meant by magic is simply the concept of a table as a variable and the underlying importance of columns and their types. There is nothing magical going on at all. The idea of a variable holding any integer or string is just extended in the D4 language to tables. Sql users have been trying to simulate this concept for a long time, perhaps unwittingly. The use of dynamic sql is an attempt at variable sql, an attempt to go beyond the simple static (file) structure and use tables as variables. Finally, in D4, when tables are used as parameters to functions, all such functions can be super functions. It is but one way to turn a database into a super database. And can you guess what type of man they call such programmers ☺

steve
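A postscript for readers who want to check the ranks without installing Dataphor: the rename/rank/join pattern used in the Cars example can be replayed in a short, self-contained Python sketch (my own analogue; every name in it is mine, not D4's).

```python
def dense_rank(rows):
    """Rank consecutive runs of the group value, scanning in key order."""
    out, rank, prev = [], 0, object()
    for pk, grp in sorted(rows):
        if grp != prev:
            rank, prev = rank + 1, grp
        out.append((pk, rank))
    return out

# The first rows of the article's Cars table: (ID, Car, Price).
cars = [(1, "Impala", 18000), (2, "Impala", 19000), (3, "Impala", 17000),
        (5, "Mustang", 26000), (7, "Mustang", 25000), (9, "Impala", 16000)]

# 'Cars{ID PK,Car Grp}': project down to the key and group columns.
ranks = dict(dense_rank([(cid, car) for cid, car, _ in cars]))

# '... {PK ID,DenseRank} join Cars': rename back and natural-join on the key.
joined = [(cid, ranks[cid], car, price) for cid, car, price in cars]
for row in joined:
    print(row)   # (1, 1, 'Impala', 18000) through (9, 3, 'Impala', 16000)
```

These six rows reproduce the first six ranks of the Cars output above (1, 1, 1, 2, 2, 3).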
Learning D4 is like learning two systems: D4 itself, and sql all over again. D4 will bring a new clarity to sql and will make clear when the use of each system is most appropriate. It is not a one-or-the-other contest, but a matter of learning when to use which, where.