This article extends the functionality of the Dense Rank function described in: http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html For background on the concepts involved see: http://beyondsql.blogspot.com/2007/06/dataphor-13-passing-table-as-parameter.html http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html Sql Server 2005 Bol, Ranking Functions: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm
The original DenseRank operator was based on a table with a single primary key. We now extend the usability of the operator by overloading it to return a dense rank for a compound primary key, ie. a primary key consisting of two columns. As an example consider the problem posed in the thread: microsoft.public.sqlserver.programming Thursday, September 13, 2007 2:27 PM SQL Query Help: Need to remove consecutive items and just keep the earliest Here is some sample data (courtesy of Tom Cooper) created in an Sql Server 2005 database (Column names are a bit different and additional data has been added): CREATE TABLE AssignmentLog ( TicketID [varchar] (6) NOT NULL, EventTime [smalldatetime] NOT NULL, GroupMember [varchar] (15) NOT NULL, DeptInNeed varchar(15) NOT NULL, aNumber int NOT NULL, PRIMARY KEY CLUSTERED (TicketID, EventTime) ) go INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/10/2007 15:30:00', 'Helpdesk','Mkt',14) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/10/2007 15:42:00', 'Server Admin','Mkt',43) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/10/2007 15:50:00', 'Helpdesk','ACT',134) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/10/2007 15:52:00', 'Helpdesk','Sales',60) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/10/2007 16:01:00', 'Helpdesk','Mkt',88) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/10/2007 16:22:00', 'Server Admin','Fin',245) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/11/2007 8:30:00', 'Server Admin','Per',15) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC001', '9/11/2007 9:05:00', 'Field Services','Maint',75) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/10/2007 11:30:00', 'Helpdesk','HR',0) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/10/2007 15:30:00', 'Helpdesk','Fin',24) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/11/2007 15:42:00', 'Field Services','Maint',42) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/11/2007 15:50:00', 'Field Services','HR',63) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/11/2007 15:52:00', 'Helpdesk','Sales',1) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/11/2007 16:01:00', 'Helpdesk','HR',20) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/11/2007 16:22:00', 'Helpdesk','HR',25) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/12/2007 8:30:00', 'Field Services','MKT',75) INSERT INTO AssignmentLog (TicketID, EventTime, GroupMember,DeptInNeed,aNumber) VALUES ('ABC002', '9/12/2007 9:05:00', 'Field Services','MKT',90) We now work with the AssignmentLog table in Dataphor where it is treated as a 'variable'. select AssignmentLog; TicketID EventTime GroupMember DeptInNeed aNumber -------- --------------------- -------------- ---------- ------- ABC001 9/10/2007 3:30:00 PM Helpdesk Mkt 14 ABC001 9/10/2007 3:42:00 PM Server Admin Mkt 43 ABC001 9/10/2007 3:50:00 PM Helpdesk ACT 134 ABC001 9/10/2007 3:52:00 PM Helpdesk Sales 60 ABC001 9/10/2007 4:01:00 PM Helpdesk Mkt 88 ABC001 9/10/2007 4:22:00 PM Server Admin Fin 245 ABC001 9/11/2007 8:30:00 AM Server Admin Per 15 ABC001 9/11/2007 9:05:00 AM Field Services Maint 75 ABC002 9/10/2007 11:30:00 AM Helpdesk HR 0 ABC002 9/10/2007 3:30:00 PM Helpdesk Fin 24 ABC002 9/11/2007 3:42:00 PM Field Services Maint 42 ABC002 9/11/2007 3:50:00 PM Field Services HR 63 ABC002 9/11/2007 3:52:00 PM Helpdesk Sales 1 ABC002 9/11/2007 4:01:00 PM Helpdesk HR 20 ABC002 9/11/2007 4:22:00 PM Helpdesk HR 25 ABC002 9/12/2007 8:30:00 AM Field Services MKT 75 ABC002 9/12/2007 9:05:00 AM Field Services MKT 90 Here's how the OP described his problem: Quote I just want to know when the ticket left the helpdesk to go to an outside group like Server Admins, Field Services, etc. So when there are consecutive entries within the same group, I just need the earliest one of the series. Here is the output I'm trying to generate: ticket_id event_timestamp group_assigned --------- ------------------- --------------- ABC001 09/10/2007 15:30:00 Helpdesk ABC001 09/10/2007 15:42:00 Server Admin ABC001 09/10/2007 15:50:00 Helpdesk ABC001 09/10/2007 16:22:00 Server Admin ABC001 09/11/2007 09:05:00 Field Services But how can I do that in a query? I can't simply use a GROUP BY clause, something like this: SELECT ticket_id, MIN(event_timestamp) AS event_timestamp, group_assigned FROM #AssignmentLog GROUP BY ticket_id, group_assigned ORDER BY ticket_id, MIN(event_timestamp) Because that will lump EVERYTHING to the earliest occurance of each group_assigned entry. In other words that doesn't account for the items in a consecutive series. The output looks like: ticket_id event_timestamp group_assigned --------- ------------------- --------------- ABC001 09/10/2007 15:30:00 Helpdesk ABC001 09/10/2007 15:42:00 Server Admin ABC001 09/11/2007 09:05:00 Field Services This shows the first time the helpdesk had the ticket, but doesn't show when the ticket came BACK to the helpdesk. I've tried multiple variations of sub-queries with unequal joins (<, >, etc) but keep running into similar results... can't get the query to notice consecutive entries. I would really appreciate any direction or ideas someone may have about how best to accomplish this. For reference I'm using MS SQL 2000. Close Quote This is a classic problem of constructing a dense rank for group_assigned (GroupMember) given the compound primary key (ticket_id/TicketID and event_timestamp/EventTime). When the value of GroupMember stays the same within a TicketID and EventTime (where we look at EventTime ascending within TicketID) the rank stays the same. When GroupMember changes the rank is incremented by 1, we get a new value for the rank. This behavior is that of a dense rank. Once such a rank is obtained it can be used as a grouping column and almost all problems (such as the minimum EventTime for each GroupMember within a TicketID) reduce to a simple and straightforward query. Unfortunately all to often developers are not able to simply state what 'type' of problem they're trying to solve (the fact that the OP is on S2000 and does not have ranking functions available is no excuse. Should vendors be the ultimate authority on types of problems?). The point is if there was a function library with something like the DenseRank operator available a developer should know when it is appropriate. The IDenseRank and DenseRank operators for a compound key Here are the same operators for a single primary key overloaded for a compound key where one column of the PK is a string (column PKcol2) and the other (column PKcol2) is a datetime value. If you compare the single PK operators to these you'll see there are no real drastic changes. The table parameter (aTable) is changed to accommodate the compound key and the result now includes both columns of the PK. The central idea is that only the PK, regardless of whether it's singular or compound, and the target of the rank (Grp) are the only columns that are necessary. Note the conceptually similarity to the S2008 dense_rank() function: DENSE_RANK ( ) OVER ( [ < partition_by_clause > ] < order_by_clause > ) While the partition column is clearly TicketID we cannot use the dense_rank() function where the target of the rank (GroupMember) repeats. This being due to the colossal shortsightedness of sql of not separating the target of the rank and the ordering of it. We should be able to order the target of the rank by any column(s) we choose. All sql ranking functions treat the target of the rank and the order of the target of the rank as the same column(s). See: http://beyondsql.blogspot.com/2006/09/rac-rank-this.html for more details. create operator IDenseRank (aTable:table{PKcol1:String,PKcol2:DateTime,Grp:String}): //This is the virtual table returned by IDenseRank. table{PK.col1:String,PK2Min:DateTime,PK2Max:DateTime,DenseRank:Integer} begin result:=table of typeof (result){}; var D1:= ToTable(ToList(cursor( ( ToTable(ToList(cursor( //We want to get PKs consecutively numbered (PK12Seq). // //The ToTable(ToList(cursor table order by { }))) construct is fundamentally //important both functionally and conceptually. I will cover it in detail //in future article(s). // ToTable(ToList(cursor(aTable order by {PKcol1,PKcol2} ))) {PKcol1,PKcol2,Grp,sequence+1 PK12Seq} order by {Grp,PK12Seq} ))) add{PK12Seq-(sequence+1) Drank} group by {PKcol1,Grp,Drank} add{Min(PKcol2) PK2Min,Max(PKcol2) PK2Max} ) order by {PKcol1,PK2Min} ))) {PKcol1,PK2Min,PK2Max,sequence+1 DenseRank}; // //We want to renumber the dense rank from 1 to N for each (ie,PKcol1/groupID) // result:= D1 join ( D1 group by {PKcol1} add {Min(DenseRank) OffSet} {PKcol1,OffSet-1 OffSet} ) {PKcol1 PK.col1,PK2Min,PK2Max,DenseRank-OffSet DenseRank} ; end; create operator DenseRank(aTable:table{PKcol1:String,PKcol2:DateTime,Grp:String}): //This is the virtual table that the operator returns. table{PKcol1:String,PKcol2:DateTime,DenseRank:Integer} begin result:= table of typeof (result){}; result:= aTable {PKcol1,PKcol2} times //times is like an sql cross join. IDenseRank(aTable) //The virtual result of the IDenseRank operator is used. //Note the where statement is changed from the single PK operator. We include //the condition PKcol1=PK.col1 so we get the right PKcol2 values for //the correct PKcol1 values (groups). where (PKcol1=PK.col1) and (PKcol2 between PK2Min and PK2Max) {PKcol1,PKcol2,DenseRank}; end; We now follow the same methodology for a single PK. The AssignmentLog table simply must be the same 'type' as the aTable parameter. That means the column names and their data type must match those of the aTable parameter. All we have to do is rename the appropriate AssignmentLog columns, ie. TicketID to PKcol1, EventTime to PKcol2 and GroupMember to Grp. select DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}); PKcol1 PKcol2 DenseRank ------ --------------------- --------- ABC001 9/10/2007 3:30:00 PM 1 ABC001 9/10/2007 3:42:00 PM 2 ABC001 9/10/2007 3:50:00 PM 3 ABC001 9/10/2007 3:52:00 PM 3 ABC001 9/10/2007 4:01:00 PM 3 ABC001 9/10/2007 4:22:00 PM 4 ABC001 9/11/2007 8:30:00 AM 4 ABC001 9/11/2007 9:05:00 AM 5 ABC002 9/10/2007 11:30:00 AM 1 ABC002 9/10/2007 3:30:00 PM 1 ABC002 9/11/2007 3:42:00 PM 2 ABC002 9/11/2007 3:50:00 PM 2 ABC002 9/11/2007 3:52:00 PM 3 ABC002 9/11/2007 4:01:00 PM 3 ABC002 9/11/2007 4:22:00 PM 3 ABC002 9/12/2007 8:30:00 AM 4 ABC002 9/12/2007 9:05:00 AM 4 We get all the data from the AssignmentLog table and the DenseRank operator with a simple join. We rename the columns of the result table from the DenseRank operator to the orginal column names (of the AssignmentLog table). (We can rename columns of 'any' table since all tables are 'variables' ☺ . As you can see the ranks are consecutively numbered starting from 1 within each TicketID. select ( DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) {PKcol1 TicketID,PKcol2 EventTime,DenseRank} ) join //This is a natural join (match on common columns, ie. TicketID and EventTime. AssignmentLog {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank} order by {TicketID,EventTime} ; TicketID EventTime GroupMember DeptInNeed aNumber DenseRank -------- --------------------- -------------- ---------- ------- --------- ABC001 9/10/2007 3:30:00 PM Helpdesk Mkt 14 1 ABC001 9/10/2007 3:42:00 PM Server Admin Mkt 43 2 ABC001 9/10/2007 3:50:00 PM Helpdesk ACT 134 3 ABC001 9/10/2007 3:52:00 PM Helpdesk Sales 60 3 ABC001 9/10/2007 4:01:00 PM Helpdesk Mkt 88 3 ABC001 9/10/2007 4:22:00 PM Server Admin Fin 245 4 ABC001 9/11/2007 8:30:00 AM Server Admin Per 15 4 ABC001 9/11/2007 9:05:00 AM Field Services Maint 75 5 ABC002 9/10/2007 11:30:00 AM Helpdesk HR 0 1 ABC002 9/10/2007 3:30:00 PM Helpdesk Fin 24 1 ABC002 9/11/2007 3:42:00 PM Field Services Maint 42 2 ABC002 9/11/2007 3:50:00 PM Field Services HR 63 2 ABC002 9/11/2007 3:52:00 PM Helpdesk Sales 1 3 ABC002 9/11/2007 4:01:00 PM Helpdesk HR 20 3 ABC002 9/11/2007 4:22:00 PM Helpdesk HR 25 3 ABC002 9/12/2007 8:30:00 AM Field Services MKT 75 4 ABC002 9/12/2007 9:05:00 AM Field Services MKT 90 4 With the inclusion of the DenseRank column we can write relatively simple queries to answer questions concerning the grouping of TicketID and GroupMember. We can easily answer the OP original question: 'So when there are consecutive entries within the same group, I just need the earliest one of the series.' This is simply grouping by TicketID,GroupMember and DenseRank and getting the minimum EventTime. select ( ( DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) {PKcol1 TicketID,PKcol2 EventTime,DenseRank} ) join AssignmentLog {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank} ) group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime} order by {TicketID,DenseRank}; TicketID GroupMember DenseRank MinTime -------- -------------- --------- --------------------- ABC001 Helpdesk 1 9/10/2007 3:30:00 PM ABC001 Server Admin 2 9/10/2007 3:42:00 PM ABC001 Helpdesk 3 9/10/2007 3:50:00 PM ABC001 Server Admin 4 9/10/2007 4:22:00 PM ABC001 Field Services 5 9/11/2007 9:05:00 AM ABC002 Helpdesk 1 9/10/2007 11:30:00 AM ABC002 Field Services 2 9/11/2007 3:42:00 PM ABC002 Helpdesk 3 9/11/2007 3:52:00 PM ABC002 Field Services 4 9/12/2007 8:30:00 AM Of course we can remove DenseRank from the result. select ( ( DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) {PKcol1 TicketID,PKcol2 EventTime,DenseRank} ) join AssignmentLog {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank} ) group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime} remove{DenseRank} order by {TicketID,MinTime}; TicketID GroupMember MinTime -------- -------------- --------------------- ABC001 Helpdesk 9/10/2007 3:30:00 PM ABC001 Server Admin 9/10/2007 3:42:00 PM ABC001 Helpdesk 9/10/2007 3:50:00 PM ABC001 Server Admin 9/10/2007 4:22:00 PM ABC001 Field Services 9/11/2007 9:05:00 AM ABC002 Helpdesk 9/10/2007 11:30:00 AM ABC002 Field Services 9/11/2007 3:42:00 PM ABC002 Helpdesk 9/11/2007 3:52:00 PM ABC002 Field Services 9/12/2007 8:30:00 AM And we can do pretty much anything we want quite easily. Here we show a concatenated list of Depts (DeptInNeed) for each grouping. select ( ( DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) {PKcol1 TicketID,PKcol2 EventTime,DenseRank} ) join AssignmentLog {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank,',' Del} adorn {key{TicketID,EventTime}} ) group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime, Concat(DeptInNeed,Del order by {TicketID,EventTime}) AllDepts} order by {TicketID,DenseRank}; TicketID GroupMember DenseRank MinTime AllDepts -------- -------------- --------- --------------------- ------------- ABC001 Helpdesk 1 9/10/2007 3:30:00 PM Mkt ABC001 Server Admin 2 9/10/2007 3:42:00 PM Mkt ABC001 Helpdesk 3 9/10/2007 3:50:00 PM ACT,Sales,Mkt ABC001 Server Admin 4 9/10/2007 4:22:00 PM Fin,Per ABC001 Field Services 5 9/11/2007 9:05:00 AM Maint ABC002 Helpdesk 1 9/10/2007 11:30:00 AM HR,Fin ABC002 Field Services 2 9/11/2007 3:42:00 PM Maint,HR ABC002 Helpdesk 3 9/11/2007 3:52:00 PM Sales,HR,HR ABC002 Field Services 4 9/12/2007 8:30:00 AM MKT,MKT Or show distinct Depts for each grouping. (If you don't understood some of this stuff ask ☺ Everything in due time ☺ . var D1:= //D1 is somewhat analogous to an sql CTE. But it's a 'variable'! ( DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) {PKcol1 TicketID,PKcol2 EventTime,DenseRank} ) join AssignmentLog {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank,',' Del}; select ( D1 group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime} ) join ( (D1 {TicketID,GroupMember,DenseRank,DeptInNeed,Del} adorn{key{DeptInNeed}}) group by {TicketID,GroupMember,DenseRank} add{ Concat( distinct DeptInNeed,Del order by {DeptInNeed}) DistinctDepts} ) order by {TicketID,DenseRank}; TicketID GroupMember DenseRank MinTime DistinctDepts -------- -------------- --------- --------------------- ------------- ABC001 Helpdesk 1 9/10/2007 3:30:00 PM Mkt ABC001 Server Admin 2 9/10/2007 3:42:00 PM Mkt ABC001 Helpdesk 3 9/10/2007 3:50:00 PM ACT,Mkt,Sales ABC001 Server Admin 4 9/10/2007 4:22:00 PM Fin,Per ABC001 Field Services 5 9/11/2007 9:05:00 AM Maint ABC002 Helpdesk 1 9/10/2007 11:30:00 AM Fin,HR ABC002 Field Services 2 9/11/2007 3:42:00 PM HR,Maint ABC002 Helpdesk 3 9/11/2007 3:52:00 PM HR,Sales ABC002 Field Services 4 9/12/2007 8:30:00 AM MKT Another example of a dense rank with a compound primary key This example is based on the post: comp.databases.ms-sqlserver Sep 5, 7:59 am Grouping similar rows http://tinyurl.com/3378lk Here is the create table statement and some rows entered into an Sql Server 2005 database (courtesy of MVP Hugo Kornelis). CREATE TABLE TheInput (Letter char(1) NOT NULL, RowNum int NOT NULL, OtherValue int NOT NULL, PRIMARY KEY (Letter, RowNum)); go INSERT INTO TheInput (Letter, RowNum, OtherValue) SELECT 'A', 1, 50 UNION ALL SELECT 'A', 2, 50 UNION ALL SELECT 'A', 3, 100 UNION ALL SELECT 'A', 4, 50 UNION ALL SELECT 'A', 5, 100 UNION ALL SELECT 'A', 6, 100; INSERT INTO TheInput (Letter, RowNum, OtherValue) SELECT 'B', 1, 50 UNION ALL SELECT 'B', 2, 100 UNION ALL SELECT 'B', 3, 50 UNION ALL SELECT 'B', 4, 50 UNION ALL SELECT 'B', 5, 100 UNION ALL SELECT 'B', 6, 100; We now work with the table in Dataphor. select TheInput; Letter RowNum OtherValue ------ ------ ---------- A 1 50 A 2 50 A 3 100 A 4 50 A 5 100 A 6 100 B 1 50 B 2 100 B 3 50 B 4 50 B 5 100 B 6 100 The OP poses his problem as: Quote I would like to group rows based on order and similarity in a single sql-query if possible: A, 1, 50 A, 2, 50 A, 3, 100 A, 4, 50 A, 5, 100 A, 6, 100 Would come out as: A, 1, 50, 2 <-- Last value shows number of grouped rows A, 2, 100, 1 A, 3, 50, 1 A, 4, 100, 2 UnQuote Like the previous example, there is no mention in either the question or the solution of a dense rank, yet that is what the problem calls for, ie. a dense rank over OtherValue within each Letter and in the ascending order of RowNum. We need only check the compound key version of the DenseRank and IDenseRank operators to insure they accept a table as a parameter of the same 'type' as TheInput table. The current compound DenseRank operator accepts a table of type: table{PKcol1:String,PKcol2:DateTime,Grp:String} While the Letter column of the TheInput table is the same data type as PKcol1 (String), the PKcol2 data type (DateTime) is different than the RowNum column type (Integer). In addition the OtherValue column (Integer), the target of the rank, is different than the Grp type (String). So all we have to do is again overload the compound DenseRank operators to accept the compound key and Grp data types of the TheInput table. It's that simple ☺ . Here are the operators overloaded to accept a table of type: table{PKcol1:String,PKcol2:Integer,Grp:Integer} create operator IDenseRank (aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}): table{PK.col1:String,PK2Min:Integer,PK2Max:Integer,DenseRank:Integer} begin result:=table of typeof (result){}; var D1:= ToTable(ToList(cursor( ( ToTable(ToList(cursor( ToTable(ToList(cursor(aTable order by {PKcol1,PKcol2} ))) {PKcol1,PKcol2,Grp,sequence+1 PK12Seq} order by {Grp,PK12Seq} ))) add{PK12Seq-(sequence+1) Drank} group by {PKcol1,Grp,Drank} add{Min(PKcol2) PK2Min,Max(PKcol2) PK2Max} ) order by {PKcol1,PK2Min} ))) {PKcol1,PK2Min,PK2Max,sequence+1 DenseRank}; result:= D1 join ( D1 group by {PKcol1} add {Min(DenseRank) OffSet} {PKcol1,OffSet-1 OffSet} ) {PKcol1 PK.col1,PK2Min,PK2Max,DenseRank-OffSet DenseRank} ; end; create operator DenseRank(aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}): table{PKcol1:String,PKcol2:Integer,DenseRank:Integer} begin result:= table of typeof (result){}; result:= aTable {PKcol1,PKcol2} times IDenseRank(aTable) where (PKcol1=PK.col1) and (PKcol2 between PK2Min and PK2Max) {PKcol1,PKcol2,DenseRank}; end; Now we can use the compound dense rank for a table of type TheInput. select DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp}) ; PKcol1 PKcol2 DenseRank ------ ------ --------- A 1 1 A 2 1 A 3 2 A 4 3 A 5 4 A 6 4 B 1 1 B 2 2 B 3 3 B 4 3 B 5 4 B 6 4 select ( DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp}) {PKcol1 Letter,PKcol2 RowNum,DenseRank} ) join TheInput order by {Letter,RowNum} ; Letter RowNum DenseRank OtherValue ------ ------ --------- ---------- A 1 1 50 A 2 1 50 A 3 2 100 A 4 3 50 A 5 4 100 A 6 4 100 B 1 1 50 B 2 2 100 B 3 3 50 B 4 3 50 B 5 4 100 B 6 4 100 A simple group by with the Count aggregate answers the OP question. select ( DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp}) {PKcol1 Letter,PKcol2 RowNum,DenseRank} ) join TheInput group by {Letter,DenseRank,OtherValue} add{Count() Cnt} order by {Letter,DenseRank}; Letter DenseRank OtherValue Cnt ------ --------- ---------- --- A 1 50 2 A 2 100 1 A 3 50 1 A 4 100 2 B 1 50 1 B 2 100 1 B 3 50 2 B 4 100 2 Summary The parameter declaration of an operator, ie. (aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}) is called the 'signature' of the operator. Dataphor stores the same name of an operator for each different signature. In other words, the same operator name for different parameter types (ie. different table types). This is the concept of overloading. One operator name covering different types (regardless of the returned result or even if the code differs for different signatures). This, obviously, significantly benefits developers. For some types of complicated problems just one clever programmer is all it takes to write the logic of the operator. To use the operator other developers merely insure that the signature (parameter type(s)) match their own tables. If not, simply change the signature (and possibly the data types of the result). It would not be difficult to overload the dense rank operators to allow for compound keys of more than two columns. Even the coding change for more than two PK columns is simple. While overloaded operators are physically stored multiple times (based on signiture), conceptually there is but one dense rank operator. When the operator is invoked it simply looks for the signature (type) that matches the table being sent in as a parameter. Just think of the many types of problems typically encountered. Many of these can follow the paradigm of the dense rank. Imagine a library of these type of operators accessible from any database(s). (Feel free to contact me if you would like to see a specific example of a type of problem like the dense rank problem). A final caveat These operators could be coded in a more sophisticated fashion. But before the fancy coding, before the modelling of a business problem, before anything in the database, it all rests on understanding the key concepts of a relational database like Dataphor. And concepts like types and variables are just as important as clever code, if not more so ☺ .
No comments:
Post a Comment