Beyond Sql: Dataphor - Super Function II

This article extends the functionality of the Dense Rank function
described in:
http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html

For background on the concepts involved see:
http://beyondsql.blogspot.com/2007/06/dataphor-13-passing-table-as-parameter.html
http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html
http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html 
Sql Server 2005 Bol, Ranking Functions:
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm

The original DenseRank operator was based on a table with a single
primary key. We now extend the usability of the operator by overloading
it to return a dense rank for a compound primary key, ie. a primary key 
consisting of two columns.

As an example consider the problem posed in the thread:

microsoft.public.sqlserver.programming
Thursday, September 13, 2007 2:27 PM
SQL Query Help: Need to remove consecutive items and just keep the earliest

Here is some sample data (courtesy of Tom Cooper) created in an Sql Server
2005 database (Column names are a bit different and additional data has
been added):

CREATE TABLE AssignmentLog (
    TicketID [varchar] (6) NOT NULL,
    EventTime [smalldatetime] NOT NULL,
    GroupMember [varchar] (15) NOT NULL,
    DeptInNeed varchar(15) NOT NULL,
    aNumber int NOT NULL, 
  PRIMARY KEY CLUSTERED (TicketID, EventTime)
)
go
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:30:00', 'Helpdesk','Mkt',14)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:42:00', 'Server Admin','Mkt',43)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:50:00', 'Helpdesk','ACT',134)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:52:00', 'Helpdesk','Sales',60)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 16:01:00', 'Helpdesk','Mkt',88)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 16:22:00', 'Server Admin','Fin',245)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/11/2007 8:30:00', 'Server Admin','Per',15)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/11/2007 9:05:00', 'Field Services','Maint',75)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/10/2007 11:30:00', 'Helpdesk','HR',0)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/10/2007 15:30:00', 'Helpdesk','Fin',24)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 15:42:00', 'Field Services','Maint',42)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 15:50:00', 'Field Services','HR',63)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 15:52:00', 'Helpdesk','Sales',1)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 16:01:00', 'Helpdesk','HR',20)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 16:22:00', 'Helpdesk','HR',25)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/12/2007 8:30:00', 'Field Services','MKT',75)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/12/2007 9:05:00', 'Field Services','MKT',90)

We now work with the AssignmentLog table in Dataphor where it is
treated as a 'variable'.

select AssignmentLog;    

TicketID EventTime             GroupMember    DeptInNeed aNumber 
-------- --------------------- -------------- ---------- ------- 
ABC001   9/10/2007 3:30:00 PM  Helpdesk       Mkt        14      
ABC001   9/10/2007 3:42:00 PM  Server Admin   Mkt        43      
ABC001   9/10/2007 3:50:00 PM  Helpdesk       ACT        134     
ABC001   9/10/2007 3:52:00 PM  Helpdesk       Sales      60      
ABC001   9/10/2007 4:01:00 PM  Helpdesk       Mkt        88      
ABC001   9/10/2007 4:22:00 PM  Server Admin   Fin        245     
ABC001   9/11/2007 8:30:00 AM  Server Admin   Per        15      
ABC001   9/11/2007 9:05:00 AM  Field Services Maint      75      
ABC002   9/10/2007 11:30:00 AM Helpdesk       HR         0       
ABC002   9/10/2007 3:30:00 PM  Helpdesk       Fin        24      
ABC002   9/11/2007 3:42:00 PM  Field Services Maint      42      
ABC002   9/11/2007 3:50:00 PM  Field Services HR         63      
ABC002   9/11/2007 3:52:00 PM  Helpdesk       Sales      1       
ABC002   9/11/2007 4:01:00 PM  Helpdesk       HR         20      
ABC002   9/11/2007 4:22:00 PM  Helpdesk       HR         25      
ABC002   9/12/2007 8:30:00 AM  Field Services MKT        75      
ABC002   9/12/2007 9:05:00 AM  Field Services MKT        90      

Here's how the OP described his problem:
Quote
I just want to know when the ticket left the helpdesk to go to an 
outside group like Server Admins, Field Services, etc.  So when there
are consecutive entries within the same group, I just need the earliest
one of the series.  Here is the output I'm trying to generate:

ticket_id event_timestamp     group_assigned
--------- ------------------- ---------------
ABC001    09/10/2007 15:30:00 Helpdesk
ABC001    09/10/2007 15:42:00 Server Admin
ABC001    09/10/2007 15:50:00 Helpdesk
ABC001    09/10/2007 16:22:00 Server Admin
ABC001    09/11/2007 09:05:00 Field Services

But how can I do that in a query?  I can't simply use a GROUP BY
clause, something like this:

SELECT ticket_id, MIN(event_timestamp) AS event_timestamp,
group_assigned
FROM #AssignmentLog
GROUP BY ticket_id, group_assigned
ORDER BY ticket_id, MIN(event_timestamp)

Because that will lump EVERYTHING to the earliest occurance of each
group_assigned entry.  In other words that doesn't account for the
items in a consecutive series.  The output looks like:

ticket_id event_timestamp     group_assigned
--------- ------------------- ---------------
ABC001    09/10/2007 15:30:00 Helpdesk
ABC001    09/10/2007 15:42:00 Server Admin
ABC001    09/11/2007 09:05:00 Field Services

This shows the first time the helpdesk had the ticket, but doesn't
show when the ticket came BACK to the helpdesk.

I've tried multiple variations of sub-queries with unequal joins (<,
>, etc) but keep running into similar results... can't get the query
to notice consecutive entries.

I would really appreciate any direction or ideas someone may have
about how best to accomplish this.  For reference I'm using MS SQL
2000.
Close Quote

This is a classic problem of constructing a dense rank for group_assigned
(GroupMember) given the compound primary key (ticket_id/TicketID and
event_timestamp/EventTime). When the value of GroupMember stays the same
within a TicketID and EventTime (where we look at EventTime ascending
within TicketID) the rank stays the same. When GroupMember changes
the rank is incremented by 1, we get a new value for the rank. This
behavior is that of a dense rank. Once such a rank is obtained it
can be used as a grouping column and almost all problems (such as the
minimum EventTime for each GroupMember within a TicketID) reduce to 
a simple and straightforward query.
Unfortunately all to often developers are not able to simply state
what 'type' of problem they're trying to solve (the fact that the OP
is on S2000 and does not have ranking functions available is no
excuse. Should vendors be the ultimate authority on types of problems?).
The point is if there was a function library with something like
the DenseRank operator available a developer should know when it is 
appropriate.

The IDenseRank and DenseRank operators for a compound key

Here are the same operators for a single primary key overloaded for a
compound key where one column of the PK is a string (column PKcol2)
and the other (column PKcol2) is a datetime value. If you compare
the single PK operators to these you'll see there are no real drastic
changes. The table parameter (aTable) is changed to accommodate the
compound key and the result now includes both columns of the PK.
The central idea is that only the PK, regardless of whether it's
singular or compound, and the target of the rank (Grp) are the only
columns that are necessary. Note the conceptually similarity to
the S2008 dense_rank() function:

DENSE_RANK ( ) OVER ( [ < partition_by_clause > ] < order_by_clause > )

While the partition column is clearly TicketID we cannot use the
dense_rank() function where the target of the rank (GroupMember) repeats.
This being due to the colossal shortsightedness of sql of not separating
the target of the rank and the ordering of it. We should be able to order
the target of the rank by any column(s) we choose. All sql ranking functions
treat the target of the rank and the order of the target of the rank
as the same column(s). See:
http://beyondsql.blogspot.com/2006/09/rac-rank-this.html
for more details.

create operator IDenseRank
 (aTable:table{PKcol1:String,PKcol2:DateTime,Grp:String}):
     //This is the virtual table returned by IDenseRank.
        table{PK.col1:String,PK2Min:DateTime,PK2Max:DateTime,DenseRank:Integer}
begin
result:=table of typeof (result){};            
var D1:=
ToTable(ToList(cursor(
    (
 ToTable(ToList(cursor(
//We want to get PKs consecutively numbered (PK12Seq).
//
//The ToTable(ToList(cursor table order by { }))) construct is fundamentally
//important both functionally and conceptually. I will cover it in detail
//in future article(s).
//
 ToTable(ToList(cursor(aTable order by {PKcol1,PKcol2} )))
                            {PKcol1,PKcol2,Grp,sequence+1 PK12Seq}
     order by {Grp,PK12Seq}  )))  
      add{PK12Seq-(sequence+1) Drank}
       group by {PKcol1,Grp,Drank} add{Min(PKcol2) PK2Min,Max(PKcol2) PK2Max}
      )      
                 order by {PKcol1,PK2Min}  )))  
                   {PKcol1,PK2Min,PK2Max,sequence+1 DenseRank};
//                   
//We want to renumber the dense rank from 1 to N for each (ie,PKcol1/groupID)             
//
result:= 
    D1
     join
      (
        D1 group by {PKcol1} add {Min(DenseRank) OffSet} {PKcol1,OffSet-1 OffSet}
      )                          
        {PKcol1 PK.col1,PK2Min,PK2Max,DenseRank-OffSet DenseRank} ;
end;

create operator DenseRank(aTable:table{PKcol1:String,PKcol2:DateTime,Grp:String}):
            //This is the virtual table that the operator returns.
                         table{PKcol1:String,PKcol2:DateTime,DenseRank:Integer}
begin
result:=
        table of typeof (result){};                        
result:=
   aTable {PKcol1,PKcol2}
    times               //times is like an sql cross join.
     IDenseRank(aTable) //The virtual result of the IDenseRank operator is used.
//Note the where statement is changed from the single PK operator. We include
//the condition PKcol1=PK.col1 so we get the right PKcol2 values for
//the correct PKcol1 values (groups).
      where (PKcol1=PK.col1) and (PKcol2 between PK2Min and PK2Max)
            {PKcol1,PKcol2,DenseRank};       
end;

We now follow the same methodology for a single PK. The AssignmentLog table
simply must be the same 'type' as the aTable parameter. That means the
column names and their data type must match those of the aTable parameter. 
All we have to do is rename the appropriate AssignmentLog columns, 
ie. TicketID to PKcol1, EventTime to PKcol2 and GroupMember to Grp.

select 
 DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp});     

PKcol1 PKcol2                DenseRank 
------ --------------------- --------- 
ABC001 9/10/2007 3:30:00 PM  1         
ABC001 9/10/2007 3:42:00 PM  2         
ABC001 9/10/2007 3:50:00 PM  3         
ABC001 9/10/2007 3:52:00 PM  3         
ABC001 9/10/2007 4:01:00 PM  3         
ABC001 9/10/2007 4:22:00 PM  4         
ABC001 9/11/2007 8:30:00 AM  4         
ABC001 9/11/2007 9:05:00 AM  5         
ABC002 9/10/2007 11:30:00 AM 1         
ABC002 9/10/2007 3:30:00 PM  1         
ABC002 9/11/2007 3:42:00 PM  2         
ABC002 9/11/2007 3:50:00 PM  2         
ABC002 9/11/2007 3:52:00 PM  3         
ABC002 9/11/2007 4:01:00 PM  3         
ABC002 9/11/2007 4:22:00 PM  3         
ABC002 9/12/2007 8:30:00 AM  4         
ABC002 9/12/2007 9:05:00 AM  4         

We get all the data from the AssignmentLog table and the DenseRank operator
with a simple join. We rename the columns of the result table from the 
DenseRank operator to the orginal column names (of the AssignmentLog table).
(We can rename columns of 'any' table since all tables are 'variables' ☺ .
As you can see the ranks are consecutively numbered starting from 1 within
each TicketID.

select 
(
 DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
)   
 join //This is a natural join (match on common columns, ie. TicketID and EventTime.  
    AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank}
        order by {TicketID,EventTime} ;


TicketID EventTime             GroupMember    DeptInNeed aNumber DenseRank 
-------- --------------------- -------------- ---------- ------- --------- 
ABC001   9/10/2007 3:30:00 PM  Helpdesk       Mkt        14      1         
ABC001   9/10/2007 3:42:00 PM  Server Admin   Mkt        43      2         
ABC001   9/10/2007 3:50:00 PM  Helpdesk       ACT        134     3         
ABC001   9/10/2007 3:52:00 PM  Helpdesk       Sales      60      3         
ABC001   9/10/2007 4:01:00 PM  Helpdesk       Mkt        88      3         
ABC001   9/10/2007 4:22:00 PM  Server Admin   Fin        245     4         
ABC001   9/11/2007 8:30:00 AM  Server Admin   Per        15      4         
ABC001   9/11/2007 9:05:00 AM  Field Services Maint      75      5         
ABC002   9/10/2007 11:30:00 AM Helpdesk       HR         0       1         
ABC002   9/10/2007 3:30:00 PM  Helpdesk       Fin        24      1         
ABC002   9/11/2007 3:42:00 PM  Field Services Maint      42      2         
ABC002   9/11/2007 3:50:00 PM  Field Services HR         63      2         
ABC002   9/11/2007 3:52:00 PM  Helpdesk       Sales      1       3         
ABC002   9/11/2007 4:01:00 PM  Helpdesk       HR         20      3         
ABC002   9/11/2007 4:22:00 PM  Helpdesk       HR         25      3         
ABC002   9/12/2007 8:30:00 AM  Field Services MKT        75      4         
ABC002   9/12/2007 9:05:00 AM  Field Services MKT        90      4         

With the inclusion of the DenseRank column we can write relatively simple
queries to answer questions concerning the grouping of TicketID and GroupMember.

We can easily answer the OP original question:
'So when there are consecutive entries within the same group, I just need
 the earliest one of the series.'

This is simply grouping by TicketID,GroupMember and DenseRank and getting
the minimum EventTime.

select
 ( 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank}
  )
    group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime}
                 order by {TicketID,DenseRank}; 

TicketID GroupMember    DenseRank MinTime               
-------- -------------- --------- --------------------- 
ABC001   Helpdesk       1         9/10/2007 3:30:00 PM  
ABC001   Server Admin   2         9/10/2007 3:42:00 PM  
ABC001   Helpdesk       3         9/10/2007 3:50:00 PM  
ABC001   Server Admin   4         9/10/2007 4:22:00 PM  
ABC001   Field Services 5         9/11/2007 9:05:00 AM  
ABC002   Helpdesk       1         9/10/2007 11:30:00 AM 
ABC002   Field Services 2         9/11/2007 3:42:00 PM  
ABC002   Helpdesk       3         9/11/2007 3:52:00 PM  
ABC002   Field Services 4         9/12/2007 8:30:00 AM  

Of course we can remove DenseRank from the result.

select
 ( 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank}
  )
    group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime}
             remove{DenseRank}
                 order by {TicketID,MinTime}; 
                 
TicketID GroupMember    MinTime               
-------- -------------- --------------------- 
ABC001   Helpdesk       9/10/2007 3:30:00 PM  
ABC001   Server Admin   9/10/2007 3:42:00 PM  
ABC001   Helpdesk       9/10/2007 3:50:00 PM  
ABC001   Server Admin   9/10/2007 4:22:00 PM  
ABC001   Field Services 9/11/2007 9:05:00 AM  
ABC002   Helpdesk       9/10/2007 11:30:00 AM 
ABC002   Field Services 9/11/2007 3:42:00 PM  
ABC002   Helpdesk       9/11/2007 3:52:00 PM  
ABC002   Field Services 9/12/2007 8:30:00 AM  

And we can do pretty much anything we want quite easily. Here we
show a concatenated list of Depts (DeptInNeed) for each grouping.

select
 ( 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank,',' Del}
   adorn {key{TicketID,EventTime}}
 )  
    group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime,
             Concat(DeptInNeed,Del order by {TicketID,EventTime}) AllDepts}
                 order by {TicketID,DenseRank};  
                 
TicketID GroupMember    DenseRank MinTime               AllDepts      
-------- -------------- --------- --------------------- ------------- 
ABC001   Helpdesk       1         9/10/2007 3:30:00 PM  Mkt           
ABC001   Server Admin   2         9/10/2007 3:42:00 PM  Mkt           
ABC001   Helpdesk       3         9/10/2007 3:50:00 PM  ACT,Sales,Mkt 
ABC001   Server Admin   4         9/10/2007 4:22:00 PM  Fin,Per       
ABC001   Field Services 5         9/11/2007 9:05:00 AM  Maint         
ABC002   Helpdesk       1         9/10/2007 11:30:00 AM HR,Fin        
ABC002   Field Services 2         9/11/2007 3:42:00 PM  Maint,HR      
ABC002   Helpdesk       3         9/11/2007 3:52:00 PM  Sales,HR,HR   
ABC002   Field Services 4         9/12/2007 8:30:00 AM  MKT,MKT      

Or show distinct Depts for each grouping.
(If you don't understood some of this stuff ask ☺ Everything in due time ☺ . 

var D1:= //D1 is somewhat analogous to an sql CTE. But it's a 'variable'! 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank,',' Del};
 
select 
  (
    D1 group by {TicketID,GroupMember,DenseRank}
    add{Min(EventTime) MinTime}
  )  
   join
    (
      (D1 {TicketID,GroupMember,DenseRank,DeptInNeed,Del} adorn{key{DeptInNeed}})
        group by {TicketID,GroupMember,DenseRank}
          add{ Concat( distinct DeptInNeed,Del order by {DeptInNeed}) DistinctDepts}
     ) 
                 order by {TicketID,DenseRank};   
                 
TicketID GroupMember    DenseRank MinTime               DistinctDepts 
-------- -------------- --------- --------------------- ------------- 
ABC001   Helpdesk       1         9/10/2007 3:30:00 PM  Mkt           
ABC001   Server Admin   2         9/10/2007 3:42:00 PM  Mkt           
ABC001   Helpdesk       3         9/10/2007 3:50:00 PM  ACT,Mkt,Sales 
ABC001   Server Admin   4         9/10/2007 4:22:00 PM  Fin,Per       
ABC001   Field Services 5         9/11/2007 9:05:00 AM  Maint         
ABC002   Helpdesk       1         9/10/2007 11:30:00 AM Fin,HR        
ABC002   Field Services 2         9/11/2007 3:42:00 PM  HR,Maint      
ABC002   Helpdesk       3         9/11/2007 3:52:00 PM  HR,Sales      
ABC002   Field Services 4         9/12/2007 8:30:00 AM  MKT           


Another example of a dense rank with a compound primary key

This example is based on the post:

comp.databases.ms-sqlserver
Sep 5, 7:59 am
Grouping similar rows
http://tinyurl.com/3378lk

Here is the create table statement and some rows entered into
an Sql Server 2005 database (courtesy of MVP Hugo Kornelis).

CREATE TABLE TheInput
     (Letter char(1) NOT NULL,
      RowNum int NOT NULL,
      OtherValue int NOT NULL,
      PRIMARY KEY (Letter, RowNum));
go
INSERT INTO TheInput (Letter, RowNum, OtherValue)
SELECT 'A', 1, 50
UNION ALL
SELECT 'A', 2, 50
UNION ALL
SELECT 'A', 3, 100
UNION ALL
SELECT 'A', 4, 50
UNION ALL
SELECT 'A', 5, 100
UNION ALL
SELECT 'A', 6, 100;
INSERT INTO TheInput (Letter, RowNum, OtherValue)
SELECT 'B', 1, 50
UNION ALL
SELECT 'B', 2, 100
UNION ALL
SELECT 'B', 3, 50
UNION ALL
SELECT 'B', 4, 50
UNION ALL
SELECT 'B', 5, 100
UNION ALL
SELECT 'B', 6, 100; 

We now work with the table in Dataphor.

select TheInput; 

Letter RowNum OtherValue 
------ ------ ---------- 
A      1      50         
A      2      50         
A      3      100        
A      4      50         
A      5      100        
A      6      100        
B      1      50         
B      2      100        
B      3      50         
B      4      50         
B      5      100        
B      6      100     

The OP poses his problem as:

Quote
I would like to group rows based on order and similarity in a single
sql-query if possible:

A, 1, 50
A, 2, 50
A, 3, 100
A, 4, 50
A, 5, 100
A, 6, 100

Would come out as:

A, 1, 50, 2 <-- Last value shows number of grouped rows
A, 2, 100, 1
A, 3, 50, 1
A, 4, 100, 2 
UnQuote

Like the previous example, there is no mention in either the question
or the solution of a dense rank, yet that is what the problem calls for,
ie. a dense rank over OtherValue within each Letter and in the ascending
order of RowNum.

We need only check the compound key version of the DenseRank and
IDenseRank operators to insure they accept a table as a parameter
of the same 'type' as TheInput table.

The current compound DenseRank operator accepts a table of type:

table{PKcol1:String,PKcol2:DateTime,Grp:String}

While the Letter column of the TheInput table is the same data type as
PKcol1 (String), the PKcol2 data type (DateTime) is different than the
RowNum column type (Integer). In addition the OtherValue column (Integer),
the target of the rank, is different than the Grp type (String). So all
we have to do is again overload the compound DenseRank operators to accept
the compound key and Grp data types of the TheInput table. It's that simple ☺ .

Here are the operators overloaded to accept a table of type:

table{PKcol1:String,PKcol2:Integer,Grp:Integer}

create operator IDenseRank
 (aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}):
       table{PK.col1:String,PK2Min:Integer,PK2Max:Integer,DenseRank:Integer}
begin
result:=table of typeof (result){};            
var D1:=
ToTable(ToList(cursor(
    (
 ToTable(ToList(cursor(
 ToTable(ToList(cursor(aTable order by {PKcol1,PKcol2} )))
                            {PKcol1,PKcol2,Grp,sequence+1 PK12Seq}
     order by {Grp,PK12Seq}  )))  
      add{PK12Seq-(sequence+1) Drank}
       group by {PKcol1,Grp,Drank} add{Min(PKcol2) PK2Min,Max(PKcol2) PK2Max}
      )      
                 order by {PKcol1,PK2Min}  )))  
                   {PKcol1,PK2Min,PK2Max,sequence+1 DenseRank};
result:= 
    D1
     join
      (
        D1 group by {PKcol1} add {Min(DenseRank) OffSet} {PKcol1,OffSet-1 OffSet}
      )                          
        {PKcol1 PK.col1,PK2Min,PK2Max,DenseRank-OffSet DenseRank} ;
end;

create operator DenseRank(aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}):
                     table{PKcol1:String,PKcol2:Integer,DenseRank:Integer}
begin
result:=
        table of typeof (result){};                        
result:=
   aTable {PKcol1,PKcol2}
    times
     IDenseRank(aTable)
      where (PKcol1=PK.col1) and (PKcol2 between PK2Min and PK2Max)
            {PKcol1,PKcol2,DenseRank};       
end;

Now we can use the compound dense rank for a table of type TheInput.

select 
 DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp}) ;
 
PKcol1 PKcol2 DenseRank 
------ ------ --------- 
A      1      1         
A      2      1         
A      3      2         
A      4      3         
A      5      4         
A      6      4         
B      1      1         
B      2      2         
B      3      3         
B      4      3         
B      5      4         
B      6      4     

select 
 (
  DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp})
     {PKcol1 Letter,PKcol2 RowNum,DenseRank}
 )   
  join TheInput
        order by {Letter,RowNum} ; 
        
Letter RowNum DenseRank OtherValue 
------ ------ --------- ---------- 
A      1      1         50         
A      2      1         50         
A      3      2         100        
A      4      3         50         
A      5      4         100        
A      6      4         100        
B      1      1         50         
B      2      2         100        
B      3      3         50         
B      4      3         50         
B      5      4         100        
B      6      4         100        

A simple group by with the Count aggregate answers the OP question.

select 
 (
  DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp})
     {PKcol1 Letter,PKcol2 RowNum,DenseRank}
 )   
  join TheInput
    group by {Letter,DenseRank,OtherValue} add{Count() Cnt}
      order by {Letter,DenseRank};
    
Letter DenseRank OtherValue Cnt 
------ --------- ---------- --- 
A      1         50         2   
A      2         100        1   
A      3         50         1   
A      4         100        2   
B      1         50         1   
B      2         100        1   
B      3         50         2   
B      4         100        2   

Summary

The parameter declaration of an operator, ie.

(aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer})

is called the 'signature' of the operator. Dataphor stores the
same name of an operator for each different signature. In other
words, the same operator name for different parameter types 
(ie. different table types). This is the concept of overloading.
One operator name covering different types (regardless of the
returned result or even if the code differs for different signatures).
This, obviously, significantly benefits developers. For some
types of complicated problems just one clever programmer is all it
takes to write the logic of the operator. To use the operator
other developers merely insure that the signature (parameter type(s))
match their own tables. If not, simply change the signature (and
possibly the data types of the result). It would not be difficult to
overload the dense rank operators to allow for compound keys of
more than two columns. Even the coding change for more than two
PK columns is simple. While overloaded operators are physically
stored multiple times (based on signiture), conceptually there
is but one dense rank operator. When the operator is invoked it
simply looks for the signature (type) that matches the table
being sent in as a parameter. Just think of the many types of
problems typically encountered. Many of these can follow the paradigm
of the dense rank. Imagine a library of these type of operators accessible
from any database(s).

(Feel free to contact me if you would like to see a specific example
 of a type of problem like the dense rank problem).

A final caveat

These operators could be coded in a more sophisticated fashion.
But before the fancy coding, before the modelling of a business problem,
before anything in the database, it all rests on understanding the
key concepts of a relational database like Dataphor. And concepts
like types and variables are just as important as clever code, if
not more so ☺ .

Thursday, September 20, 2007

Dataphor - Super Function II

No comments:

Beyond Sql

Blog Archive

Links

About Me