Beyond Sql: September 2007

Thursday, September 20, 2007

Dataphor - Super Function II

This article extends the functionality of the Dense Rank function
described in:
http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html

For background on the concepts involved see:
http://beyondsql.blogspot.com/2007/06/dataphor-13-passing-table-as-parameter.html
http://beyondsql.blogspot.com/2007/09/dataphor-all-tables-are-typed-variables.html
http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html 
Sql Server 2005 Bol, Ranking Functions:
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm

The original DenseRank operator was based on a table with a single
primary key. We now extend the usability of the operator by overloading
it to return a dense rank for a compound primary key, ie. a primary key 
consisting of two columns.

As an example consider the problem posed in the thread:

microsoft.public.sqlserver.programming
Thursday, September 13, 2007 2:27 PM
SQL Query Help: Need to remove consecutive items and just keep the earliest

Here is some sample data (courtesy of Tom Cooper) created in an Sql Server
2005 database (Column names are a bit different and additional data has
been added):

CREATE TABLE AssignmentLog (
    TicketID [varchar] (6) NOT NULL,
    EventTime [smalldatetime] NOT NULL,
    GroupMember [varchar] (15) NOT NULL,
    DeptInNeed varchar(15) NOT NULL,
    aNumber int NOT NULL, 
  PRIMARY KEY CLUSTERED (TicketID, EventTime)
)
go
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:30:00', 'Helpdesk','Mkt',14)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:42:00', 'Server Admin','Mkt',43)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:50:00', 'Helpdesk','ACT',134)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 15:52:00', 'Helpdesk','Sales',60)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 16:01:00', 'Helpdesk','Mkt',88)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/10/2007 16:22:00', 'Server Admin','Fin',245)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/11/2007 8:30:00', 'Server Admin','Per',15)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC001', '9/11/2007 9:05:00', 'Field Services','Maint',75)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/10/2007 11:30:00', 'Helpdesk','HR',0)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/10/2007 15:30:00', 'Helpdesk','Fin',24)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 15:42:00', 'Field Services','Maint',42)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 15:50:00', 'Field Services','HR',63)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 15:52:00', 'Helpdesk','Sales',1)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 16:01:00', 'Helpdesk','HR',20)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/11/2007 16:22:00', 'Helpdesk','HR',25)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/12/2007 8:30:00', 'Field Services','MKT',75)
INSERT INTO AssignmentLog (TicketID, EventTime,
GroupMember,DeptInNeed,aNumber)
VALUES ('ABC002', '9/12/2007 9:05:00', 'Field Services','MKT',90)

We now work with the AssignmentLog table in Dataphor where it is
treated as a 'variable'.

select AssignmentLog;    

TicketID EventTime             GroupMember    DeptInNeed aNumber 
-------- --------------------- -------------- ---------- ------- 
ABC001   9/10/2007 3:30:00 PM  Helpdesk       Mkt        14      
ABC001   9/10/2007 3:42:00 PM  Server Admin   Mkt        43      
ABC001   9/10/2007 3:50:00 PM  Helpdesk       ACT        134     
ABC001   9/10/2007 3:52:00 PM  Helpdesk       Sales      60      
ABC001   9/10/2007 4:01:00 PM  Helpdesk       Mkt        88      
ABC001   9/10/2007 4:22:00 PM  Server Admin   Fin        245     
ABC001   9/11/2007 8:30:00 AM  Server Admin   Per        15      
ABC001   9/11/2007 9:05:00 AM  Field Services Maint      75      
ABC002   9/10/2007 11:30:00 AM Helpdesk       HR         0       
ABC002   9/10/2007 3:30:00 PM  Helpdesk       Fin        24      
ABC002   9/11/2007 3:42:00 PM  Field Services Maint      42      
ABC002   9/11/2007 3:50:00 PM  Field Services HR         63      
ABC002   9/11/2007 3:52:00 PM  Helpdesk       Sales      1       
ABC002   9/11/2007 4:01:00 PM  Helpdesk       HR         20      
ABC002   9/11/2007 4:22:00 PM  Helpdesk       HR         25      
ABC002   9/12/2007 8:30:00 AM  Field Services MKT        75      
ABC002   9/12/2007 9:05:00 AM  Field Services MKT        90      

Here's how the OP described his problem:
Quote
I just want to know when the ticket left the helpdesk to go to an 
outside group like Server Admins, Field Services, etc.  So when there
are consecutive entries within the same group, I just need the earliest
one of the series.  Here is the output I'm trying to generate:

ticket_id event_timestamp     group_assigned
--------- ------------------- ---------------
ABC001    09/10/2007 15:30:00 Helpdesk
ABC001    09/10/2007 15:42:00 Server Admin
ABC001    09/10/2007 15:50:00 Helpdesk
ABC001    09/10/2007 16:22:00 Server Admin
ABC001    09/11/2007 09:05:00 Field Services

But how can I do that in a query?  I can't simply use a GROUP BY
clause, something like this:

SELECT ticket_id, MIN(event_timestamp) AS event_timestamp,
group_assigned
FROM #AssignmentLog
GROUP BY ticket_id, group_assigned
ORDER BY ticket_id, MIN(event_timestamp)

Because that will lump EVERYTHING to the earliest occurance of each
group_assigned entry.  In other words that doesn't account for the
items in a consecutive series.  The output looks like:

ticket_id event_timestamp     group_assigned
--------- ------------------- ---------------
ABC001    09/10/2007 15:30:00 Helpdesk
ABC001    09/10/2007 15:42:00 Server Admin
ABC001    09/11/2007 09:05:00 Field Services

This shows the first time the helpdesk had the ticket, but doesn't
show when the ticket came BACK to the helpdesk.

I've tried multiple variations of sub-queries with unequal joins (<,
>, etc) but keep running into similar results... can't get the query
to notice consecutive entries.

I would really appreciate any direction or ideas someone may have
about how best to accomplish this.  For reference I'm using MS SQL
2000.
Close Quote

This is a classic problem of constructing a dense rank for group_assigned
(GroupMember) given the compound primary key (ticket_id/TicketID and
event_timestamp/EventTime). When the value of GroupMember stays the same
within a TicketID and EventTime (where we look at EventTime ascending
within TicketID) the rank stays the same. When GroupMember changes
the rank is incremented by 1, we get a new value for the rank. This
behavior is that of a dense rank. Once such a rank is obtained it
can be used as a grouping column and almost all problems (such as the
minimum EventTime for each GroupMember within a TicketID) reduce to 
a simple and straightforward query.
Unfortunately all to often developers are not able to simply state
what 'type' of problem they're trying to solve (the fact that the OP
is on S2000 and does not have ranking functions available is no
excuse. Should vendors be the ultimate authority on types of problems?).
The point is if there was a function library with something like
the DenseRank operator available a developer should know when it is 
appropriate.

The IDenseRank and DenseRank operators for a compound key

Here are the same operators for a single primary key overloaded for a
compound key where one column of the PK is a string (column PKcol2)
and the other (column PKcol2) is a datetime value. If you compare
the single PK operators to these you'll see there are no real drastic
changes. The table parameter (aTable) is changed to accommodate the
compound key and the result now includes both columns of the PK.
The central idea is that only the PK, regardless of whether it's
singular or compound, and the target of the rank (Grp) are the only
columns that are necessary. Note the conceptually similarity to
the S2008 dense_rank() function:

DENSE_RANK ( ) OVER ( [ < partition_by_clause > ] < order_by_clause > )

While the partition column is clearly TicketID we cannot use the
dense_rank() function where the target of the rank (GroupMember) repeats.
This being due to the colossal shortsightedness of sql of not separating
the target of the rank and the ordering of it. We should be able to order
the target of the rank by any column(s) we choose. All sql ranking functions
treat the target of the rank and the order of the target of the rank
as the same column(s). See:
http://beyondsql.blogspot.com/2006/09/rac-rank-this.html
for more details.

create operator IDenseRank
 (aTable:table{PKcol1:String,PKcol2:DateTime,Grp:String}):
     //This is the virtual table returned by IDenseRank.
        table{PK.col1:String,PK2Min:DateTime,PK2Max:DateTime,DenseRank:Integer}
begin
result:=table of typeof (result){};            
var D1:=
ToTable(ToList(cursor(
    (
 ToTable(ToList(cursor(
//We want to get PKs consecutively numbered (PK12Seq).
//
//The ToTable(ToList(cursor table order by { }))) construct is fundamentally
//important both functionally and conceptually. I will cover it in detail
//in future article(s).
//
 ToTable(ToList(cursor(aTable order by {PKcol1,PKcol2} )))
                            {PKcol1,PKcol2,Grp,sequence+1 PK12Seq}
     order by {Grp,PK12Seq}  )))  
      add{PK12Seq-(sequence+1) Drank}
       group by {PKcol1,Grp,Drank} add{Min(PKcol2) PK2Min,Max(PKcol2) PK2Max}
      )      
                 order by {PKcol1,PK2Min}  )))  
                   {PKcol1,PK2Min,PK2Max,sequence+1 DenseRank};
//                   
//We want to renumber the dense rank from 1 to N for each (ie,PKcol1/groupID)             
//
result:= 
    D1
     join
      (
        D1 group by {PKcol1} add {Min(DenseRank) OffSet} {PKcol1,OffSet-1 OffSet}
      )                          
        {PKcol1 PK.col1,PK2Min,PK2Max,DenseRank-OffSet DenseRank} ;
end;

create operator DenseRank(aTable:table{PKcol1:String,PKcol2:DateTime,Grp:String}):
            //This is the virtual table that the operator returns.
                         table{PKcol1:String,PKcol2:DateTime,DenseRank:Integer}
begin
result:=
        table of typeof (result){};                        
result:=
   aTable {PKcol1,PKcol2}
    times               //times is like an sql cross join.
     IDenseRank(aTable) //The virtual result of the IDenseRank operator is used.
//Note the where statement is changed from the single PK operator. We include
//the condition PKcol1=PK.col1 so we get the right PKcol2 values for
//the correct PKcol1 values (groups).
      where (PKcol1=PK.col1) and (PKcol2 between PK2Min and PK2Max)
            {PKcol1,PKcol2,DenseRank};       
end;

We now follow the same methodology for a single PK. The AssignmentLog table
simply must be the same 'type' as the aTable parameter. That means the
column names and their data type must match those of the aTable parameter. 
All we have to do is rename the appropriate AssignmentLog columns, 
ie. TicketID to PKcol1, EventTime to PKcol2 and GroupMember to Grp.

select 
 DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp});     

PKcol1 PKcol2                DenseRank 
------ --------------------- --------- 
ABC001 9/10/2007 3:30:00 PM  1         
ABC001 9/10/2007 3:42:00 PM  2         
ABC001 9/10/2007 3:50:00 PM  3         
ABC001 9/10/2007 3:52:00 PM  3         
ABC001 9/10/2007 4:01:00 PM  3         
ABC001 9/10/2007 4:22:00 PM  4         
ABC001 9/11/2007 8:30:00 AM  4         
ABC001 9/11/2007 9:05:00 AM  5         
ABC002 9/10/2007 11:30:00 AM 1         
ABC002 9/10/2007 3:30:00 PM  1         
ABC002 9/11/2007 3:42:00 PM  2         
ABC002 9/11/2007 3:50:00 PM  2         
ABC002 9/11/2007 3:52:00 PM  3         
ABC002 9/11/2007 4:01:00 PM  3         
ABC002 9/11/2007 4:22:00 PM  3         
ABC002 9/12/2007 8:30:00 AM  4         
ABC002 9/12/2007 9:05:00 AM  4         

We get all the data from the AssignmentLog table and the DenseRank operator
with a simple join. We rename the columns of the result table from the 
DenseRank operator to the orginal column names (of the AssignmentLog table).
(We can rename columns of 'any' table since all tables are 'variables' ☺ .
As you can see the ranks are consecutively numbered starting from 1 within
each TicketID.

select 
(
 DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
)   
 join //This is a natural join (match on common columns, ie. TicketID and EventTime.  
    AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank}
        order by {TicketID,EventTime} ;


TicketID EventTime             GroupMember    DeptInNeed aNumber DenseRank 
-------- --------------------- -------------- ---------- ------- --------- 
ABC001   9/10/2007 3:30:00 PM  Helpdesk       Mkt        14      1         
ABC001   9/10/2007 3:42:00 PM  Server Admin   Mkt        43      2         
ABC001   9/10/2007 3:50:00 PM  Helpdesk       ACT        134     3         
ABC001   9/10/2007 3:52:00 PM  Helpdesk       Sales      60      3         
ABC001   9/10/2007 4:01:00 PM  Helpdesk       Mkt        88      3         
ABC001   9/10/2007 4:22:00 PM  Server Admin   Fin        245     4         
ABC001   9/11/2007 8:30:00 AM  Server Admin   Per        15      4         
ABC001   9/11/2007 9:05:00 AM  Field Services Maint      75      5         
ABC002   9/10/2007 11:30:00 AM Helpdesk       HR         0       1         
ABC002   9/10/2007 3:30:00 PM  Helpdesk       Fin        24      1         
ABC002   9/11/2007 3:42:00 PM  Field Services Maint      42      2         
ABC002   9/11/2007 3:50:00 PM  Field Services HR         63      2         
ABC002   9/11/2007 3:52:00 PM  Helpdesk       Sales      1       3         
ABC002   9/11/2007 4:01:00 PM  Helpdesk       HR         20      3         
ABC002   9/11/2007 4:22:00 PM  Helpdesk       HR         25      3         
ABC002   9/12/2007 8:30:00 AM  Field Services MKT        75      4         
ABC002   9/12/2007 9:05:00 AM  Field Services MKT        90      4         

With the inclusion of the DenseRank column we can write relatively simple
queries to answer questions concerning the grouping of TicketID and GroupMember.

We can easily answer the OP original question:
'So when there are consecutive entries within the same group, I just need
 the earliest one of the series.'

This is simply grouping by TicketID,GroupMember and DenseRank and getting
the minimum EventTime.

select
 ( 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank}
  )
    group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime}
                 order by {TicketID,DenseRank}; 

TicketID GroupMember    DenseRank MinTime               
-------- -------------- --------- --------------------- 
ABC001   Helpdesk       1         9/10/2007 3:30:00 PM  
ABC001   Server Admin   2         9/10/2007 3:42:00 PM  
ABC001   Helpdesk       3         9/10/2007 3:50:00 PM  
ABC001   Server Admin   4         9/10/2007 4:22:00 PM  
ABC001   Field Services 5         9/11/2007 9:05:00 AM  
ABC002   Helpdesk       1         9/10/2007 11:30:00 AM 
ABC002   Field Services 2         9/11/2007 3:42:00 PM  
ABC002   Helpdesk       3         9/11/2007 3:52:00 PM  
ABC002   Field Services 4         9/12/2007 8:30:00 AM  

Of course we can remove DenseRank from the result.

select
 ( 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank}
  )
    group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime}
             remove{DenseRank}
                 order by {TicketID,MinTime}; 
                 
TicketID GroupMember    MinTime               
-------- -------------- --------------------- 
ABC001   Helpdesk       9/10/2007 3:30:00 PM  
ABC001   Server Admin   9/10/2007 3:42:00 PM  
ABC001   Helpdesk       9/10/2007 3:50:00 PM  
ABC001   Server Admin   9/10/2007 4:22:00 PM  
ABC001   Field Services 9/11/2007 9:05:00 AM  
ABC002   Helpdesk       9/10/2007 11:30:00 AM 
ABC002   Field Services 9/11/2007 3:42:00 PM  
ABC002   Helpdesk       9/11/2007 3:52:00 PM  
ABC002   Field Services 9/12/2007 8:30:00 AM  

And we can do pretty much anything we want quite easily. Here we
show a concatenated list of Depts (DeptInNeed) for each grouping.

select
 ( 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank,',' Del}
   adorn {key{TicketID,EventTime}}
 )  
    group by {TicketID,GroupMember,DenseRank} add{Min(EventTime) MinTime,
             Concat(DeptInNeed,Del order by {TicketID,EventTime}) AllDepts}
                 order by {TicketID,DenseRank};  
                 
TicketID GroupMember    DenseRank MinTime               AllDepts      
-------- -------------- --------- --------------------- ------------- 
ABC001   Helpdesk       1         9/10/2007 3:30:00 PM  Mkt           
ABC001   Server Admin   2         9/10/2007 3:42:00 PM  Mkt           
ABC001   Helpdesk       3         9/10/2007 3:50:00 PM  ACT,Sales,Mkt 
ABC001   Server Admin   4         9/10/2007 4:22:00 PM  Fin,Per       
ABC001   Field Services 5         9/11/2007 9:05:00 AM  Maint         
ABC002   Helpdesk       1         9/10/2007 11:30:00 AM HR,Fin        
ABC002   Field Services 2         9/11/2007 3:42:00 PM  Maint,HR      
ABC002   Helpdesk       3         9/11/2007 3:52:00 PM  Sales,HR,HR   
ABC002   Field Services 4         9/12/2007 8:30:00 AM  MKT,MKT      

Or show distinct Depts for each grouping.
(If you don't understood some of this stuff ask ☺ Everything in due time ☺ . 

var D1:= //D1 is somewhat analogous to an sql CTE. But it's a 'variable'! 
  (
    DenseRank(AssignmentLog {TicketID PKcol1,EventTime PKcol2,GroupMember Grp}) 
     {PKcol1 TicketID,PKcol2 EventTime,DenseRank}
  )   
  join AssignmentLog 
      {TicketID,EventTime,GroupMember,DeptInNeed,aNumber,DenseRank,',' Del};
 
select 
  (
    D1 group by {TicketID,GroupMember,DenseRank}
    add{Min(EventTime) MinTime}
  )  
   join
    (
      (D1 {TicketID,GroupMember,DenseRank,DeptInNeed,Del} adorn{key{DeptInNeed}})
        group by {TicketID,GroupMember,DenseRank}
          add{ Concat( distinct DeptInNeed,Del order by {DeptInNeed}) DistinctDepts}
     ) 
                 order by {TicketID,DenseRank};   
                 
TicketID GroupMember    DenseRank MinTime               DistinctDepts 
-------- -------------- --------- --------------------- ------------- 
ABC001   Helpdesk       1         9/10/2007 3:30:00 PM  Mkt           
ABC001   Server Admin   2         9/10/2007 3:42:00 PM  Mkt           
ABC001   Helpdesk       3         9/10/2007 3:50:00 PM  ACT,Mkt,Sales 
ABC001   Server Admin   4         9/10/2007 4:22:00 PM  Fin,Per       
ABC001   Field Services 5         9/11/2007 9:05:00 AM  Maint         
ABC002   Helpdesk       1         9/10/2007 11:30:00 AM Fin,HR        
ABC002   Field Services 2         9/11/2007 3:42:00 PM  HR,Maint      
ABC002   Helpdesk       3         9/11/2007 3:52:00 PM  HR,Sales      
ABC002   Field Services 4         9/12/2007 8:30:00 AM  MKT           


Another example of a dense rank with a compound primary key

This example is based on the post:

comp.databases.ms-sqlserver
Sep 5, 7:59 am
Grouping similar rows
http://tinyurl.com/3378lk

Here is the create table statement and some rows entered into
an Sql Server 2005 database (courtesy of MVP Hugo Kornelis).

CREATE TABLE TheInput
     (Letter char(1) NOT NULL,
      RowNum int NOT NULL,
      OtherValue int NOT NULL,
      PRIMARY KEY (Letter, RowNum));
go
INSERT INTO TheInput (Letter, RowNum, OtherValue)
SELECT 'A', 1, 50
UNION ALL
SELECT 'A', 2, 50
UNION ALL
SELECT 'A', 3, 100
UNION ALL
SELECT 'A', 4, 50
UNION ALL
SELECT 'A', 5, 100
UNION ALL
SELECT 'A', 6, 100;
INSERT INTO TheInput (Letter, RowNum, OtherValue)
SELECT 'B', 1, 50
UNION ALL
SELECT 'B', 2, 100
UNION ALL
SELECT 'B', 3, 50
UNION ALL
SELECT 'B', 4, 50
UNION ALL
SELECT 'B', 5, 100
UNION ALL
SELECT 'B', 6, 100; 

We now work with the table in Dataphor.

select TheInput; 

Letter RowNum OtherValue 
------ ------ ---------- 
A      1      50         
A      2      50         
A      3      100        
A      4      50         
A      5      100        
A      6      100        
B      1      50         
B      2      100        
B      3      50         
B      4      50         
B      5      100        
B      6      100     

The OP poses his problem as:

Quote
I would like to group rows based on order and similarity in a single
sql-query if possible:

A, 1, 50
A, 2, 50
A, 3, 100
A, 4, 50
A, 5, 100
A, 6, 100

Would come out as:

A, 1, 50, 2 <-- Last value shows number of grouped rows
A, 2, 100, 1
A, 3, 50, 1
A, 4, 100, 2 
UnQuote

Like the previous example, there is no mention in either the question
or the solution of a dense rank, yet that is what the problem calls for,
ie. a dense rank over OtherValue within each Letter and in the ascending
order of RowNum.

We need only check the compound key version of the DenseRank and
IDenseRank operators to insure they accept a table as a parameter
of the same 'type' as TheInput table.

The current compound DenseRank operator accepts a table of type:

table{PKcol1:String,PKcol2:DateTime,Grp:String}

While the Letter column of the TheInput table is the same data type as
PKcol1 (String), the PKcol2 data type (DateTime) is different than the
RowNum column type (Integer). In addition the OtherValue column (Integer),
the target of the rank, is different than the Grp type (String). So all
we have to do is again overload the compound DenseRank operators to accept
the compound key and Grp data types of the TheInput table. It's that simple ☺ .

Here are the operators overloaded to accept a table of type:

table{PKcol1:String,PKcol2:Integer,Grp:Integer}

create operator IDenseRank
 (aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}):
       table{PK.col1:String,PK2Min:Integer,PK2Max:Integer,DenseRank:Integer}
begin
result:=table of typeof (result){};            
var D1:=
ToTable(ToList(cursor(
    (
 ToTable(ToList(cursor(
 ToTable(ToList(cursor(aTable order by {PKcol1,PKcol2} )))
                            {PKcol1,PKcol2,Grp,sequence+1 PK12Seq}
     order by {Grp,PK12Seq}  )))  
      add{PK12Seq-(sequence+1) Drank}
       group by {PKcol1,Grp,Drank} add{Min(PKcol2) PK2Min,Max(PKcol2) PK2Max}
      )      
                 order by {PKcol1,PK2Min}  )))  
                   {PKcol1,PK2Min,PK2Max,sequence+1 DenseRank};
result:= 
    D1
     join
      (
        D1 group by {PKcol1} add {Min(DenseRank) OffSet} {PKcol1,OffSet-1 OffSet}
      )                          
        {PKcol1 PK.col1,PK2Min,PK2Max,DenseRank-OffSet DenseRank} ;
end;

create operator DenseRank(aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer}):
                     table{PKcol1:String,PKcol2:Integer,DenseRank:Integer}
begin
result:=
        table of typeof (result){};                        
result:=
   aTable {PKcol1,PKcol2}
    times
     IDenseRank(aTable)
      where (PKcol1=PK.col1) and (PKcol2 between PK2Min and PK2Max)
            {PKcol1,PKcol2,DenseRank};       
end;

Now we can use the compound dense rank for a table of type TheInput.

select 
 DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp}) ;
 
PKcol1 PKcol2 DenseRank 
------ ------ --------- 
A      1      1         
A      2      1         
A      3      2         
A      4      3         
A      5      4         
A      6      4         
B      1      1         
B      2      2         
B      3      3         
B      4      3         
B      5      4         
B      6      4     

select 
 (
  DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp})
     {PKcol1 Letter,PKcol2 RowNum,DenseRank}
 )   
  join TheInput
        order by {Letter,RowNum} ; 
        
Letter RowNum DenseRank OtherValue 
------ ------ --------- ---------- 
A      1      1         50         
A      2      1         50         
A      3      2         100        
A      4      3         50         
A      5      4         100        
A      6      4         100        
B      1      1         50         
B      2      2         100        
B      3      3         50         
B      4      3         50         
B      5      4         100        
B      6      4         100        

A simple group by with the Count aggregate answers the OP question.

select 
 (
  DenseRank(TheInput {Letter PKcol1,RowNum PKcol2,OtherValue Grp})
     {PKcol1 Letter,PKcol2 RowNum,DenseRank}
 )   
  join TheInput
    group by {Letter,DenseRank,OtherValue} add{Count() Cnt}
      order by {Letter,DenseRank};
    
Letter DenseRank OtherValue Cnt 
------ --------- ---------- --- 
A      1         50         2   
A      2         100        1   
A      3         50         1   
A      4         100        2   
B      1         50         1   
B      2         100        1   
B      3         50         2   
B      4         100        2   

Summary

The parameter declaration of an operator, ie.

(aTable:table{PKcol1:String,PKcol2:Integer,Grp:Integer})

is called the 'signature' of the operator. Dataphor stores the
same name of an operator for each different signature. In other
words, the same operator name for different parameter types 
(ie. different table types). This is the concept of overloading.
One operator name covering different types (regardless of the
returned result or even if the code differs for different signatures).
This, obviously, significantly benefits developers. For some
types of complicated problems just one clever programmer is all it
takes to write the logic of the operator. To use the operator
other developers merely insure that the signature (parameter type(s))
match their own tables. If not, simply change the signature (and
possibly the data types of the result). It would not be difficult to
overload the dense rank operators to allow for compound keys of
more than two columns. Even the coding change for more than two
PK columns is simple. While overloaded operators are physically
stored multiple times (based on signiture), conceptually there
is but one dense rank operator. When the operator is invoked it
simply looks for the signature (type) that matches the table
being sent in as a parameter. Just think of the many types of
problems typically encountered. Many of these can follow the paradigm
of the dense rank. Imagine a library of these type of operators accessible
from any database(s).

(Feel free to contact me if you would like to see a specific example
 of a type of problem like the dense rank problem).

A final caveat

These operators could be coded in a more sophisticated fashion.
But before the fancy coding, before the modelling of a business problem,
before anything in the database, it all rests on understanding the
key concepts of a relational database like Dataphor. And concepts
like types and variables are just as important as clever code, if
not more so ☺ .

Sunday, September 09, 2007

Dataphor - All tables are typed variables

Everything you work with in D4, every object you define, is a variable.
And since everything is a variable it must typed. Understanding variables
and their types is fundamental to working in the D4 language. There are
5 types: Scalar (like integers and strings), Row, Table, List, and Cursor.
Working with types is not hard. It's all logic and quite simple at that.

Table types

It seems many people have a hard time understanding a table as a variable.
But once you wrap your head around it you'll realize it's just common sense
applied to a database. All we're doing is what you do with a number.

This declares the variable named MyNum to be of type integer.

var MyNum:Integer;

This assigns the number 1 to the variable. So now we can work with MyNum
and its current value of 1.

MyNum:=1;

Why shouldn't we be able to do the same thing with a table?
For example:

var MyLTable:table {ID:Integer,Y:String } ;

We have just defined the variable MyLTable to be a type of table
and specifically a table with column ID of type Integer and column
Y of type String. The '{ID:Integer,Y:String }' is the the heading
and is all that is necessary to define this particular table type.

Just as we assigned the value 1 to the integer variable MyNum we
can assign values to the table variable MyLTable.

The values we assign to a table are 'rows'. So we use the row type
for the assignment.

Suppose we try this:

var MyLTable:table {ID:Integer,Y:String } ;
MyLTable := row { 1 ID,'A' Y };

But this won't work. We are attempting to assign a type row to a
table type. This is analogous to trying to assign a string to a variable
of integer. The two types are incompatible and the assignment wouldn't
work.

var MyNum:Integer;
MyNum:='A';         //This is a type mismatch and will throw an error.

It only makes sense that we can assign a table to a variable of a table
type (and of the same type).
(Of course we can 'insert' into a table which we'll see see later and
'update' a table). This batch of 3 statements shows a type of table
with rows defined with the same heading (column names and data types)
as variable MyLTable on the right of the := that we can assign to MyLTable.
Even the table on the right side of the := is a table variable.

var MyLTable:table {ID:Integer,Y:String } ;
MyLTable := table
                 {
                  row { 1 ID,'A' Y },
                  row { 2 ID,'B' Y }
                  };
select MyLTable;

ID Y
-- -
1  A
2  B

Note that if we want to assign a variable a row value and not a table we
simply define the variable as a row type (with a heading like the heading
required for a table).

var MyLRow:row {ID:Integer,Y:String } ;
MyLRow := row { 1 ID,'A' Y };
select MyLRow;

ID Y
-- -
1  A

The variables defined so far are memory variables. They exist only for
the duration of the batch they are in. Persisted tables can, of course,
be defined. Dataphor uses the concept of a device. A device is an sql
database that Dataphor uses to store table data. For example Sql Server,
Oracle or DB2.
The 'create table' statement is used to persist data in an sql database.
For example, the table SqlServerTable1 uses an Sql Server 2005 database
as data storage.

create table SqlServerTable1
 {
  ID:Integer,
  Y:String,
  key{ID}
 };
 
The SqlServerTable1 table, like the MyLTable table, is a variable. Every
table defined, regardless of where the data is stored, is a table 'variable'.
Also note that only the columns and their data type determine the type of
the table. Other information like keys are not part of the type definition.

Lets assign (persisted) rows to SqlServerTable1.

SqlServerTable1:= table{row{1 ID,'A' Y},row{2 ID,'B' Y}};
select SqlServerTable1; 

ID Y
-- -
1  A
2  B

Lets redefine the definition of the SqlServerTable1 table (variable).

SqlServerTable1:= table{row{3 ID,'A' Y},row{4 ID,'B' Y}};
select SqlServerTable1;

ID Y
-- -
3  A
4  B

Changing the assignment of the table is no different than changing
the assignment of a number to a variable.

var MyNum:Integer;
MyNum:=1;
select MyNum; //1
MyNum:=2;
select MyNum;
2

Always remember that when I refer to a table in D4 it is a table variable.

There are wonderful benefits to this concept. For example, just as
two integer type variables can be compared so too can two table variables.
But you must think in terms of types. When is such a comparison appropriate,
when does it make sense? It only makes sense when the two tables are of the
same type. Just as it only makes sense to compare two integer variables.
Two tables are equal if they have the same columns and corresponding data
types and have the same set of rows (values).

For example, the below select statement displays the rows of MyLTable
since the comparison of tables in the where statement is true (for each row
of MyTable). The two tables are of the same type ({ID:Integer,Y:String})
and have the same rows (values).

var MyLTable:table {ID:Integer,Y:String} ;
MyLTable := table
                 {
                  row {1 ID,'A' Y},
                  row {2 ID,'B' Y}
                  };
var MyLTable1:table {ID:Integer,Y:String} ;
MyLTable1 := table
                 {
                  row {1 ID,'A' Y},
                  row {2 ID,'B' Y}
                  };                 
select MyLTable
 where MyLTable=MyLTable1; //MyLTable=MyLTable1 is true.

ID Y
-- -
1  A
2  B 

Here the two tables are of the same type but do not have the same rows. The
where statement evaluates to false for each row of MyLTable and no rows
are selected.

var MyLTable:table {ID:Integer,Y:String } ;
MyLTable := table
                 {
                  row {1 ID,'A' Y},
                  row {2 ID,'B' Y}
                  };
var MyLTable1:table {ID:Integer,Y:String } ;
MyLTable1 := table { row {1 ID,'A' Y} };                 
select MyLTable
 where MyLTable=MyLTable1; MyLTable=MyLTable1 is false.

ID Y
-- -


Now look at this comparison.

var MyLTable:table {ID:Integer,Y:String } ;
MyLTable := table
                 {
                  row {1 ID,'A' Y},
                  row {2 ID,'B' Y}
                  };
var MyLTable1:table {ID:Integer,Z:String } ;
MyLTable1 := table
                 {
                  row {1 ID,'A' Z},
                  row {2 ID,'B' Z}
                  };                 
select MyLTable
 where MyLTable=MyLTable1; //An invalid comparison, two different table types.
                           //The compiler will raise an error because of the
                           //type 'mismatch'.

Here the two tables are of different type:
{ID:Integer,Y:String} vs.{ID:Integer,Z:String}
One table has a column Y the other a column Z so the comparison
itself is invalid. In other words, it's simply not 'logical' to
compare two different types!

But again we're working with table variables, not a table that is a
file (sql) or a table that is a value (literal). A variable can be
changed which means its type can be changed. One way to change the
type of a table is simply to change a column name. Here we make
the table comparison valid by changing the type of table MyLTable1
to the type of table MyLTable by renaming column Z to Y.

var MyLTable:table {ID:Integer,Y:String} ;
MyLTable := table
                 {
                  row {1 ID,'A' Y},
                  row {2 ID,'B' Y}
                  };
var MyLTable1:table {ID:Integer,Z:String} ;
MyLTable1 := table
                 {
                  row {1 ID,'A' Z},
                  row {2 ID,'B' Z}
                  } ;                 
select MyLTable
 where MyLTable=(MyLTable1 rename {Z Y}); //Now we can compare the tables
                                          //because they are the same type
                                          //(Same column names and data types).

ID Y
-- -
1  A
2  B


Inserting into a table

Previously we created table SqlServerTable1 as a persisted table where
the data is stored in Sql Server. We left it with 2 rows.

select SqlServerTable1;

ID Y
-- -
3  A
4  B

We can of course insert into the table. Here we insert a row.

insert row {10 ID, 'C' Y} into SqlServerTable1;

Inserting a row makes perfect sense. What about inserting multiple rows?

insert
       row {11 ID,'D' Y},
       row {12 ID,'E' Y}
                  into SqlServerTable1;

This does not work. We can insert A row into a table. If we want to insert
multiple rows we can insert each row separately.

insert row {11 ID,'D' Y} into SqlServerTable1;       
insert row {12 ID,'E' Y} into SqlServerTable1;

If we want to insert consecutive rows with a single insert we have to
use the appropriate type to insert into the table. And that type is
a table. So we can insert one table into another.

insert
  table
      {
       row {11 ID,'D' Y},
       row {12 ID,'E' Y}
      }
                  into SqlServerTable1;

Not only do you have to think in terms of types but of the relationship
between types. It is types, their relationships and variables that are
fundamental to a 'relational' database and distinguish D4 from an sql
database, a non-relational database.

The data types of a column of a table

Each column in a tables heading has to have a corresponding data type. This
is usually a Dataphor provided system scalar type. Dataphor provides the
following scalar data types: Decimal, Long, Integer, Short, Byte, Boolean,
String, TimeSpan, DateTime, Date, Time, Money and Binary. But a column type
need not be restricted to these 'scalar' types. For example a 'row' type and
'list' type are as logical as a scalar type. Any 'type' supported by Dataphor
should logically be avaliable for the type of a column. It's that simple :)

Because Sql Server has no support for a 'list' type or 'row' type we
can't create a persisted table in the sql database with these types. But
we can create a persisted table in Dataphor known as a 'session' table.
A session table is like a temporary table (#) in Sql Server. It will
disappear when the session ends. (There are ways to persist such types
in an sql database but that is beyond the scope of this article ☻ ).

Consider session table MySTable with a list and row type and an appropriate
insert statement for a inserting a row.

create session table MySTable
{
  A:Integer,
  B:String,
  LList:list(String),
  LRow:row{X:Integer,Y:String},
  key{A}
}; 
insert row{1 A, 'A1' B, {'J','K','L'} LList, row{10 X,'R1' Y} LRow} into MySTable;

We can extend the idea of the LRow column by using a row type for the columns
of LRow. LRow can be a row consisting of columns which are themselves rows
(we can nest the row type).

create session table MySTable1
{
  A:Integer,
  B:String,
  LRow:row{ R1:row{X1:Integer,Y1:String} , R2:row{X2:Integer,Y2:String} },
  key{A}
}; 
insert row { 1 A,'A1' B,row{ row{10 X1,'S1' Y1} R1,row{20 X2,'S2' Y2} R2 } LRow }
                              into MySTable1;


(I leave the significant benefits of working with tables that contain list
 and row types and just how to work with them for other articles and future
 articles).

The fact that all tables are typed and are variables allows a table to
be passed as a parameter to a procedure just as you would pass a variable
of type integer. For further info on this see:
http://beyondsql.blogspot.com/2007/06/dataphor-13-passing-table-as-parameter.html
http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html

Thursday, September 20, 2007

Dataphor - Super Function II

Sunday, September 09, 2007

Dataphor - All tables are typed variables

Beyond Sql

Blog Archive

Links

About Me