Beyond Sql: RAC - Rank This!

Sql Server 2005 Ranking functions vs. Rac Ranking

Principal players:
Rac v2.33 running on  (more info on Rac here)
MS Sql Server 2005 sp1

In general any abstraction for t-sql is good abstraction. And
the Sql-99 analytic ranking functions are indeed a welcome
addition to Sql Server 2005. There has been universal approval
and all kinds of favor has be accorded them from the Sql Server
community. But the history of sql is the history of the gang
that can't shoot straight. And the gang tends to always be a
day late and a dollar short. So did sql and MS show itself to
be all it can be with the new ranking functionality? Is the
Sql Server model superior or inferior to well..the 'RAC' model
of ranking. Two logically different approaches. Think of this
as Fox Sql, fair and balanced..you decide :)

Here is the general specification of a rank construct from Bol:

RANK_TYPE ( ) OVER ( [ < partition_by_clause > ] < order_by_clause > )

Excuse me if I ask just who thinks this stuff up? Perhaps they
should get off what their on or get back on what their off. It's
painful for those not steeped in sql speak and clearly doesn't 
capture the logical/conceptual meaning of the construct. How many
of you first looked at this spec and wondered just what is the basis
for the rank in the first place? So allow me to express it based on
common sense:

RANK_TYPE (For Column(s)/[ASC | DESC]) OVER TABLE <Optional:[PARTITION BY Column(s)]>

Res ipse loquitor - Now the thing speaks for itself. 

The rank is based on a column. To be fair this is what the sql
standard meant when they expressed the idea of the 'utility'
of a construct to express the ageless sql of a rank in a
subquery:

(select count(*) 
       from t1
         where t1.column<=t2.column) as Rank
         
The meaning of the ordering of the column and whether or not the
rank is obtained within groups/partitions or over the entire table
is clear.

Now here is the concept of a rank in Rac expressed in a similar way:

RANK_TYPE (For PARTITION Columns) OVER TABLE Optional ORDER BY Columns

Res ipse loquitor - Now the thing speaks for itself. 

Here the object of a rank is a partition! The sort order of the partition
are the column(s) of the partition but an optional sort order can be
defined with columns independent of the partition column(s). This is
a logically different concept than the sql rank. That an sql rank thru
a column(s) cannot encapsulate a rank predicated on partition(s) or
a structure if you will is because there is no guarantee that a column(s)
exists to capture or enumerate such a rank. In other words a single
subquery, the sql model, may not be adaquate. The underlying sql code
necessary to capture the rank may take the form of nested subqueries,
certainly more complicated than the sql rank single subquery.

The underlying Rac model for capturing ranks over partitions can be
seem here where I first proposed creating 'virtual groups':
    microsoft.public.sqlserver.programming 
    Mon, Jul 5 1999 
    'Difficult SQL Question'
    Name: 'Trysql' (but signed by yours truely -:)  
    The thread can be seen here. 
    
Of course a partition can be conceptually decomposed to simple column(s)
so the sql rank is encapsulated within the logical model of the Rac rank.

Here is an example that elucidates the two different ranking models.
(The data is extrapolated from an example of a Rac solution which can be seen here).

create table ranktest (ID tinyint primary key,ROWID tinyint,TYPE tinyint,POS tinyint,STRING char(1))
insert ranktest values(1,1,1,1,'1')
insert ranktest values(2,1,1,2,'2')
insert ranktest values(3,1,2,3,'-')
insert ranktest values(4,1,2,4,'%')
insert ranktest values(5,1,2,5,'=')
insert ranktest values(6,1,2,6,'B ')
insert ranktest values(7,1,1,7,'3')
insert ranktest values(8,1,1,8,'1')
insert ranktest values(9,1,1,9,'2')
insert ranktest values(10,1,2,10,'A')
insert ranktest values(11,2,2,1,'D')
insert ranktest values(12,2,2,2,'C')
insert ranktest values(13,2,2,3,'M')
insert ranktest values(14,2,1,4,'6')
insert ranktest values(15,2,1,5,'5')
insert ranktest values(16,2,2,6,'T')
insert ranktest values(17,2,2,7,'Y')
insert ranktest values(18,2,2,8,'W')
insert ranktest values(19,2,2,9,'R')
insert ranktest values(20,2,1,10,'6')
insert ranktest values(21,2,1,11,'7')

select * from ranktest

ID   ROWID TYPE POS  STRING 
---- ----- ---- ---- ------ 
1    1     1    1    1
2    1     1    2    2
3    1     2    3    -
4    1     2    4    %
5    1     2    5    =
6    1     2    6    B
7    1     1    7    3
8    1     1    8    1
9    1     1    9    2
10   1     2    10   A
11   2     2    1    D
12   2     2    2    C
13   2     2    3    M
14   2     1    4    6
15   2     1    5    5
16   2     2    6    T
17   2     2    7    Y
18   2     2    8    W
19   2     2    9    R
20   2     1    10   6
21   2     1    11   7

We want a dense rank for the partition ROWID,TYPE. Although ID 
and POS are redundant they imply a meaningfulness to the order
of rows both over the table and of TYPE within ROWID.

The Rac model can be represented conceptually as:

DENSE_RANK (For PARTITION {ROWID,TYPE}) OVER TABLE (ORDER BY ID)

or

DENSE_RANK (For PARTITION {ROWID,TYPE}) OVER TABLE (ORDER BY ROWID,POS)

Rac does not offer the autonomy of ranking specifications as in
sql. Partitions are defined by the ordinal position of columns
in @rows. In the absence of optional sorting of the rows using
@rowsort, the partition is ordered by the @rows specification
from left to right. With @rowsort the partition is orderd by the
@rowsort column(s). The Rac ranking parameter @rowindicators returns
a dense rank. 

The key parameters to obtain the dense rank are:

@rows='ROWID & TYPE & POS & STRING',
@rowsort='ID' or @rowsort='ROWID & POS'
@rowindicators='TYPE{DENSE_RANK}'

The @rows specification implies the needed partition. 
The 'TYPE' in the @rowindicators argument 'TYPE{DENSE_RANK}'
is the end point of the partition to obtain the dense rank for.
In other words, all partitions as defined in @rows will be included
for this dense rank up to and including TYPE. Therefore 'TYPE'
implies the inclusion 'ROWID' and the partition of 'ROWID,TYPE'
will be used for the rank. Any number of ranks can be defined
in a Rac execute but they will all be based on the implied partition
in @rows from left to right. A rank based on row_number() is
included via @rowcounters and uses the partition 'ROWID,TYPE'.
The name of the dense rank is within the tides, ie. '{DENSE_RANK}'.
The name of the row_number rank is 'ROW NUMBER'.

Rac executes:

Exec Rac
@transform='_dummy_',
@rows='ROWID & TYPE & POS & STRING',
-- Order the partitions (@rows).
@rowsort='ID',
@pvtcol='Report Mode',@defaultexceptions='dumy',
@from='ranktest',@defaults1='y',@rowbreak='n',@racheck='y',
@counterdatatype='tinyint',
-- DENSE_RANK() using PARTITION BY ROWID,TYPE ORDERED BY ID
@rowindicators='TYPE{DENSE_RANK}',
-- ROW_NUMBER() using PARTITION BY ROWID,TYPE ORDERED BY ID
@rowcounters='TYPE{ROW_NUMBER}'

OR

Exec Rac
@transform='_dummy_',
@rows='ROWID & TYPE & POS & STRING',
-- Order the partitions (@rows).
@rowsort='ROWID & POS'
@pvtcol='Report Mode',@defaultexceptions='dumy',
@from='ranktest',@defaults1='y',@rowbreak='n',@racheck='y',
@counterdatatype='tinyint',
@rowindicators='TYPE{DENSE_RANK}',
@rowcounters='TYPE{ROW_NUMBER}'

Result

ROWID  TYPE POS  STRING DENSE_RANK ROW_NUMBER 
------ ---- ---- ------ ---------- ---------- 
1      1    1    1      1          1
1      1    2    2      1          2
1      2    3    -      2          1
1      2    4    %      2          2
1      2    5    =      2          3
1      2    6    B      2          4
1      1    7    3      3          1
1      1    8    1      3          2
1      1    9    2      3          3
1      2    10   A      4          1
2      2    1    D      1          1
2      2    2    C      1          2
2      2    3    M      1          3
2      1    4    6      2          1
2      1    5    5      2          2
2      2    6    T      3          1
2      2    7    Y      3          2
2      2    8    W      3          3
2      2    9    R      3          4
2      1    10   6      4          1
2      1    11   7      4          2

With the sql rank the question becomes is there a column(s)
that encapsulates the bi-directional nature of TYPE within ROWID.
Sadly there is not. They only column appropriate for the sql rank
is the DENSE_RANK itself! In other words, the sql rank cannot
be stated:

DENSE_RANK() OVER (PARTITION BY ROWID,TYPE ORDER BY ?) as DENSE_RANK 

or is redundant and a non-solution:

DENSE_RANK() OVER (PARTITION BY ROWID ORDER BY TYPE) as DENSE_RANK 

Any of the sql ranking functions, not matter how they are forumulated,
will not reflect the meaning of the data. This is especially seen
by comparing the sql ROW_NUMBER() with the one from Rac :

select ID,ROWID,TYPE,POS,STRING,
ROW_NUMBER() OVER (PARTITION BY ROWID order by TYPE) as [TYPE ROW_NUMBER],
ROW_NUMBER() OVER (PARTITION BY ROWID,TYPE order by POS) as [POS ROW_NUMBER] 
from ranktest
order by ID

ID   ROWID TYPE POS  STRING TYPE ROW_NUMBER POS ROW_NUMBER RAC ROW_NUMBER 
---- ----- ---- ---- ------ --------------- -------------- -------------- 
1    1     1    1    1      1               1              1
2    1     1    2    2      2               2              2
3    1     2    3    -      7               1              1
4    1     2    4    %      8               2              2
5    1     2    5    =      9               3              3
6    1     2    6    B      10              4              4
7    1     1    7    3      3               3              1
8    1     1    8    1      4               4              2
9    1     1    9    2      5               5              3
10   1     2    10   A      6               5              1
11   2     2    1    D      9               1              1
12   2     2    2    C      10              2              2
13   2     2    3    M      11              3              3
14   2     1    4    6      1               1              1
15   2     1    5    5      2               2              2
16   2     2    6    T      5               4              1
17   2     2    7    Y      6               5              2
18   2     2    8    W      7               6              3
19   2     2    9    R      8               7              4
20   2     1    10   6      3               3              1
21   2     1    11   7      4               4              2

Again the ranks do not reflect the meaning of the data when the
rank is dependent on a column instead of a logical partition.

While the sql rank is indeed a welcome abstraction they could
have done a much better job. By using such a low level of
abstraction many clear and easy solutions to non-trivial problems
remain lost to most of the user community. This is unfortunate. 
While there is a certain quality in quantity sql never seems to
truely show that the sum is greater than the parts. And for those
who build database systems as well as those that use them and
who frown on abstraction, the burden is on them to tell the rest
of us why. Less can be very much more.



Obtaining sql ranks with Rac.

This is intended to show how Rac obtains equivalent sql ranks.

This example use the adventureworks query in Bol shown under
'Ranking Functions (Transact-SQL)' 
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm

An additional rank for ROW_NUMBER within PostalCode is added.

SELECT c.FirstName, c.LastName
    ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number'
--Rank for LastName,FirstName Partitioned by Postcode    
    ,ROW_NUMBER() OVER (PARTITION BY a.PostalCode ORDER BY c.LastName,c.FirstName) AS 'PART. Row Number'
    ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank'
    ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank'
    ,NTILE(4) OVER (ORDER BY a.PostalCode) AS 'Quartile'
    ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s 
    INNER JOIN Person.Contact c 
        ON s.SalesPersonID = c.ContactID
    INNER JOIN Person.Address a 
        ON a.AddressID = c.ContactID
WHERE TerritoryID IS NOT NULL 
    AND SalesYTD <> 0;
    
FirstName  LastName   Row Number PART. Row Number Rank Dense Rank SalesYTD    PostalCode      
---------- ---------- ---------- ---------------- ---- ---------- ----------- --------------- 
Maciej     Dusza      1          1                1    1          4557045.05  98027
Shelley    Dyck       2          2                1    1          5200475.23  98027
Linda      Ecoffey    3          3                1    1          3857163.63  98027
Carla      Eldridge   4          4                1    1          1764938.99  98027
Carol      Elliott    5          5                1    1          2811012.72  98027
Jauna      Elson      6          1                6    2          3018725.49  98055
Michael    Emanuel    7          2                6    2          3189356.25  98055
Terry      Eminhizer  8          3                6    2          3587378.43  98055
Gail       Erickson   9          4                6    2          5015682.38  98055
Mark       Erickson   10         5                6    2          3827950.24  98055
Martha     Espinoza   11         6                6    2          1931620.18  98055
Janeth     Esteves    12         7                6    2          2241204.04  98055
Twanna     Evans      13         8                6    2          1758385.93  98055    


Sql Rank                                   Rac Rank
ROW_NUMBER() OVER (ORDER BY a.PostalCode)  Rac by default computes a ROW_NUMBER over the
                                           table in the sort order implied by @rows or
                                           explicitly specified in @rowsort. It is 
                                           referenced by @tablecounter if there is no
                                           @select or by rd in a select statement.

RANK() OVER (ORDER BY a.PostalCode)        @rowruns='dumy^rank() within tablecounter^(dumy){PostalCode}'
                                           Rac does not have a default for RANK() as accounting
                                           for ties is seldomed used or needed. But it can be obtained 
                                           easily using a form of the Rac running sums parameter
                                           @rowruns. 
DENSE_RANK() OVER (ORDER BY a.PostalCode)  PostalCode{[Dense Rank]}
                                           A trivial case of a column (sql) and a partition (Rac)
                                           having the same semantics.
NTILE(4) OVER (ORDER BY a.PostalCode)      Rac has no direct equivalent expression. It can be computed
                                           using a combination of Rac options but it seems seldom
                                           called for.

The sql rank:

ROW_NUMBER() OVER (PARTITION BY a.PostalCode ORDER BY c.LastName,c.FirstName)


is equivalant to the Rac @rowcounters='PostalCode{[PART. Row Number]}'. The implied
sort order in @rows, PostalCode,LastName,FirstName, gives the Rac @rowcounters
specification the same semantics as this ROW_NUMBER.

Regardless of the order columns are specified in @rows, the data can
be presented/manipulated at will via the @select parameter.

Here is the Rac version which returns the same result as the adventureworks query.

Exec Rac
@transform='_dummy_',
@rows='a.PostalCode & c.LastName & c.FirstName & s.SalesYTD',
@pvtcol='Report Mode',
@from='Sales.SalesPerson s 
    INNER JOIN Person.Contact c 
        ON s.SalesPersonID = c.ContactID
    INNER JOIN Person.Address a 
        ON a.AddressID = c.ContactID',
@where='TerritoryID IS NOT NULL AND SalesYTD <> 0',
@rowbreak='n',@defaults1='y',@racheck='y',
@tablecounter='y',
@rowindicators='PostalCode{[Dense Rank]}',
@rowcounters='PostalCode{[PART. Row Number]}',
@rowruns='dumy^rank() within tablecounter^(dumy){PostalCode}',
@select='select FirstName,LastName,rd as [Row Number],[PART. Row Number],
            1*runs as RANK,[Dense Rank],SalesYTD,PostalCode 
               from rac
                 order by rd'
Tuesday, September 05, 2006

RAC - Rank This!

No comments:

Beyond Sql

Blog Archive

Links

About Me