Beyond Sql: The Sql ranking OVERture

Just about everyone who uses Sql Server 2005 likes the ranking functions. But like
a lot of things in sql they are a mile long and an inch deep. You sort of know the
syntax but when it comes to really understanding the concepts behind these functions 
things go south. One of the really interesting benefits of a relational system like
Dataphor is that the underlying relational concepts can illuminate what's going in sql. 
This article looks at the sql ranking functions through the prism of relational types. 
Sql ranking functions can be explained and understood in terms of the basic relational
types of cursor, table and list.  
 
I'm going to use a small sample of data from the Orders table from the Sq1 Server 2005
sample Northwind database: 

CREATE TABLE TRANK (OrderID int,CustomerID char(5),ShipCountry varchar(10),Freight money,
                     primary key (OrderID))
GO
INSERT INTO TRANK (OrderID,CustomerID,ShipCountry,Freight)
SELECT OrderID,CustomerID,ShipCountry,Freight
FROM Orders
WHERE CustomerID in ('CENTC','GROSR','LAZYK','NORTS','SPECD')
 
SELECT * FROM TRANK

OrderID     CustomerID ShipCountry Freight
----------- ---------- ----------- -------
10259       CENTC      Mexico      3.25
10268       GROSR      Venezuela   66.29
10482       LAZYK      USA         7.48
10517       NORTS      UK          32.07
10545       LAZYK      USA         11.92
10738       SPECD      France      2.91
10752       NORTS      UK          1.39
10785       GROSR      Venezuela   1.51
10907       SPECD      France      9.19
10964       SPECD      France      87.38
11043       SPECD      France      8.80
11057       NORTS      UK          4.13

Now before electricity, I mean before Sql Server 2005 and ranking functions, you had
to use a not very pretty subquery or join to get any kind of ranks/rownumbers. The
outstanding issues with these queries are they could get complicated and often result
in poor performance since they involve non-equal (>,<) comparisons involving NxN number
of rows. For example, consider getting a row number and a rank for CustomerID. 
(
 For more background see 
 Visualizing a ranking query 
 http://beyondsql.blogspot.com/2007/06/dataphor-sql-visualizing-ranking-query.html
)

Using an sql subquery to get a row number and a rank:

SELECT A.OrderID,A.CustomerID,A.ShipCountry,A.Freight,
-- Klunky subquery with a count and non-equi predicates to get unique row numbers for
-- CustomerID.
-- Duplicate CustomerID values are resolved by using the unique values of OrderID.
(SELECT COUNT(*) FROM TRANK AS B
WHERE (B.CustomerID<A.CustomerID) 
          OR ((B.CustomerID=A.CustomerID) and (B.OrderID<A.OrderID)))+1 as RowNumber,
-- Another klunky subquery to get ranks for CustomerID.
-- Important: note that Rank is the minimum RowNumber for each CustomerID.          
(SELECT COUNT(*) FROM TRANK AS B WHERE B.CustomerID<A.CustomerID)+1 as Rank
FROM TRANK AS A
ORDER BY CustomerID,OrderID

OrderID     CustomerID ShipCountry Freight RowNumber Rank
----------- ---------- ----------- ------- --------- ----
10259       CENTC      Mexico      3.25    1         1   <|-- Min RowNumber
10268       GROSR      Venezuela   66.29   2         2    <|-- Min RowNumber
10785       GROSR      Venezuela   1.51    3         2     |   
10482       LAZYK      USA         7.48    4         4     <|-- Min RowNumber
10545       LAZYK      USA         11.92   5         4      |
10517       NORTS      UK          32.07   6         6      <|-- Min RowNumber
10752       NORTS      UK          1.39    7         6       |
11057       NORTS      UK          4.13    8         6       |
10738       SPECD      France      2.91    9         9       <|-- Min RowNumber
10907       SPECD      France      9.19    10        9        |
10964       SPECD      France      87.38   11        9        |
11043       SPECD      France      8.80    12        9        | 

As the query shows there is a straightforward relationship between RowNumber and Rank. 
So even if you just wanted Rank, if you get the row numbers first, which will always
be unique, then all you have to do is group by CustomerID and take the minimum row
number value to get the Rank. If CustomerID and Rank were in a derived table it could
easily be joined back to TRANK to get all the data. Something like this.

Sql to derive ranks from row numbers:

SELECT G.OrderID,G.CustomerID,G.ShipCountry,G.Freight,Rank
FROM TRANK AS G 
      INNER JOIN
-- Get CustomerID and their rank from row numbers and join back to main table.      
       (SELECT D.CustomerID,MIN(RowNumber) AS Rank
        FROM (SELECT A.CustomerID,
        (SELECT COUNT(*) FROM TRANK AS B
          WHERE (B.CustomerID<A.CustomerID) 
                 OR ((B.CustomerID=A.CustomerID) and (B.OrderID<A.OrderID)))+1
                                                                        AS RowNumber
        FROM TRANK AS A) AS D
        GROUP BY D.CustomerID) AS F
            ON G.CustomerID=F.CustomerID
ORDER BY G.CustomerID,G.OrderID

OrderID     CustomerID ShipCountry Freight Rank
----------- ---------- ----------- ------- ----
10259       CENTC      Mexico      3.25    1
10268       GROSR      Venezuela   66.29   2
10785       GROSR      Venezuela   1.51    2
10482       LAZYK      USA         7.48    4
10545       LAZYK      USA         11.92   4
10517       NORTS      UK          32.07   6
10752       NORTS      UK          1.39    6
11057       NORTS      UK          4.13    6
10738       SPECD      France      2.91    9
10907       SPECD      France      9.19    9
10964       SPECD      France      87.38   9
11043       SPECD      France      8.80    9

But you're probably wondering why do something like this when you can use a subquery
directly for Rank? Yeah, practically speaking it's silly but the idea of how row number
and Rank are related like this is part of the puzzle of what lies behind the ranking
functions in sql. Just because you haven't seen this in a textbook (or anyplace else)
doesn't mean it's not relevant. ☺ Lets continue.

Ok here is the data using the sql ranking functions. It's important to realize that
ROW_NUMBER will always create unique values no matter how you sort the data. On the
other hand RANK is sensitive to how you order the data. You'll get duplicate RANKs
if there are duplicate values of the ORDER BY column(s). This is because RANK counts
prior duplicate values (ie CustomerID) as if they were unique. And this can lead to
RANK values skipping around. ☺ But again, bear in mind that Rank can be derived from
row numbers as shown above.

Example sql ranking functions query:

SELECT OrderID,CustomerID,ShipCountry,Freight,
-- ROW_NUMBER guarantees unique ranks regardless of whether or not the ORDER BY colum(s) 
-- are a primary key of the table. CustomerID,OrderID is a primary key, CustomerID 
-- is not. We assume that when ordering by just CustomerID that within duplicate
-- CustomerIDs the ordering is non deterministic but always results in unique row 
-- number values.
ROW_NUMBER() OVER (ORDER BY CustomerID) AS RowNumber1,
ROW_NUMBER() OVER (ORDER BY CustomerID,OrderID) AS RowNumber2,
-- RANK guarantees unique values only if the ORDER BY column(s) are a primary key, 
-- ie. if the ORDER BY column(s) value(s) target only unique rows then the ranks are
-- unique. CustomerID,OrderID is a primary key of the table so the RANKs are unique.
RANK() OVER (ORDER BY CustomerID,OrderID) AS UniqueRank,
-- RANKing by a non primary key (CustomerID) generates duplicate values.
RANK() OVER (ORDER BY CustomerID) AS Rank,
-- DENSE RANK returns consecutive but repeating ranks for a non primary key (CustomerID).
DENSE_RANK() OVER (ORDER BY CustomerID) AS DenseRank
FROM TRANK
ORDER BY CustomerID,OrderID

OrderID     CustomerID ShipCountry Freight RowNumber1 RowNumber2 UniqueRank Rank DenseRank
----------- ---------- ----------- ------- ---------- ---------- ---------- ---- ---------
10259       CENTC      Mexico      3.25    1          1          1          1    1
10268       GROSR      Venezuela   66.29   2          2          2          2    2
10785       GROSR      Venezuela   1.51    3          3          3          2    2
10482       LAZYK      USA         7.48    4          4          4          4    3
10545       LAZYK      USA         11.92   5          5          5          4    3
10517       NORTS      UK          32.07   8          6          6          6    4
10752       NORTS      UK          1.39    6          7          7          6    4
11057       NORTS      UK          4.13    7          8          8          6    4
10738       SPECD      France      2.91    12         9          9          9    5
10907       SPECD      France      9.19    9          10         10         9    5
10964       SPECD      France      87.38   10         11         11         9    5
11043       SPECD      France      8.80    11         12         12         9    5

The ranking functions overcome the two big obstacles of coding them as a subquery,
they're easier and they perform much better because they don't involve the
potentially terribly high number of comparisons involving the (>,<) predicates.

Now I'm pretty sure you gathered how to use ranking functions from BOL. I'm also
pretty sure you gained minimal insight about them from BOL. The documentation
is a really botched job (perhaps I'll go into detail in another article. ☺ Suffice
it to say, for example, that OVER is not a clause but a keyword indicating that what
follows it is an OLAP (ie. ranking) function. What follows OVER is a clause, a 
WINDOW clause consisting of at least an order (ORDER BY) specification. You can
take a step up from BOL by going to the sql standard from which ranking functions
came to gleam a little more about them:

ISO/ANSI: Introduction to OLAP functions (5/99)
http://tinyurl.com/2taahc

Bear in mind that given the ranking function:

RANK() OVER (ORDER BY CustomerID) AS Rank 

The WINDOW is '(ORDER BY CustomerID)'. Now consider what the standard says about 
the ordering (ORDER BY):

"Ordering in windows is specified with the same <sort specification list> used by
 cursors, and with the same semantics."  
"Whether in a cursor or a window, a <sort specification list> specifies an ordering
 of rows. The difference is that, in a cursor, the ordering determines the sequence
 to present rows during sequential fetches. In a window, the ordering helps to
 determine the value of <OLAP function>s."
"The user can copy the <order specification list> from a cursor ORDER BY to a window
 ORDER BY, or the reverse, with precisely the same semantics in each context."
 
If it quacks like a duck, swims like a duck and walks like a duck there's pretty good
reason to believe it's at least related to the idea of a duck. ☺ The concept of a
cursor is embedded in a WINDOW, it's embedded in the ranking functions! The idea
of OVER implies over a table just like you would declare a cursor 'over' a table.

ROW_NUMBER()
------------
 
Now lets take a ranking function from the sql query above:

ROW_NUMBER() OVER (ORDER BY customerID) AS RowNumber1

and come at it relationally using dataphor. First lets define a cursor using the
TRANK table ordered by CustomerID:

cursor(TRANK order by {CustomerID}) *

This encapsulates a specific order of the rows in a different type than a table, a 
cursor. After all, sql users know that a table by definition has no order so the
order that the table can't have is now in the cursor. Cursor, OVER (WINDOW) the same
thing? ☺ Now how can we get at the order of rows represented by the cursor? How can
we get discrete values of the cursors order? Well we can transform the cursor into a
list type. The order of each row in the cursor will be represented by the ordinal
position (an integer) the row has in the list. You explicitly convert the cursor
using the ToList operator.

ToList(cursor(TRANK order by {CustomerID}))

Kewl, but again we face the decision of just how we get at the list. We could
obviously procedurally (loop) thru it and access each row of the list. But what is
much better, we can let the system do it non procedurally. We can transform each row
of the list, which includes the integer value representing the position in the
list of each row, into a row of a table! Then we can simply use a SELECT query with
the transformed list. So lets transform the list to a table using the ToTable operator.

select ToTable(ToList(cursor(TRANK order by {CustomerID}))) 
                               order by {CustomerID,OrderID};
                               
OrderID CustomerID ShipCountry Freight sequence 
------- ---------- ----------- ------- -------- 
10259   CENTC      Mexico      $3.25   0        
10268   GROSR      Venezuela   $66.29  1        
10785   GROSR      Venezuela   $1.51   2        
10482   LAZYK      USA         $7.48   3        
10545   LAZYK      USA         $11.92  4        
10517   NORTS      UK          $32.07  5        
10752   NORTS      UK          $1.39   6        
11057   NORTS      UK          $4.13   7        
10738   SPECD      France      $2.91   8        
10907   SPECD      France      $9.19   9        
10964   SPECD      France      $87.38  10       
11043   SPECD      France      $8.80   11   

The rows represented in the list are now rows in a table. Additionally we get the
auto generated sequence column, the ordinal position each row had in the list 
defined by the sort order of the cursor. Renaming sequence to Index (to indicate
the column comes from an indexed list) and adding 1 to each Index value we have:

select ToTable(ToList(cursor(TRANK order by {CustomerID})),'A_row','Index') 
                                          redefine {Index:=Index+1}
                                            order by {CustomerID,OrderID};
                                           
OrderID CustomerID ShipCountry Freight Index 
------- ---------- ----------- ------- ----- 
10259   CENTC      Mexico      $3.25   1     
10268   GROSR      Venezuela   $66.29  2     
10785   GROSR      Venezuela   $1.51   3     
10482   LAZYK      USA         $7.48   4     
10545   LAZYK      USA         $11.92  5     
10517   NORTS      UK          $32.07  6     
10752   NORTS      UK          $1.39   7     
11057   NORTS      UK          $4.13   8     
10738   SPECD      France      $2.91   9     
10907   SPECD      France      $9.19   10    
10964   SPECD      France      $87.38  11    
11043   SPECD      France      $8.80   12    

This should look familiar, starting with a cursor over the table (TRANK) ordered by
CustomerID and then transformed to a list then transformed back to a table we arrive  
at the same values for Index created from: 

ROW_NUMBER() OVER (ORDER BY customerID) AS RowNumber1 (back to ranking functions query)

Conceptually the transformation between relational types is the red meat underlying
ROW_NUMBER (and all the other ranking functions). This is but another benefit of
working with and understanding types and their relationships in a relational system. 
(Note that there is no such thing as implicit conversion in dataphor as there is in 
 sql. Explicit vs. implicit conversion is a distinguishing characteristic between
 relational and sql systems.)
 
Note that cursors in Sql Server are viewed exclusively as an access method. The idea
of an sql cursor is synonymous with the idea of a procedural approach to data access as 
opposed to a set (non procedural) approach using a query. The type of action the cursor
implies is the primary focus while the representation of rows of a table in a particular
order is secondary. Relationally, the idea of representation of a table in an order is
a primary concept. It's the primary concept of the cursor type. It's within the context
of the relationship between types that the cursor is elevated in a relational system. 
(It's certainly the case that a relational cursor can be used to fetch rows sequentially
just like in sql (see here). But the importance of a cursor as an order representation
of table rows does not exist in sql as it does in a relational system. This hole in
the cursor concept in sql is based on the fundamental idea that sql is essentially a
typeless system compared to a relational one. What can one do with a cursor in sql 
except fetch from it, use it to retrieve data? On the other hand types are the
foundation of a relational system. And understanding a relational system comes thru
types.) 

RANK()
------

From the sql above featuring the ranking functions (go there) now lets look at the
expression:

RANK() OVER (ORDER BY CustomerID) AS Rank

Now, unlike ROW_NUMBER, there's no direct relational transformations we can make to
produce RANKs. But lets go back to the sql subquery methods for RowNumber and Rank
(go there). Here again is the result:


OrderID     CustomerID ShipCountry Freight RowNumber Rank
----------- ---------- ----------- ------- --------- ----
10259       CENTC      Mexico      3.25    1         1   <|-- Min RowNumber
10268       GROSR      Venezuela   66.29   2         2    <|-- Min RowNumber
10785       GROSR      Venezuela   1.51    3         2     |   
10482       LAZYK      USA         7.48    4         4     <|-- Min RowNumber
10545       LAZYK      USA         11.92   5         4      |
10517       NORTS      UK          32.07   6         6      <|-- Min RowNumber
10752       NORTS      UK          1.39    7         6       |
11057       NORTS      UK          4.13    8         6       |
10738       SPECD      France      2.91    9         9       <|-- Min RowNumber
10907       SPECD      France      9.19    10        9        |
10964       SPECD      France      87.38   11        9        |
11043       SPECD      France      8.80    12        9        | 

I previously introduced the idea that RANKs could easily be derived from ROW_NUMBERs.
The RANKs for CustomerID are the minimum ROWNUMBERs within CustomerID (review query here).
And we can use the same idea in dataphor. Here we show that using a group by with
CustomerID returns the Ranks:

select ToTable(ToList(cursor(TRANK {OrderID,CustomerID} order by {CustomerID})),'Arow','Index')
       group by {CustomerID} add{Min(Index) Rank} redefine {Rank:=Rank+1}

CustomerID Rank 
---------- ---- 
CENTC      1    
GROSR      2    
LAZYK      4    
NORTS      6    
SPECD      9  

Note that the cursor is over the TRANK table with only the columns OrderID and 
CustomerID. We include OrderID to guarantee that all CustomerIDs are in the table. 
Without OrderID (ie. only CustomerID) dataphor would only include unique values of 
CustomerID in the cursor (dataphor will never allow duplicate rows in a table).
In other words, we want all CustomerID values (or perhaps more correctly, all rows)
to be in the list, we don't want to eliminate duplicates.

And like in the sql query we can simply join the Ranks to the TRANK table and return
all the data.

select 
//Join TRANK table to table of CustomerID and Rank.
  TRANK
    join //Using a natural join on CustomerID.
     (
      //Use group by to get minimum Index for each CustomerID. 
      //Add 1 to Index to set minimum rank to 1. 
      ToTable(ToList(cursor(TRANK {OrderID,CustomerID} order by {CustomerID})),'Arow','Index')
       group by {CustomerID} add{Min(Index) Rank} redefine {Rank:=Rank+1}
     )
       order by {CustomerID,OrderID};

OrderID CustomerID ShipCountry Freight Rank 
------- ---------- ----------- ------- ---- 
10259   CENTC      Mexico      $3.25   1    
10268   GROSR      Venezuela   $66.29  2    
10785   GROSR      Venezuela   $1.51   2    
10482   LAZYK      USA         $7.48   4    
10545   LAZYK      USA         $11.92  4    
10517   NORTS      UK          $32.07  6    
10752   NORTS      UK          $1.39   6    
11057   NORTS      UK          $4.13   6    
10738   SPECD      France      $2.91   9    
10907   SPECD      France      $9.19   9    
10964   SPECD      France      $87.38  9    
11043   SPECD      France      $8.80   9

Like the sql RANK, it is an easy query with no row by row comparisons as in an sql
subquery. And it all starts with introducing an order of rows in a table thru a
cursor type.

(For those burning with curiosity go here for a procedural solution for the RANKs
in dataphor.)

DENSE_RANK()
------------

Now lets turn our attention to the dense rank expression in the sql query (go there):

DENSE_RANK() OVER (ORDER BY CustomerID) AS DenseRank

Dense ranks are consecutive integers that increment only for unique values of the
ORDER BY column(s). They remain the same for duplicate values. Therefore they're
based on distinct values. Before ranking functions it was common to write a subquery
using COUNT(DISTINCT column). We could get a dense rank on CustomerID with:

SELECT A.OrderID,A.CustomerID,A.ShipCountry,A.Freight,
(SELECT COUNT(DISTINCT B.CustomerID) FROM TRANK AS B WHERE B.CustomerID<=A.CustomerID) AS DenseRank
FROM TRANK AS A
ORDER BY CustomerID,OrderID

OrderID     CustomerID ShipCountry Freight DenseRank
----------- ---------- ----------- ------- ---------
10259       CENTC      Mexico      3.25    1
10268       GROSR      Venezuela   66.29   2
10785       GROSR      Venezuela   1.51    2
10482       LAZYK      USA         7.48    3
10545       LAZYK      USA         11.92   3
10517       NORTS      UK          32.07   4
10752       NORTS      UK          1.39    4
11057       NORTS      UK          4.13    4
10738       SPECD      France      2.91    5
10907       SPECD      France      9.19    5
10964       SPECD      France      87.38   5
11043       SPECD      France      8.80    5

How do we translate the sql idea of counting distinct values of CustomerID to
the context of relational types to get the dense rank? It's actually quite simple. 
First we create a table with only unique values of CustomerID. Then we transform 
the unique CustomerID rows into a list (with an ascending order). In the list the
ordinal position of each row (a row is just CustomerID column) is the dense rank! ☺ 
Then we can transform the list back to a table which includes the index of the list
as a column (DenseRank) in the table.

select ToTable(ToList(cursor(TRANK {CustomerID} order by {CustomerID})),'CustomerID','Index')
                {CustomerID,Index,Index+1 DenseRank};

CustomerID Index DenseRank 
---------- ----- --------- 
CENTC      0     1         
GROSR      1     2         
LAZYK      2     3         
NORTS      3     4         
SPECD      4     5         

We can join the above table with CustomerID and DenseRank to the full TRANK table
to return all the data.

select 
//Join TRANK table to table of CustomerID and DenseRank.
  TRANK
    join //Using a natural join on CustomerID.
     (
      //Get the dense ranks from the ordinal position that each CustomerID has in the list.
       ToTable(ToList(cursor(TRANK {CustomerID} order by {CustomerID})),'CustomerID','Index')
      //Add 1 to Index to set minimum DenseRank to 1. 
                     {CustomerID,Index+1 DenseRank}
     )
       order by {CustomerID,OrderID};

OrderID CustomerID ShipCountry Freight DenseRank 
------- ---------- ----------- ------- --------- 
10259   CENTC      Mexico      $3.25   1         
10268   GROSR      Venezuela   $66.29  2         
10785   GROSR      Venezuela   $1.51   2         
10482   LAZYK      USA         $7.48   3         
10545   LAZYK      USA         $11.92  3         
10517   NORTS      UK          $32.07  4         
10752   NORTS      UK          $1.39   4         
11057   NORTS      UK          $4.13   4         
10738   SPECD      France      $2.91   5         
10907   SPECD      France      $9.19   5         
10964   SPECD      France      $87.38  5         
11043   SPECD      France      $8.80   5     

Make sense? I bet you had no idea of what was behind the ranking functions. ☺
The ranking functions turn out to be a very special type of function. The sql
ranking functions may give a prelude as far as insight into them. I call this
prelude an overture (giving the keyword OVER in ranking functions its due). 
But relational ideas and specifically types take up where sql leaves off. ☺


Where DENSE_RANK fails
---------------------

Consider the following problem described here:

Identifying Sections 
By:Itzik Ben-Gan 
http://www.sqlmag.com/Article/ArticleID/95912/sql_server_95912.html
(You can download sql solutions.)

Given the following table:

CREATE TABLE dbo.T1
(
 id  INT NOT NULL PRIMARY KEY,
 val VARCHAR(10) NOT NULL
);

INSERT INTO dbo.T1(id, val) VALUES( 1, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 2, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 3, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 5, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 7, 'b');
INSERT INTO dbo.T1(id, val) VALUES( 9, 'b');
INSERT INTO dbo.T1(id, val) VALUES(11, 'a');
INSERT INTO dbo.T1(id, val) VALUES(13, 'a');
INSERT INTO dbo.T1(id, val) VALUES(17, 'b');
INSERT INTO dbo.T1(id, val) VALUES(19, 'b');
INSERT INTO dbo.T1(id, val) VALUES(23, 'b');
INSERT INTO dbo.T1(id, val) VALUES(29, 'a');
INSERT INTO dbo.T1(id, val) VALUES(31, 'b');
INSERT INTO dbo.T1(id, val) VALUES(37, 'b');

SELECT * FROM T1

id   val
---- ----
1    a
2    a
3    a
5    a
7    b
9    b
11   a
13   a
17   b
19   b
23   b
29   a
31   b
37   b

How do you get a dense rank for val in the order of id? The solution would look
like this:

id   val  DenseRank
---- ---- ---------
1    a    1
2    a    1
3    a    1
5    a    1         | change 1
7    b    2         |
9    b    2          | change 2
11   a    3          |
13   a    3           | change 3 
17   b    4           |
19   b    4
23   b    4            | change 4
29   a    5            |
31   b    6             | change 5
37   b    6             |


In this case the dense rank is certainly not based on distinct values of val. It's 
based on the number of changes between the values of val, ie. from 'a->b' or 'b->a'
in the direction of id (which is ascending). Over the ascending values of id there
are 5 changes between val values. The dense rank is incremented on every change
regardless of the prior and current values of val as opposed to the distinct values
of val (of which there are only 2). So the number of changes + 1 = number of distinct
dense rank values. Obviously we cannot use the DENSE_RANK function to encapsulate this
rank. The DENSE_RANK is for the ORDER BY column(s) (and PARTITION BY column(s) if
they exist). For example we can try various combinations of columns to sort by:

SELECT id ,val,
DENSE_RANK()OVER(ORDER BY val)    AS DenseRank1,
DENSE_RANK()OVER(ORDER BY val,id) AS DenseRank2
from t1
order by id

id   val  DenseRank1 DenseRank2
---- ---- ---------- ----------
1    a    1          1
2    a    1          2
3    a    1          3
5    a    1          4
7    b    2          8
9    b    2          9
11   a    1          5
13   a    1          6
17   b    2          10
19   b    2          11
23   b    2          12
29   a    1          7
31   b    2          13
37   b    2          14

From the point of view of a cursor we can't encapsulate this change, which is both
ascending and descending, for the ORDER BY column(s). And this is why a single 
DENSE_RANK function cannot be used to solve this kind of problem. The sql ranking
functions reflect only a single cursor. To capture the change multiple cursors
are required or multiple expressions in the case of sql. (You can download sql
solutions offered by Itzik Ben-Gan for the problem described above.)

For an example of a relational solution using multiple cursors, ToList and ToTable see:
Creating a Super Function
http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html
(This solution also emphasizes the relational idea of passing a table as a
 real parameter just like an integer or string.)

For another relational approach see:
Using a dense rank for identifying sections
http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html

For a solution exclusively in sql using the RAC utility see:
RAC - Rank This!
http://beyondsql.blogspot.com/2006/09/rac-rank-this.html

I hope you found this article interesting and I hope you'll be stimulated to explore
the relational world of dataphor. ☺


A procedural solution in dataphor for obtaining RANKs for CustomerID.

//Beginning of batch statements.
//Define a list of type string to hold scalar values of CustomerID.
var LList:=list(String){};
//Create a cursor variable ordering by just CustomerID.
var LCursor:=cursor(TRANK order by {CustomerID});
//Create a virtual table to hold the result, ie. table TRANK and a column for the rank (Rank)
//of each CustomerID.
var NewTable:=table of typeof(TRANK add {nil as Integer Rank}){};
var Dummy:Integer;
//Sequentially fetch the cursor rows by CustomerID.
while LCursor.Next() do
begin  
//LCursor.Select().CustomerID is the current scalar value of CustomerID (in the cursor).
//Add the CustomerID value to the list.
Dummy:=LList.Add(LCursor.Select().CustomerID); 
//LCursor.Select() is the current row in the cursor, all columns and their values.
//Insert the current row plus the rank into NewTable.
//The first index in the list for the current CustomerID is the Rank! 
//In other words, the 1st (minimum) ordinal position for a CustomerID+1=rank.
//The list operator IndexOf returns the minimum ordinal position for a list value. 
//In this case for a value of CustomerID. 
insert (LCursor.Select() add{LList.IndexOf(LCursor.Select().CustomerID)+1 Rank}) into NewTable;
end;
select NewTable order by {CustomerID,OrderID};
//End of batch statements.  
  

OrderID CustomerID ShipCountry Freight Rank 
------- ---------- ----------- ------- ---- 
10259   CENTC      Mexico      $3.25   1    
10268   GROSR      Venezuela   $66.29  2    
10785   GROSR      Venezuela   $1.51   2    
10482   LAZYK      USA         $7.48   4    
10545   LAZYK      USA         $11.92  4    
10517   NORTS      UK          $32.07  6    
10752   NORTS      UK          $1.39   6    
11057   NORTS      UK          $4.13   6    
10738   SPECD      France      $2.91   9    
10907   SPECD      France      $9.19   9    
10964   SPECD      France      $87.38  9    
11043   SPECD      France      $8.80   9 


* This is a valid expression in dataphor. It can be used in a query or as the
  definition of a variable, ie.

  var LCursor:=cursor(TRANK order by {CustomerID})

  where the LCursor variable inherits its type, cursor, from the expression.
Saturday, April 19, 2008

The Sql ranking OVERture

No comments:

Beyond Sql

Blog Archive

Links

About Me