Just about everyone who uses Sql Server 2005 likes the ranking functions. But like a lot of things in sql they are a mile long and an inch deep. You sort of know the syntax but when it comes to really understanding the concepts behind these functions things go south. One of the really interesting benefits of a relational system like Dataphor is that the underlying relational concepts can illuminate what's going in sql. This article looks at the sql ranking functions through the prism of relational types. Sql ranking functions can be explained and understood in terms of the basic relational types of cursor, table and list. I'm going to use a small sample of data from the Orders table from the Sq1 Server 2005 sample Northwind database: CREATE TABLE TRANK (OrderID int,CustomerID char(5),ShipCountry varchar(10),Freight money, primary key (OrderID)) GO INSERT INTO TRANK (OrderID,CustomerID,ShipCountry,Freight) SELECT OrderID,CustomerID,ShipCountry,Freight FROM Orders WHERE CustomerID in ('CENTC','GROSR','LAZYK','NORTS','SPECD') SELECT * FROM TRANK OrderID CustomerID ShipCountry Freight ----------- ---------- ----------- ------- 10259 CENTC Mexico 3.25 10268 GROSR Venezuela 66.29 10482 LAZYK USA 7.48 10517 NORTS UK 32.07 10545 LAZYK USA 11.92 10738 SPECD France 2.91 10752 NORTS UK 1.39 10785 GROSR Venezuela 1.51 10907 SPECD France 9.19 10964 SPECD France 87.38 11043 SPECD France 8.80 11057 NORTS UK 4.13 Now before electricity, I mean before Sql Server 2005 and ranking functions, you had to use a not very pretty subquery or join to get any kind of ranks/rownumbers. The outstanding issues with these queries are they could get complicated and often result in poor performance since they involve non-equal (>,<) comparisons involving NxN number of rows. For example, consider getting a row number and a rank for CustomerID. ( For more background see Visualizing a ranking query http://beyondsql.blogspot.com/2007/06/dataphor-sql-visualizing-ranking-query.html ) Using an sql subquery to get a row number and a rank: SELECT A.OrderID,A.CustomerID,A.ShipCountry,A.Freight, -- Klunky subquery with a count and non-equi predicates to get unique row numbers for -- CustomerID. -- Duplicate CustomerID values are resolved by using the unique values of OrderID. (SELECT COUNT(*) FROM TRANK AS B WHERE (B.CustomerID<A.CustomerID) OR ((B.CustomerID=A.CustomerID) and (B.OrderID<A.OrderID)))+1 as RowNumber, -- Another klunky subquery to get ranks for CustomerID. -- Important: note that Rank is the minimum RowNumber for each CustomerID. (SELECT COUNT(*) FROM TRANK AS B WHERE B.CustomerID<A.CustomerID)+1 as Rank FROM TRANK AS A ORDER BY CustomerID,OrderID OrderID CustomerID ShipCountry Freight RowNumber Rank ----------- ---------- ----------- ------- --------- ---- 10259 CENTC Mexico 3.25 1 1 <|-- Min RowNumber 10268 GROSR Venezuela 66.29 2 2 <|-- Min RowNumber 10785 GROSR Venezuela 1.51 3 2 | 10482 LAZYK USA 7.48 4 4 <|-- Min RowNumber 10545 LAZYK USA 11.92 5 4 | 10517 NORTS UK 32.07 6 6 <|-- Min RowNumber 10752 NORTS UK 1.39 7 6 | 11057 NORTS UK 4.13 8 6 | 10738 SPECD France 2.91 9 9 <|-- Min RowNumber 10907 SPECD France 9.19 10 9 | 10964 SPECD France 87.38 11 9 | 11043 SPECD France 8.80 12 9 | As the query shows there is a straightforward relationship between RowNumber and Rank. So even if you just wanted Rank, if you get the row numbers first, which will always be unique, then all you have to do is group by CustomerID and take the minimum row number value to get the Rank. If CustomerID and Rank were in a derived table it could easily be joined back to TRANK to get all the data. Something like this. Sql to derive ranks from row numbers: SELECT G.OrderID,G.CustomerID,G.ShipCountry,G.Freight,Rank FROM TRANK AS G INNER JOIN -- Get CustomerID and their rank from row numbers and join back to main table. (SELECT D.CustomerID,MIN(RowNumber) AS Rank FROM (SELECT A.CustomerID, (SELECT COUNT(*) FROM TRANK AS B WHERE (B.CustomerID<A.CustomerID) OR ((B.CustomerID=A.CustomerID) and (B.OrderID<A.OrderID)))+1 AS RowNumber FROM TRANK AS A) AS D GROUP BY D.CustomerID) AS F ON G.CustomerID=F.CustomerID ORDER BY G.CustomerID,G.OrderID OrderID CustomerID ShipCountry Freight Rank ----------- ---------- ----------- ------- ---- 10259 CENTC Mexico 3.25 1 10268 GROSR Venezuela 66.29 2 10785 GROSR Venezuela 1.51 2 10482 LAZYK USA 7.48 4 10545 LAZYK USA 11.92 4 10517 NORTS UK 32.07 6 10752 NORTS UK 1.39 6 11057 NORTS UK 4.13 6 10738 SPECD France 2.91 9 10907 SPECD France 9.19 9 10964 SPECD France 87.38 9 11043 SPECD France 8.80 9 But you're probably wondering why do something like this when you can use a subquery directly for Rank? Yeah, practically speaking it's silly but the idea of how row number and Rank are related like this is part of the puzzle of what lies behind the ranking functions in sql. Just because you haven't seen this in a textbook (or anyplace else) doesn't mean it's not relevant. ☺ Lets continue. Ok here is the data using the sql ranking functions. It's important to realize that ROW_NUMBER will always create unique values no matter how you sort the data. On the other hand RANK is sensitive to how you order the data. You'll get duplicate RANKs if there are duplicate values of the ORDER BY column(s). This is because RANK counts prior duplicate values (ie CustomerID) as if they were unique. And this can lead to RANK values skipping around. ☺ But again, bear in mind that Rank can be derived from row numbers as shown above. Example sql ranking functions query: SELECT OrderID,CustomerID,ShipCountry,Freight, -- ROW_NUMBER guarantees unique ranks regardless of whether or not the ORDER BY colum(s) -- are a primary key of the table. CustomerID,OrderID is a primary key, CustomerID -- is not. We assume that when ordering by just CustomerID that within duplicate -- CustomerIDs the ordering is non deterministic but always results in unique row -- number values. ROW_NUMBER() OVER (ORDER BY CustomerID) AS RowNumber1, ROW_NUMBER() OVER (ORDER BY CustomerID,OrderID) AS RowNumber2, -- RANK guarantees unique values only if the ORDER BY column(s) are a primary key, -- ie. if the ORDER BY column(s) value(s) target only unique rows then the ranks are -- unique. CustomerID,OrderID is a primary key of the table so the RANKs are unique. RANK() OVER (ORDER BY CustomerID,OrderID) AS UniqueRank, -- RANKing by a non primary key (CustomerID) generates duplicate values. RANK() OVER (ORDER BY CustomerID) AS Rank, -- DENSE RANK returns consecutive but repeating ranks for a non primary key (CustomerID). DENSE_RANK() OVER (ORDER BY CustomerID) AS DenseRank FROM TRANK ORDER BY CustomerID,OrderID OrderID CustomerID ShipCountry Freight RowNumber1 RowNumber2 UniqueRank Rank DenseRank ----------- ---------- ----------- ------- ---------- ---------- ---------- ---- --------- 10259 CENTC Mexico 3.25 1 1 1 1 1 10268 GROSR Venezuela 66.29 2 2 2 2 2 10785 GROSR Venezuela 1.51 3 3 3 2 2 10482 LAZYK USA 7.48 4 4 4 4 3 10545 LAZYK USA 11.92 5 5 5 4 3 10517 NORTS UK 32.07 8 6 6 6 4 10752 NORTS UK 1.39 6 7 7 6 4 11057 NORTS UK 4.13 7 8 8 6 4 10738 SPECD France 2.91 12 9 9 9 5 10907 SPECD France 9.19 9 10 10 9 5 10964 SPECD France 87.38 10 11 11 9 5 11043 SPECD France 8.80 11 12 12 9 5 The ranking functions overcome the two big obstacles of coding them as a subquery, they're easier and they perform much better because they don't involve the potentially terribly high number of comparisons involving the (>,<) predicates. Now I'm pretty sure you gathered how to use ranking functions from BOL. I'm also pretty sure you gained minimal insight about them from BOL. The documentation is a really botched job (perhaps I'll go into detail in another article. ☺ Suffice it to say, for example, that OVER is not a clause but a keyword indicating that what follows it is an OLAP (ie. ranking) function. What follows OVER is a clause, a WINDOW clause consisting of at least an order (ORDER BY) specification. You can take a step up from BOL by going to the sql standard from which ranking functions came to gleam a little more about them: ISO/ANSI: Introduction to OLAP functions (5/99) http://tinyurl.com/2taahc Bear in mind that given the ranking function: RANK() OVER (ORDER BY CustomerID) AS Rank The WINDOW is '(ORDER BY CustomerID)'. Now consider what the standard says about the ordering (ORDER BY): "Ordering in windows is specified with the same <sort specification list> used by cursors, and with the same semantics." "Whether in a cursor or a window, a <sort specification list> specifies an ordering of rows. The difference is that, in a cursor, the ordering determines the sequence to present rows during sequential fetches. In a window, the ordering helps to determine the value of <OLAP function>s." "The user can copy the <order specification list> from a cursor ORDER BY to a window ORDER BY, or the reverse, with precisely the same semantics in each context." If it quacks like a duck, swims like a duck and walks like a duck there's pretty good reason to believe it's at least related to the idea of a duck. ☺ The concept of a cursor is embedded in a WINDOW, it's embedded in the ranking functions! The idea of OVER implies over a table just like you would declare a cursor 'over' a table. ROW_NUMBER() ------------ Now lets take a ranking function from the sql query above: ROW_NUMBER() OVER (ORDER BY customerID) AS RowNumber1 and come at it relationally using dataphor. First lets define a cursor using the TRANK table ordered by CustomerID: cursor(TRANK order by {CustomerID}) * This encapsulates a specific order of the rows in a different type than a table, a cursor. After all, sql users know that a table by definition has no order so the order that the table can't have is now in the cursor. Cursor, OVER (WINDOW) the same thing? ☺ Now how can we get at the order of rows represented by the cursor? How can we get discrete values of the cursors order? Well we can transform the cursor into a list type. The order of each row in the cursor will be represented by the ordinal position (an integer) the row has in the list. You explicitly convert the cursor using the ToList operator. ToList(cursor(TRANK order by {CustomerID})) Kewl, but again we face the decision of just how we get at the list. We could obviously procedurally (loop) thru it and access each row of the list. But what is much better, we can let the system do it non procedurally. We can transform each row of the list, which includes the integer value representing the position in the list of each row, into a row of a table! Then we can simply use a SELECT query with the transformed list. So lets transform the list to a table using the ToTable operator. select ToTable(ToList(cursor(TRANK order by {CustomerID}))) order by {CustomerID,OrderID}; OrderID CustomerID ShipCountry Freight sequence ------- ---------- ----------- ------- -------- 10259 CENTC Mexico $3.25 0 10268 GROSR Venezuela $66.29 1 10785 GROSR Venezuela $1.51 2 10482 LAZYK USA $7.48 3 10545 LAZYK USA $11.92 4 10517 NORTS UK $32.07 5 10752 NORTS UK $1.39 6 11057 NORTS UK $4.13 7 10738 SPECD France $2.91 8 10907 SPECD France $9.19 9 10964 SPECD France $87.38 10 11043 SPECD France $8.80 11 The rows represented in the list are now rows in a table. Additionally we get the auto generated sequence column, the ordinal position each row had in the list defined by the sort order of the cursor. Renaming sequence to Index (to indicate the column comes from an indexed list) and adding 1 to each Index value we have: select ToTable(ToList(cursor(TRANK order by {CustomerID})),'A_row','Index') redefine {Index:=Index+1} order by {CustomerID,OrderID}; OrderID CustomerID ShipCountry Freight Index ------- ---------- ----------- ------- ----- 10259 CENTC Mexico $3.25 1 10268 GROSR Venezuela $66.29 2 10785 GROSR Venezuela $1.51 3 10482 LAZYK USA $7.48 4 10545 LAZYK USA $11.92 5 10517 NORTS UK $32.07 6 10752 NORTS UK $1.39 7 11057 NORTS UK $4.13 8 10738 SPECD France $2.91 9 10907 SPECD France $9.19 10 10964 SPECD France $87.38 11 11043 SPECD France $8.80 12 This should look familiar, starting with a cursor over the table (TRANK) ordered by CustomerID and then transformed to a list then transformed back to a table we arrive at the same values for Index created from: ROW_NUMBER() OVER (ORDER BY customerID) AS RowNumber1 (back to ranking functions query) Conceptually the transformation between relational types is the red meat underlying ROW_NUMBER (and all the other ranking functions). This is but another benefit of working with and understanding types and their relationships in a relational system. (Note that there is no such thing as implicit conversion in dataphor as there is in sql. Explicit vs. implicit conversion is a distinguishing characteristic between relational and sql systems.) Note that cursors in Sql Server are viewed exclusively as an access method. The idea of an sql cursor is synonymous with the idea of a procedural approach to data access as opposed to a set (non procedural) approach using a query. The type of action the cursor implies is the primary focus while the representation of rows of a table in a particular order is secondary. Relationally, the idea of representation of a table in an order is a primary concept. It's the primary concept of the cursor type. It's within the context of the relationship between types that the cursor is elevated in a relational system. (It's certainly the case that a relational cursor can be used to fetch rows sequentially just like in sql (see here). But the importance of a cursor as an order representation of table rows does not exist in sql as it does in a relational system. This hole in the cursor concept in sql is based on the fundamental idea that sql is essentially a typeless system compared to a relational one. What can one do with a cursor in sql except fetch from it, use it to retrieve data? On the other hand types are the foundation of a relational system. And understanding a relational system comes thru types.) RANK() ------ From the sql above featuring the ranking functions (go there) now lets look at the expression: RANK() OVER (ORDER BY CustomerID) AS Rank Now, unlike ROW_NUMBER, there's no direct relational transformations we can make to produce RANKs. But lets go back to the sql subquery methods for RowNumber and Rank (go there). Here again is the result: OrderID CustomerID ShipCountry Freight RowNumber Rank ----------- ---------- ----------- ------- --------- ---- 10259 CENTC Mexico 3.25 1 1 <|-- Min RowNumber 10268 GROSR Venezuela 66.29 2 2 <|-- Min RowNumber 10785 GROSR Venezuela 1.51 3 2 | 10482 LAZYK USA 7.48 4 4 <|-- Min RowNumber 10545 LAZYK USA 11.92 5 4 | 10517 NORTS UK 32.07 6 6 <|-- Min RowNumber 10752 NORTS UK 1.39 7 6 | 11057 NORTS UK 4.13 8 6 | 10738 SPECD France 2.91 9 9 <|-- Min RowNumber 10907 SPECD France 9.19 10 9 | 10964 SPECD France 87.38 11 9 | 11043 SPECD France 8.80 12 9 | I previously introduced the idea that RANKs could easily be derived from ROW_NUMBERs. The RANKs for CustomerID are the minimum ROWNUMBERs within CustomerID (review query here). And we can use the same idea in dataphor. Here we show that using a group by with CustomerID returns the Ranks: select ToTable(ToList(cursor(TRANK {OrderID,CustomerID} order by {CustomerID})),'Arow','Index') group by {CustomerID} add{Min(Index) Rank} redefine {Rank:=Rank+1} CustomerID Rank ---------- ---- CENTC 1 GROSR 2 LAZYK 4 NORTS 6 SPECD 9 Note that the cursor is over the TRANK table with only the columns OrderID and CustomerID. We include OrderID to guarantee that all CustomerIDs are in the table. Without OrderID (ie. only CustomerID) dataphor would only include unique values of CustomerID in the cursor (dataphor will never allow duplicate rows in a table). In other words, we want all CustomerID values (or perhaps more correctly, all rows) to be in the list, we don't want to eliminate duplicates. And like in the sql query we can simply join the Ranks to the TRANK table and return all the data. select //Join TRANK table to table of CustomerID and Rank. TRANK join //Using a natural join on CustomerID. ( //Use group by to get minimum Index for each CustomerID. //Add 1 to Index to set minimum rank to 1. ToTable(ToList(cursor(TRANK {OrderID,CustomerID} order by {CustomerID})),'Arow','Index') group by {CustomerID} add{Min(Index) Rank} redefine {Rank:=Rank+1} ) order by {CustomerID,OrderID}; OrderID CustomerID ShipCountry Freight Rank ------- ---------- ----------- ------- ---- 10259 CENTC Mexico $3.25 1 10268 GROSR Venezuela $66.29 2 10785 GROSR Venezuela $1.51 2 10482 LAZYK USA $7.48 4 10545 LAZYK USA $11.92 4 10517 NORTS UK $32.07 6 10752 NORTS UK $1.39 6 11057 NORTS UK $4.13 6 10738 SPECD France $2.91 9 10907 SPECD France $9.19 9 10964 SPECD France $87.38 9 11043 SPECD France $8.80 9 Like the sql RANK, it is an easy query with no row by row comparisons as in an sql subquery. And it all starts with introducing an order of rows in a table thru a cursor type. (For those burning with curiosity go here for a procedural solution for the RANKs in dataphor.) DENSE_RANK() ------------ Now lets turn our attention to the dense rank expression in the sql query (go there): DENSE_RANK() OVER (ORDER BY CustomerID) AS DenseRank Dense ranks are consecutive integers that increment only for unique values of the ORDER BY column(s). They remain the same for duplicate values. Therefore they're based on distinct values. Before ranking functions it was common to write a subquery using COUNT(DISTINCT column). We could get a dense rank on CustomerID with: SELECT A.OrderID,A.CustomerID,A.ShipCountry,A.Freight, (SELECT COUNT(DISTINCT B.CustomerID) FROM TRANK AS B WHERE B.CustomerID<=A.CustomerID) AS DenseRank FROM TRANK AS A ORDER BY CustomerID,OrderID OrderID CustomerID ShipCountry Freight DenseRank ----------- ---------- ----------- ------- --------- 10259 CENTC Mexico 3.25 1 10268 GROSR Venezuela 66.29 2 10785 GROSR Venezuela 1.51 2 10482 LAZYK USA 7.48 3 10545 LAZYK USA 11.92 3 10517 NORTS UK 32.07 4 10752 NORTS UK 1.39 4 11057 NORTS UK 4.13 4 10738 SPECD France 2.91 5 10907 SPECD France 9.19 5 10964 SPECD France 87.38 5 11043 SPECD France 8.80 5 How do we translate the sql idea of counting distinct values of CustomerID to the context of relational types to get the dense rank? It's actually quite simple. First we create a table with only unique values of CustomerID. Then we transform the unique CustomerID rows into a list (with an ascending order). In the list the ordinal position of each row (a row is just CustomerID column) is the dense rank! ☺ Then we can transform the list back to a table which includes the index of the list as a column (DenseRank) in the table. select ToTable(ToList(cursor(TRANK {CustomerID} order by {CustomerID})),'CustomerID','Index') {CustomerID,Index,Index+1 DenseRank}; CustomerID Index DenseRank ---------- ----- --------- CENTC 0 1 GROSR 1 2 LAZYK 2 3 NORTS 3 4 SPECD 4 5 We can join the above table with CustomerID and DenseRank to the full TRANK table to return all the data. select //Join TRANK table to table of CustomerID and DenseRank. TRANK join //Using a natural join on CustomerID. ( //Get the dense ranks from the ordinal position that each CustomerID has in the list. ToTable(ToList(cursor(TRANK {CustomerID} order by {CustomerID})),'CustomerID','Index') //Add 1 to Index to set minimum DenseRank to 1. {CustomerID,Index+1 DenseRank} ) order by {CustomerID,OrderID}; OrderID CustomerID ShipCountry Freight DenseRank ------- ---------- ----------- ------- --------- 10259 CENTC Mexico $3.25 1 10268 GROSR Venezuela $66.29 2 10785 GROSR Venezuela $1.51 2 10482 LAZYK USA $7.48 3 10545 LAZYK USA $11.92 3 10517 NORTS UK $32.07 4 10752 NORTS UK $1.39 4 11057 NORTS UK $4.13 4 10738 SPECD France $2.91 5 10907 SPECD France $9.19 5 10964 SPECD France $87.38 5 11043 SPECD France $8.80 5 Make sense? I bet you had no idea of what was behind the ranking functions. ☺ The ranking functions turn out to be a very special type of function. The sql ranking functions may give a prelude as far as insight into them. I call this prelude an overture (giving the keyword OVER in ranking functions its due). But relational ideas and specifically types take up where sql leaves off. ☺ Where DENSE_RANK fails --------------------- Consider the following problem described here: Identifying Sections By:Itzik Ben-Gan http://www.sqlmag.com/Article/ArticleID/95912/sql_server_95912.html (You can download sql solutions.) Given the following table: CREATE TABLE dbo.T1 ( id INT NOT NULL PRIMARY KEY, val VARCHAR(10) NOT NULL ); INSERT INTO dbo.T1(id, val) VALUES( 1, 'a'); INSERT INTO dbo.T1(id, val) VALUES( 2, 'a'); INSERT INTO dbo.T1(id, val) VALUES( 3, 'a'); INSERT INTO dbo.T1(id, val) VALUES( 5, 'a'); INSERT INTO dbo.T1(id, val) VALUES( 7, 'b'); INSERT INTO dbo.T1(id, val) VALUES( 9, 'b'); INSERT INTO dbo.T1(id, val) VALUES(11, 'a'); INSERT INTO dbo.T1(id, val) VALUES(13, 'a'); INSERT INTO dbo.T1(id, val) VALUES(17, 'b'); INSERT INTO dbo.T1(id, val) VALUES(19, 'b'); INSERT INTO dbo.T1(id, val) VALUES(23, 'b'); INSERT INTO dbo.T1(id, val) VALUES(29, 'a'); INSERT INTO dbo.T1(id, val) VALUES(31, 'b'); INSERT INTO dbo.T1(id, val) VALUES(37, 'b'); SELECT * FROM T1 id val ---- ---- 1 a 2 a 3 a 5 a 7 b 9 b 11 a 13 a 17 b 19 b 23 b 29 a 31 b 37 b How do you get a dense rank for val in the order of id? The solution would look like this: id val DenseRank ---- ---- --------- 1 a 1 2 a 1 3 a 1 5 a 1 | change 1 7 b 2 | 9 b 2 | change 2 11 a 3 | 13 a 3 | change 3 17 b 4 | 19 b 4 23 b 4 | change 4 29 a 5 | 31 b 6 | change 5 37 b 6 | In this case the dense rank is certainly not based on distinct values of val. It's based on the number of changes between the values of val, ie. from 'a->b' or 'b->a' in the direction of id (which is ascending). Over the ascending values of id there are 5 changes between val values. The dense rank is incremented on every change regardless of the prior and current values of val as opposed to the distinct values of val (of which there are only 2). So the number of changes + 1 = number of distinct dense rank values. Obviously we cannot use the DENSE_RANK function to encapsulate this rank. The DENSE_RANK is for the ORDER BY column(s) (and PARTITION BY column(s) if they exist). For example we can try various combinations of columns to sort by: SELECT id ,val, DENSE_RANK()OVER(ORDER BY val) AS DenseRank1, DENSE_RANK()OVER(ORDER BY val,id) AS DenseRank2 from t1 order by id id val DenseRank1 DenseRank2 ---- ---- ---------- ---------- 1 a 1 1 2 a 1 2 3 a 1 3 5 a 1 4 7 b 2 8 9 b 2 9 11 a 1 5 13 a 1 6 17 b 2 10 19 b 2 11 23 b 2 12 29 a 1 7 31 b 2 13 37 b 2 14 From the point of view of a cursor we can't encapsulate this change, which is both ascending and descending, for the ORDER BY column(s). And this is why a single DENSE_RANK function cannot be used to solve this kind of problem. The sql ranking functions reflect only a single cursor. To capture the change multiple cursors are required or multiple expressions in the case of sql. (You can download sql solutions offered by Itzik Ben-Gan for the problem described above.) For an example of a relational solution using multiple cursors, ToList and ToTable see: Creating a Super Function http://beyondsql.blogspot.com/2007/08/dataphor-creating-super-function.html (This solution also emphasizes the relational idea of passing a table as a real parameter just like an integer or string.) For another relational approach see: Using a dense rank for identifying sections http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html For a solution exclusively in sql using the RAC utility see: RAC - Rank This! http://beyondsql.blogspot.com/2006/09/rac-rank-this.html I hope you found this article interesting and I hope you'll be stimulated to explore the relational world of dataphor. ☺ A procedural solution in dataphor for obtaining RANKs for CustomerID. //Beginning of batch statements. //Define a list of type string to hold scalar values of CustomerID. var LList:=list(String){}; //Create a cursor variable ordering by just CustomerID. var LCursor:=cursor(TRANK order by {CustomerID}); //Create a virtual table to hold the result, ie. table TRANK and a column for the rank (Rank) //of each CustomerID. var NewTable:=table of typeof(TRANK add {nil as Integer Rank}){}; var Dummy:Integer; //Sequentially fetch the cursor rows by CustomerID. while LCursor.Next() do begin //LCursor.Select().CustomerID is the current scalar value of CustomerID (in the cursor). //Add the CustomerID value to the list. Dummy:=LList.Add(LCursor.Select().CustomerID); //LCursor.Select() is the current row in the cursor, all columns and their values. //Insert the current row plus the rank into NewTable. //The first index in the list for the current CustomerID is the Rank! //In other words, the 1st (minimum) ordinal position for a CustomerID+1=rank. //The list operator IndexOf returns the minimum ordinal position for a list value. //In this case for a value of CustomerID. insert (LCursor.Select() add{LList.IndexOf(LCursor.Select().CustomerID)+1 Rank}) into NewTable; end; select NewTable order by {CustomerID,OrderID}; //End of batch statements. OrderID CustomerID ShipCountry Freight Rank ------- ---------- ----------- ------- ---- 10259 CENTC Mexico $3.25 1 10268 GROSR Venezuela $66.29 2 10785 GROSR Venezuela $1.51 2 10482 LAZYK USA $7.48 4 10545 LAZYK USA $11.92 4 10517 NORTS UK $32.07 6 10752 NORTS UK $1.39 6 11057 NORTS UK $4.13 6 10738 SPECD France $2.91 9 10907 SPECD France $9.19 9 10964 SPECD France $87.38 9 11043 SPECD France $8.80 9 * This is a valid expression in dataphor. It can be used in a query or as the definition of a variable, ie. var LCursor:=cursor(TRANK order by {CustomerID}) where the LCursor variable inherits its type, cursor, from the expression.
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Saturday, April 19, 2008
The Sql ranking OVERture
Thursday, April 10, 2008
Dataphor - SET-ing your query in ORDER
Wanna have some fun, how would you like to build a list anywhere in a query? When you work with Dataphor you can think in terms of order in the mist of thinking in terms of sets. ☺ Here's how it works. When you Add an item to a list the operation returns an integer indicating the ordinal position the item holds in the list. Add is therefore like any other expression that returns a result (remember that Add adds one item after another in sequence). Well you can place this expression anywhere you could place any other expression. That's it! ☺ For example, using an integer list: var Item:=10; var LList1:=list(Integer){}; //An empty list of type integer. var Index:=Add(LList1,Item); //Index is of type integer, the returned type of the expression //that inserts the value of 10 into the list. //Note we can also use the expression: LList1.Add(Item). select Index; //Returns 0, the position of the 1st item in the list. select LList1[0];//Returns 10, the value of the 1st item in the list. Using a list with a row type: var LList1:=list(row{OrderID:Integer,CustomerID:String,Freight:Money}){}; var Item:=row{1 OrderID,'STEVE' CustomerID,$100.50 Freight}; var Index:=Add(LList1,Item); //Index holds the value 0, the position of the 1st row //added to the list. select Index; //Returns 0, the position of the 1st item (row) in the list. select LList1[0]; /* Returns the 1st item in the list, the row: OrderID CustomerID Freight ------- ---------- ------- 1 STEVE $100.50 */ Remember that the Add operation with list adds items sequentially to the list starting at position 0. OK, here's a contrived example query. It uses Sql Server 2005, the Northwind database and the Orders table. We're using three lists where a list item is a row of the Orders table limited to only three columns (one of which, OrderID, is the primary key of the table). In broad strokes we're looking at how relational expressions access data. The list, after all, reflects how the data was accessed, the sequential order in which rows from the table were processed depending on where we're at in the query. In this sense we're introducing two independent ideas in a single operation, thinking in terms of sets and thinking in terms of order (sequentially) that do not conflict with each other. ☺ The query first gets the 10 highest Freight values for all Freight values over $400. It then gets the lowest 5 freights from the 10 highest ones. var LList1:=list(typeof(Orders[]{OrderID,CustomerID,Freight})){}; var LList2:=list(typeof(Orders[]{OrderID,CustomerID,Freight})){}; var LList3:=list(typeof(Orders[]{OrderID,CustomerID,Freight})){}; //Note the "with {IgnoreUnsupported = 'true'}" in the query is used so you don't get a warning from the //dataphor compiler about sql server not supporting the list.Add operation. Obviously dataphor can't //hand off the list operation to sql server so the operation is under dataphors control. select ( ( Orders where Freight>$400 {OrderID,CustomerID,Freight} add{LList1.Add(Orders[OrderID]{OrderID,CustomerID,Freight} with {IgnoreUnsupported = 'true'}) R1} return 10 by {Freight desc} with {IgnoreUnsupported = 'true'} //return N is similar to sql server 'Select Top N..Order By'. ) add{LList2.Add(Orders[OrderID]{OrderID,CustomerID,Freight} with {IgnoreUnsupported = 'true'}) R2} return 5 by {Freight} ) add{LList3.Add(Orders[OrderID]{OrderID,CustomerID,Freight} with {IgnoreUnsupported = 'true'}) R3} order by {Freight}; /* Query result ordered by Freight. R1, R2 and R3 are the ordinal positions from the three lists created from the query. See the individual lists below for details. OrderID CustomerID Freight R1 R2 R3 ------- ---------- ------- -- -- -- 11032 WHITC $606.19 19 9 0 10983 SAVEA $657.54 16 8 1 10479 RATTC $708.95 2 7 2 10816 GREAL $719.78 9 6 3 11017 ERNSH $754.26 17 5 4 */ Using ToTable we can transform a list to a table where the ordinal position of the item (ie. row) in the list is converted to a value. Of course we could access the list with a cursor/loop and use an indexer to get each individual item (ie. LList1[0]). Using ToTable is more straighforward for our purposes. And remember list->in terms of position, table->in terms of column(s) value. ☺ ToTable automatically creates a column named sequence indicating the ordinal position (of the row). We're renaming it to Index. select ToTable(LList1) rename {sequence Index} order by {Index}; /* List1 result ordered by the ordinal position (Index) of each row. LList1 indicates rows in the query are added to the list in the order of the primary key (OrderID) values. OrderID CustomerID Freight Index ------- ---------- --------- ----- 10372 QUEEN $890.78 0 10430 ERNSH $458.78 1 10479 RATTC $708.95 2 10514 ERNSH $789.95 3 10540 QUICK $1,007.64 4 10612 SAVEA $544.08 5 10633 ERNSH $477.90 6 10634 FOLIG $487.38 7 10691 QUICK $810.05 8 10816 GREAL $719.78 9 10836 ERNSH $411.88 10 10841 SUPRD $424.30 11 10847 SAVEA $487.57 12 10897 HUNGO $603.54 13 10912 HUNGO $580.91 14 10941 SAVEA $400.81 15 10983 SAVEA $657.54 16 11017 ERNSH $754.26 17 11030 SAVEA $830.75 18 11032 WHITC $606.19 19 */ select ToTable(LList2) rename {sequence Index} order by {Index}; /* List2 result ordered by the ordinal position (Index) of each row. LList2 indicates rows are added to the list in the order of descending Freight values. OrderID CustomerID Freight Index ------- ---------- --------- ----- 10540 QUICK $1,007.64 0 10372 QUEEN $890.78 1 11030 SAVEA $830.75 2 10691 QUICK $810.05 3 10514 ERNSH $789.95 4 11017 ERNSH $754.26 5 10816 GREAL $719.78 6 10479 RATTC $708.95 7 10983 SAVEA $657.54 8 11032 WHITC $606.19 9 */ select ToTable(LList3) rename {sequence Index} order by {Index}; /* List3 result ordered by the ordinal position (Index) of each row. Rows are added to the list in the order of ascending Freight values. The order of the list agrees with the result of the query. OrderID CustomerID Freight Index ------- ---------- ------- ----- 11032 WHITC $606.19 0 10983 SAVEA $657.54 1 10479 RATTC $708.95 2 10816 GREAL $719.78 3 11017 ERNSH $754.26 4 */ You can take this idea and really run with it. ☺ Use it to help explain how a query builds an expression, how indexes actually influence table access to all kinds of list processing driven by a query. When you combine the ability to build a list with other list operations like IndexOf, IndexOfAny and LastIndexOf you can do some kewl things. Actually this simple introduction is just the tip of the iceberg when it comes to combining these two types. Oh yeah, note that you can conditionally build a list(s) within a query using a case (or if-then-else) statement. For example: var LList1:=list(typeof(Orders[]{OrderID,CustomerID,Freight})){}; select Orders where Freight>$700 add //Build a list based on a further restriction on freight. { case when Freight>$800 then LList1.Add(Orders[OrderID]{OrderID,CustomerID,Freight} with {IgnoreUnsupported = 'true'}) else -1 end R1 } {OrderID,CustomerID,Freight,R1} order by {R1,OrderID}; /* Query result. OrderID CustomerID Freight R1 ------- ---------- --------- -- 10479 RATTC $708.95 -1 10514 ERNSH $789.95 -1 10816 GREAL $719.78 -1 11017 ERNSH $754.26 -1 10372 QUEEN $890.78 0 10540 QUICK $1,007.64 1 10691 QUICK $810.05 2 11030 SAVEA $830.75 3 */ select ToTable(LList1) rename {sequence Index} order by {Index}; /* OrderID CustomerID Freight Index ------- ---------- --------- ----- 10372 QUEEN $890.78 0 10540 QUICK $1,007.64 1 10691 QUICK $810.05 2 11030 SAVEA $830.75 3 */ Note that other operations with the list type like insert and remove don't return a value so you can't use them in a query (yet). ☺ More to come on lists.
Tuesday, April 01, 2008
Dataphor - Get a copy of Dataphor
Database Consulting Group has acquired Alphora and Dataphor: www.dataphor.org You can get the latest full evaluation (commerical) version of Dataphor here: http://databaseconsultinggroup.com/downloads/ Questions on Dataphor can be asked here: http://www.databaseconsultinggroup.com/forums/ For information about the open source version of Dataphor see: www.dataphor.org
Subscribe to:
Posts (Atom)