Sql Server 2005 Ranking functions vs. Rac Ranking Principal players: Rac v2.33 running on (more info on Rac here) MS Sql Server 2005 sp1 In general any abstraction for t-sql is good abstraction. And the Sql-99 analytic ranking functions are indeed a welcome addition to Sql Server 2005. There has been universal approval and all kinds of favor has be accorded them from the Sql Server community. But the history of sql is the history of the gang that can't shoot straight. And the gang tends to always be a day late and a dollar short. So did sql and MS show itself to be all it can be with the new ranking functionality? Is the Sql Server model superior or inferior to well..the 'RAC' model of ranking. Two logically different approaches. Think of this as Fox Sql, fair and balanced..you decide :) Here is the general specification of a rank construct from Bol: RANK_TYPE ( ) OVER ( [ < partition_by_clause > ] < order_by_clause > ) Excuse me if I ask just who thinks this stuff up? Perhaps they should get off what their on or get back on what their off. It's painful for those not steeped in sql speak and clearly doesn't capture the logical/conceptual meaning of the construct. How many of you first looked at this spec and wondered just what is the basis for the rank in the first place? So allow me to express it based on common sense: RANK_TYPE (For Column(s)/[ASC | DESC]) OVER TABLE <Optional:[PARTITION BY Column(s)]> Res ipse loquitor - Now the thing speaks for itself. The rank is based on a column. To be fair this is what the sql standard meant when they expressed the idea of the 'utility' of a construct to express the ageless sql of a rank in a subquery: (select count(*) from t1 where t1.column<=t2.column) as Rank The meaning of the ordering of the column and whether or not the rank is obtained within groups/partitions or over the entire table is clear. Now here is the concept of a rank in Rac expressed in a similar way: RANK_TYPE (For PARTITION Columns) OVER TABLE Optional ORDER BY Columns Res ipse loquitor - Now the thing speaks for itself. Here the object of a rank is a partition! The sort order of the partition are the column(s) of the partition but an optional sort order can be defined with columns independent of the partition column(s). This is a logically different concept than the sql rank. That an sql rank thru a column(s) cannot encapsulate a rank predicated on partition(s) or a structure if you will is because there is no guarantee that a column(s) exists to capture or enumerate such a rank. In other words a single subquery, the sql model, may not be adaquate. The underlying sql code necessary to capture the rank may take the form of nested subqueries, certainly more complicated than the sql rank single subquery. The underlying Rac model for capturing ranks over partitions can be seem here where I first proposed creating 'virtual groups': microsoft.public.sqlserver.programming Mon, Jul 5 1999 'Difficult SQL Question' Name: 'Trysql' (but signed by yours truely -:) The thread can be seen here. Of course a partition can be conceptually decomposed to simple column(s) so the sql rank is encapsulated within the logical model of the Rac rank. Here is an example that elucidates the two different ranking models. (The data is extrapolated from an example of a Rac solution which can be seen here). create table ranktest (ID tinyint primary key,ROWID tinyint,TYPE tinyint,POS tinyint,STRING char(1)) insert ranktest values(1,1,1,1,'1') insert ranktest values(2,1,1,2,'2') insert ranktest values(3,1,2,3,'-') insert ranktest values(4,1,2,4,'%') insert ranktest values(5,1,2,5,'=') insert ranktest values(6,1,2,6,'B ') insert ranktest values(7,1,1,7,'3') insert ranktest values(8,1,1,8,'1') insert ranktest values(9,1,1,9,'2') insert ranktest values(10,1,2,10,'A') insert ranktest values(11,2,2,1,'D') insert ranktest values(12,2,2,2,'C') insert ranktest values(13,2,2,3,'M') insert ranktest values(14,2,1,4,'6') insert ranktest values(15,2,1,5,'5') insert ranktest values(16,2,2,6,'T') insert ranktest values(17,2,2,7,'Y') insert ranktest values(18,2,2,8,'W') insert ranktest values(19,2,2,9,'R') insert ranktest values(20,2,1,10,'6') insert ranktest values(21,2,1,11,'7') select * from ranktest ID ROWID TYPE POS STRING ---- ----- ---- ---- ------ 1 1 1 1 1 2 1 1 2 2 3 1 2 3 - 4 1 2 4 % 5 1 2 5 = 6 1 2 6 B 7 1 1 7 3 8 1 1 8 1 9 1 1 9 2 10 1 2 10 A 11 2 2 1 D 12 2 2 2 C 13 2 2 3 M 14 2 1 4 6 15 2 1 5 5 16 2 2 6 T 17 2 2 7 Y 18 2 2 8 W 19 2 2 9 R 20 2 1 10 6 21 2 1 11 7 We want a dense rank for the partition ROWID,TYPE. Although ID and POS are redundant they imply a meaningfulness to the order of rows both over the table and of TYPE within ROWID. The Rac model can be represented conceptually as: DENSE_RANK (For PARTITION {ROWID,TYPE}) OVER TABLE (ORDER BY ID) or DENSE_RANK (For PARTITION {ROWID,TYPE}) OVER TABLE (ORDER BY ROWID,POS) Rac does not offer the autonomy of ranking specifications as in sql. Partitions are defined by the ordinal position of columns in @rows. In the absence of optional sorting of the rows using @rowsort, the partition is ordered by the @rows specification from left to right. With @rowsort the partition is orderd by the @rowsort column(s). The Rac ranking parameter @rowindicators returns a dense rank. The key parameters to obtain the dense rank are: @rows='ROWID & TYPE & POS & STRING', @rowsort='ID' or @rowsort='ROWID & POS' @rowindicators='TYPE{DENSE_RANK}' The @rows specification implies the needed partition. The 'TYPE' in the @rowindicators argument 'TYPE{DENSE_RANK}' is the end point of the partition to obtain the dense rank for. In other words, all partitions as defined in @rows will be included for this dense rank up to and including TYPE. Therefore 'TYPE' implies the inclusion 'ROWID' and the partition of 'ROWID,TYPE' will be used for the rank. Any number of ranks can be defined in a Rac execute but they will all be based on the implied partition in @rows from left to right. A rank based on row_number() is included via @rowcounters and uses the partition 'ROWID,TYPE'. The name of the dense rank is within the tides, ie. '{DENSE_RANK}'. The name of the row_number rank is 'ROW NUMBER'. Rac executes: Exec Rac @transform='_dummy_', @rows='ROWID & TYPE & POS & STRING', -- Order the partitions (@rows). @rowsort='ID', @pvtcol='Report Mode',@defaultexceptions='dumy', @from='ranktest',@defaults1='y',@rowbreak='n',@racheck='y', @counterdatatype='tinyint', -- DENSE_RANK() using PARTITION BY ROWID,TYPE ORDERED BY ID @rowindicators='TYPE{DENSE_RANK}', -- ROW_NUMBER() using PARTITION BY ROWID,TYPE ORDERED BY ID @rowcounters='TYPE{ROW_NUMBER}' OR Exec Rac @transform='_dummy_', @rows='ROWID & TYPE & POS & STRING', -- Order the partitions (@rows). @rowsort='ROWID & POS' @pvtcol='Report Mode',@defaultexceptions='dumy', @from='ranktest',@defaults1='y',@rowbreak='n',@racheck='y', @counterdatatype='tinyint', @rowindicators='TYPE{DENSE_RANK}', @rowcounters='TYPE{ROW_NUMBER}' Result ROWID TYPE POS STRING DENSE_RANK ROW_NUMBER ------ ---- ---- ------ ---------- ---------- 1 1 1 1 1 1 1 1 2 2 1 2 1 2 3 - 2 1 1 2 4 % 2 2 1 2 5 = 2 3 1 2 6 B 2 4 1 1 7 3 3 1 1 1 8 1 3 2 1 1 9 2 3 3 1 2 10 A 4 1 2 2 1 D 1 1 2 2 2 C 1 2 2 2 3 M 1 3 2 1 4 6 2 1 2 1 5 5 2 2 2 2 6 T 3 1 2 2 7 Y 3 2 2 2 8 W 3 3 2 2 9 R 3 4 2 1 10 6 4 1 2 1 11 7 4 2 With the sql rank the question becomes is there a column(s) that encapsulates the bi-directional nature of TYPE within ROWID. Sadly there is not. They only column appropriate for the sql rank is the DENSE_RANK itself! In other words, the sql rank cannot be stated: DENSE_RANK() OVER (PARTITION BY ROWID,TYPE ORDER BY ?) as DENSE_RANK or is redundant and a non-solution: DENSE_RANK() OVER (PARTITION BY ROWID ORDER BY TYPE) as DENSE_RANK Any of the sql ranking functions, not matter how they are forumulated, will not reflect the meaning of the data. This is especially seen by comparing the sql ROW_NUMBER() with the one from Rac : select ID,ROWID,TYPE,POS,STRING, ROW_NUMBER() OVER (PARTITION BY ROWID order by TYPE) as [TYPE ROW_NUMBER], ROW_NUMBER() OVER (PARTITION BY ROWID,TYPE order by POS) as [POS ROW_NUMBER] from ranktest order by ID ID ROWID TYPE POS STRING TYPE ROW_NUMBER POS ROW_NUMBER RAC ROW_NUMBER ---- ----- ---- ---- ------ --------------- -------------- -------------- 1 1 1 1 1 1 1 1 2 1 1 2 2 2 2 2 3 1 2 3 - 7 1 1 4 1 2 4 % 8 2 2 5 1 2 5 = 9 3 3 6 1 2 6 B 10 4 4 7 1 1 7 3 3 3 1 8 1 1 8 1 4 4 2 9 1 1 9 2 5 5 3 10 1 2 10 A 6 5 1 11 2 2 1 D 9 1 1 12 2 2 2 C 10 2 2 13 2 2 3 M 11 3 3 14 2 1 4 6 1 1 1 15 2 1 5 5 2 2 2 16 2 2 6 T 5 4 1 17 2 2 7 Y 6 5 2 18 2 2 8 W 7 6 3 19 2 2 9 R 8 7 4 20 2 1 10 6 3 3 1 21 2 1 11 7 4 4 2 Again the ranks do not reflect the meaning of the data when the rank is dependent on a column instead of a logical partition. While the sql rank is indeed a welcome abstraction they could have done a much better job. By using such a low level of abstraction many clear and easy solutions to non-trivial problems remain lost to most of the user community. This is unfortunate. While there is a certain quality in quantity sql never seems to truely show that the sum is greater than the parts. And for those who build database systems as well as those that use them and who frown on abstraction, the burden is on them to tell the rest of us why. Less can be very much more. Obtaining sql ranks with Rac. This is intended to show how Rac obtains equivalent sql ranks. This example use the adventureworks query in Bol shown under 'Ranking Functions (Transact-SQL)' ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm An additional rank for ROW_NUMBER within PostalCode is added. SELECT c.FirstName, c.LastName ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number' --Rank for LastName,FirstName Partitioned by Postcode ,ROW_NUMBER() OVER (PARTITION BY a.PostalCode ORDER BY c.LastName,c.FirstName) AS 'PART. Row Number' ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank' ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank' ,NTILE(4) OVER (ORDER BY a.PostalCode) AS 'Quartile' ,s.SalesYTD, a.PostalCode FROM Sales.SalesPerson s INNER JOIN Person.Contact c ON s.SalesPersonID = c.ContactID INNER JOIN Person.Address a ON a.AddressID = c.ContactID WHERE TerritoryID IS NOT NULL AND SalesYTD <> 0; FirstName LastName Row Number PART. Row Number Rank Dense Rank SalesYTD PostalCode ---------- ---------- ---------- ---------------- ---- ---------- ----------- --------------- Maciej Dusza 1 1 1 1 4557045.05 98027 Shelley Dyck 2 2 1 1 5200475.23 98027 Linda Ecoffey 3 3 1 1 3857163.63 98027 Carla Eldridge 4 4 1 1 1764938.99 98027 Carol Elliott 5 5 1 1 2811012.72 98027 Jauna Elson 6 1 6 2 3018725.49 98055 Michael Emanuel 7 2 6 2 3189356.25 98055 Terry Eminhizer 8 3 6 2 3587378.43 98055 Gail Erickson 9 4 6 2 5015682.38 98055 Mark Erickson 10 5 6 2 3827950.24 98055 Martha Espinoza 11 6 6 2 1931620.18 98055 Janeth Esteves 12 7 6 2 2241204.04 98055 Twanna Evans 13 8 6 2 1758385.93 98055 Sql Rank Rac Rank ROW_NUMBER() OVER (ORDER BY a.PostalCode) Rac by default computes a ROW_NUMBER over the table in the sort order implied by @rows or explicitly specified in @rowsort. It is referenced by @tablecounter if there is no @select or by rd in a select statement. RANK() OVER (ORDER BY a.PostalCode) @rowruns='dumy^rank() within tablecounter^(dumy){PostalCode}' Rac does not have a default for RANK() as accounting for ties is seldomed used or needed. But it can be obtained easily using a form of the Rac running sums parameter @rowruns. DENSE_RANK() OVER (ORDER BY a.PostalCode) PostalCode{[Dense Rank]} A trivial case of a column (sql) and a partition (Rac) having the same semantics. NTILE(4) OVER (ORDER BY a.PostalCode) Rac has no direct equivalent expression. It can be computed using a combination of Rac options but it seems seldom called for. The sql rank: ROW_NUMBER() OVER (PARTITION BY a.PostalCode ORDER BY c.LastName,c.FirstName) is equivalant to the Rac @rowcounters='PostalCode{[PART. Row Number]}'. The implied sort order in @rows, PostalCode,LastName,FirstName, gives the Rac @rowcounters specification the same semantics as this ROW_NUMBER. Regardless of the order columns are specified in @rows, the data can be presented/manipulated at will via the @select parameter. Here is the Rac version which returns the same result as the adventureworks query. Exec Rac @transform='_dummy_', @rows='a.PostalCode & c.LastName & c.FirstName & s.SalesYTD', @pvtcol='Report Mode', @from='Sales.SalesPerson s INNER JOIN Person.Contact c ON s.SalesPersonID = c.ContactID INNER JOIN Person.Address a ON a.AddressID = c.ContactID', @where='TerritoryID IS NOT NULL AND SalesYTD <> 0', @rowbreak='n',@defaults1='y',@racheck='y', @tablecounter='y', @rowindicators='PostalCode{[Dense Rank]}', @rowcounters='PostalCode{[PART. Row Number]}', @rowruns='dumy^rank() within tablecounter^(dumy){PostalCode}', @select='select FirstName,LastName,rd as [Row Number],[PART. Row Number], 1*runs as RANK,[Dense Rank],SalesYTD,PostalCode from rac order by rd'
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Tuesday, September 05, 2006
RAC - Rank This!
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment