The use of prior information is a fundamental basis for judging the efficiency of a system. Systems that make use of prior information are efficient systems. The most efficient systems make the best use of this information. The most inefficient systems are those that totally ignore it. This holds for small things like functions to large things like databases and application development. Many times efficiency or the lack thereof is hidden. If you could see a visualization of what is happening it should certainly help in understanding it. A classic query in sql is a query that ranks a column. A count is used to represent the rank of each column value over the table. This type of query is discussed at length regarding MS Sql Server in: Itzik Ben-Gan and Sujata Mehta OVER Clause and Ordered Calculations http://www.insidetsql.com/OVER_Clause_and_Ordered_Calculations.doc Create a table R1 in Dataphor using MS Sql Server 2005 as the data repository. create table R1 { ID:Integer, Y:Integer, key{ID} }; insert table { row{10 ID,8 Y}, row{15,26}, row{20,16}, row{25,12}, row{30,25}, row{35,42}, row{40,19}, row{45,36}, row{50,22}, row{55,39} } into R1; select R1: ID Y -- -- 10 8 15 26 20 16 25 12 30 25 35 42 40 19 45 36 50 22 55 39 A query to rank column Y has traditionally been written in two forms. The idea is to use the aggregate Count to count the number of times a comparison between a particular value of Y and all the other row values of Y is true. The comparison takes a specific row of R1 and compares the column Y <= to all the column Y values from all the rows. The count of comparisons where (a value of Y <= all other Y values) is true is the rank of the Y value. The query can be written using a subquery to obtain a count or with a join followed by a group by. An sql query using a subquery: select A.ID,A.Y, (select count(*) from R1 as B where B.Y<=A.Y) as Rank from R1 as A order by A.Y An sql query using a join: select A.ID,A.Y,Count(*) as Rank from R1 as A join R1 as B on B.Y<=A.Y group by A.ID,A.Y order by A.Y Both queries result in: ID Y Rank -- -- ---- 10 8 1 25 12 2 20 16 3 40 19 4 50 22 5 30 25 6 15 26 7 45 36 8 55 39 9 35 42 10 But both queries can hide the nature of the inequality comparison 'B.Y<=A.Y' on which the count is based. This comparison requires every row of R1 for every value of Y. In other words each count for a value of Y requires the table R1. Therefore there are N values of Y multiplied by the rows of the table, N. There N*N comparisons for N counts where N is the number of rows in the table. For table R1 it is therefore necessary for the database engine to make 100 comparisons to obtain the 10 counts and rank each column Y. To more clearly see the queries in terms of the work involved lets look at representing the work of the database engine via a Dataphor (D4) query. A table can be declared in any D4 query using the 'table' construct. Three random rows of table R1 were chosen so as to represent three random values of column Y. The query uses values of Y {39,16,12} to obtain ranks for only these values. Each table expression is the table R1 and additionally includes one of the random Y values in the list, its comparison to each of the Y column values in R1 and the boolean (true/false) result of the comparison. We use a left join since we are not accounting for all ranks (Y values). select R1 left join ( ( table { row{1 RowID,10 ID,8 Y,39 Y1,8<=39 Test,true TestY1}, row{1 RowID,15 ID,26 Y,39 Y1,26<=39 Test,true TestY1}, row{1 RowID,20 ID,16 Y,39 Y1,16<=39 Test,true TestY1}, row{1 RowID,25 ID,12 Y,39 Y1,12<=39 Test,true TestY1}, row{1 RowID,30 ID,25 Y,39 Y1,25<=39 Test,true TestY1}, row{1 RowID,35 ID,42 Y,39 Y1,42<=39 Test,false TestY1}, row{1 RowID,40 ID,19 Y,39 Y1,19<=39 Test,true TestY1}, row{1 RowID,45 ID,36 Y,39 Y1,36<=39 Test,true TestY1}, row{1 RowID,50 ID,22 Y,39 Y1,22<=39 Test,true TestY1}, row{1 RowID,55 ID,39 Y,39 Y1,39<=39 Test,true TestY1} } where TestY1 group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) union ( table { row{2 RowID,10 ID,8 Y,16 Y1,8<=16 Test,true TestY1}, row{2 RowID,15 ID,26 Y,16 Y1,26<=16 Test,false TestY1}, row{2 RowID,20 ID,16 Y,16 Y1,16<=16 Test,true TestY1}, row{2 RowID,25 ID,12 Y,16 Y1,12<=16 Test,true TestY1}, row{2 RowID,30 ID,25 Y,16 Y1,25<=16 Test,false TestY1}, row{2 RowID,35 ID,42 Y,16 Y1,42<=16 Test,false TestY1}, row{2 RowID,40 ID,19 Y,16 Y1,19<=16 Test,false TestY1}, row{2 RowID,45 ID,36 Y,16 Y1,36<=16 Test,false TestY1}, row{2 RowID,50 ID,22 Y,16 Y1,22<=16 Test,false TestY1}, row{2 RowID,55 ID,39 Y,16 Y1,39<=16 Test,false TestY1} } where TestY1 group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) union ( table { row{3 RowID,10 ID,8 Y,12 Y1,8<=12 Test,true TestY1}, row{3 RowID,15 ID,26 Y,12 Y1,26<=12 Test,false TestY1}, row{3 RowID,20 ID,16 Y,12 Y1,16<=12 Test,false TestY1}, row{3 RowID,25 ID,12 Y,12 Y1,12<=12 Test,true TestY1}, row{3 RowID,30 ID,25 Y,12 Y1,25<=12 Test,false TestY1}, row{3 RowID,35 ID,42 Y,12 Y1,42<=12 Test,false TestY1}, row{3 RowID,40 ID,19 Y,12 Y1,19<=12 Test,false TestY1}, row{3 RowID,45 ID,36 Y,12 Y1,36<=12 Test,false TestY1}, row{3 RowID,50 ID,22 Y,12 Y1,22<=12 Test,false TestY1}, row{3 RowID,55 ID,39 Y,12 Y1,39<=12 Test,false TestY1} } where TestY1 group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) ) order by {Y}; ID Y Rank -- -- ---------- 10 8 <No Value> 25 12 2 20 16 3 40 19 <No Value> 50 22 <No Value> 30 25 <No Value> 15 26 <No Value> 45 36 <No Value> 55 39 9 35 42 <No Value> Each expression: ( table{ } where TestY1 //The number of rows where TestY1 is true is the Rank. group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) represents the work necessary to get a rank for each Y value. Not only is it necessary to make the comparisons within the table { } but the result of the comparison must be related back to table R1. This idea is first conveyed in the where and group by so as to reduce the count to a single row and then in the union of all rows of ranks which is then joined by Y values back to R1. Visualizing the query this way makes it easier to see that the computation of any one rank is independent of any other rank. There is not any information used in one rank that is used for the computation of another rank. In other words, there is no use of prior information from one rank to another. Conceptually this is why this type of query has be referred to as an inefficient type of query. Note that the implied basic structure of ranking queries, a table repeated for each rank desired, does not change regardless of the type of rank desired. What changes is the complexity of the comparison. Consider the following table: create table A1 { ID:Integer, Y:Integer, key{ID} } ; insert table { row{1 ID,5 Y}, row{2,7}, row{3,7}, row{4,10}, row{5,20}, row{6,20}, row{7,32} } into A1; To obtain unique ranks for Y the comparison 'each value of Y<=a particular value of Y' is inappropriate since it results in non-unique ranks. To break tied ranks the following D4 query can be used: select (A1 rename {ID ID1,Y Y1}) add {Count(A1 where ((Y<Y1) or (Y=Y1 and ID<=ID1)) ) Rank} {ID1 ID,Y1 Y,Rank} order by {Rank} ; ID Y Rank -- -- ---- 1 5 1 2 7 2 3 7 3 4 10 4 5 20 5 6 20 6 7 32 7 This query can be visualized with the same table structure as the prior example but the complexity of the comparison is increased due to bringing in the ID column. The is another negative impact to efficiency. In all the queries presented so far the only logical order is how the result will be presented (order by). The queries operate under the assumption that the database engine will pick a particular row to obtain a rank in any way it chooses. As the queries are written we don't care nor does it have any importance how the rows (ranks) are processed. Lets now introduce the idea of ordering the columns of Y by sorting the rows of table R1 in ascending order of column Y and processing the table in that order. The D4 ToList operator creates a list of rows from a table defined in the order of a cursor. The statement: select ToList(cursor(R1 order by {Y}))[0]; uses an indexer ([]) to return the 1st row in a list using table R1 ordered by column Y. ID Y -- - 10 8 The next row in the ordered list can be addressed by using 1 for the indexer: select ToList(cursor(R1 order by {Y}))[1]; ID Y -- -- 25 12 Finally we can get the last row in the list without explicitly stating an index, but using Count to count how many rows there are (lists are addressed from 0 so we subtract 1 from count): select ToList(cursor(R1 order by {Y}))[ToList(cursor(R1 order by {Y})).Count()-1]; ID Y -- -- 35 42 Now we can simply loop over the ordered list to obtain the rank of each Y value: var TempTable:=table of {Y:Integer,Rank:Integer}{}; var LListR1:=ToList(cursor(R1 order by {Y})); var I:Integer:=0; foreach var LListRow in LListR1 do begin I:=I+1;//Reuse previous value of I for rank. insert row{LListRow.Y Y,I Rank} into TempTable; end; select 'TempTable'; select TempTable; select 'R1 joined to TempTable'; select R1 join TempTable with {IgnoreUnsupported = 'true'} order by {Y}; TempTable Y Rank -- ---- 8 1 12 2 16 3 19 4 22 5 25 6 26 7 36 8 39 9 42 10 R1 joined to TempTable ID Y Rank -- -- ---- 10 8 1 25 12 2 20 16 3 40 19 4 50 22 5 30 25 6 15 26 7 45 36 8 55 39 9 35 42 10 We have obtained the ranks by processing the rows of R1 only once as compared to 100 times in a query. The statement: I:= I + 1 which represents the rank of a particular Y value, is a mathematical statement of reusing prior information. And we are able to reuse information as a result of ordering the rows of R1. We can take the concept of the loop one step further by simply transforming the list of rows into a table: select ToTable(ToList(cursor(R1 order by {Y}))) {ID,Y,sequence+1 Cnt} order by {Y}; ID Y Cnt -- -- --- 10 8 1 25 12 2 20 16 3 40 19 4 50 22 5 30 25 6 15 26 7 45 36 8 55 39 9 35 42 10 This represents a most efficient use of prior information, the previous value of Y. Sql is making the same efficient use of prior information with its analytic ranking functions: select ID,Y,row_number()over(order by Y) as Rank from R1 The paper by Ben-Gan can be seen as a plea for Sql Server to more efficiently use prior information (in the form of processing queries in an ordered way) in various contexts. Just as I:=I+1 is a statement of reuse of information, application development is surely based on the reuse of data. And what distinguishes one environment from another is the efficiency of use. Conceptually I:=I+1 is no different than using a key to efficiently access a row of data from a view. To make the connection between reusing prior information efficiently and application development is to see what D4 is all about -:) bye for now, steve This operator returns a random sample (N) of column Y values from table R1 in a table. It takes into account replacement. create operator Rand_Y_Values(N:Integer): table{RowID:Integer,Y1:Integer} begin result:=table of typeof(result){}; var Check:Integer; for var J:Integer:=1 to N do begin var Y1:Integer; Math.Seed(); for var I:Integer:= 9 to 1000 do begin Y1:=Random(1,50); Check:=I; if ( IsNotNil ((R1 adorn {key{Y}})[Y1 by {Y}]) ) and ( IsNil( (result adorn {key{Y1}})[Y1]) ) with {IgnoreUnsupported = 'true'} then break; end; if Check=1000 then begin raise Error('No random value could be found for Row #: '+ ToString(N)); exit; end; result:=result union table{row{J RowID,Y1 Y1}}; end; end; This query uses operator Rand_Y_Values to return a character string that represents a row statement using a value of Y. The rows were copied and pasted into the query. select (R1 times Rand_Y_Values(3) with {IgnoreUnsupported = 'true'}) add{Y<=Y1 TestResult} add{'row{'+ToString(RowID)+' RowID,'+ToString(ID)+' ID,'+ToString(Y)+' Y,'+ ToString(Y1)+' Y1,'+ToString(Y)+'<='+ToString(Y1)+ ' Test,'+ToString(TestResult)+' TestY1},' Arow} {Arow};
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Friday, June 22, 2007
Dataphor - Sql Visualizing a ranking query
Subscribe to:
Post Comments (Atom)
Blog Archive
-
▼
2007
(29)
-
▼
June
(8)
- Sql - Using a dense rank for identifying sections
- Dataphor - Sql Visualizing a ranking query
- Dataphor - Inclusive Sql vs exclusive D4
- Dataphor - Intelligent views
- Dataphor - Sql: what does Update..From mean?
- Dataphor - Passing a table as a parameter
- Dataphor - trimming digits and letters from a string
- Dataphor - String differences operator
-
▼
June
(8)
3 comments:
demi lovato naked photos http://trusted.md/user/demi_lovato_nude nudes of demi lovato
miley cyrus naked porn miley cyrus toon porn
Gosh, there's a lot of helpful information above!
Post a Comment