In his July 2007 MS Sql Server article Itzik Ben-Gan raises the question of identifying consecutive runs of rows where values repeat themselves. T-SQL Black Belt Identifying Sections By:Itzik Ben-Gan The problem is framed as: 'The generic form of the problem involves a table (call it T1) that has two columns of interest: one representing order among rows (call it id) and the other holding some value (call it val). The task at hand is to identify sections of consecutive rows that share the same value. The terms section and consecutive rows are problematic when you're dealing with sets, but as I mentioned, the column ID represents logical order among rows, and once order is defined, these terms become meaningful. For each identified section, you need to return the minimum id, maximum id, val, count of rows in the section, and possibly other aggregates.' The use of the term 'generic' is most appropriate as it conveys the idea that a proposed solution account for 'all' data. Such a generic approach is an 'inclusive' one. Given the sample data: create table TG1 ( id int not null primary key, val varchar(10) not null ); insert into TG1(id, val) values( 1, 'a'); insert into TG1(id, val) values( 2, 'a'); insert into TG1(id, val) values( 3, 'a'); insert into TG1(id, val) values( 5, 'a'); insert into TG1(id, val) values( 7, 'b'); insert into TG1(id, val) values( 9, 'b'); insert into TG1(id, val) values(11, 'a'); insert into TG1(id, val) values(13, 'a'); insert into TG1(id, val) values(17, 'b'); insert into TG1(id, val) values(19, 'b'); insert into TG1(id, val) values(23, 'b'); insert into TG1(id, val) values(29, 'a'); insert into TG1(id, val) values(31, 'b'); insert into TG1(id, val) values(37, 'b'); The ideal solution is to create a new column which can be used with val to group the rows into discrete sections which reflects the fact that a particular val can occur repeated times. id val Grp(Dense_Rank) ---- ------ --------------- 1 a 1 2 a 1 3 a 1 5 a 1 7 b 2 9 b 2 11 a 3 13 a 3 17 b 4 19 b 4 23 b 4 29 a 5 31 b 6 37 b 6 The new column (Grp) is simply a dense rank on val in the order of id. There are two proposed approaches. One uses a subquery to create the Grp column. Various type of subqueries have been proposed as I did some time ago: Jul 5 1999, 12:00 am microsoft.public.sqlserver.programming Difficult SQL Question Author: Trysql Jun 11 2001, 10:33 am microsoft.public.sqlserver.programming SQL challenge: coelesce consecutive and overlapping bookings for a room Author: steve dassin None of the subquery solutions are particularly appealing. They are not intuitive and it is doubtful the average developer would come upon such a beast. And they are certainly not efficient as described here. As for the analytic ranking functions as explained here sql ranking functions cannot rank a column independent of its order. This makes it necessary to simulate a dense rank with a combination of functions. This too is not intuitive and it is doubtful the average developer understands this approach and can apply it to other similar problems. We are really talking expert sql programmers and not application developers. Perhaps a more fundamental issue than the types of solutions is the question posed. A generic solution is the foundation of a report. But of how much importance is the relationship between different vals. Is the intent to see a relationship between different accounts, students,sport teams or to examine data/relationships within the same team but different sections? If the latter then is it really necessary to produce sections on everything? If the question is framed as identify all the sections for a 'particular' val, as seems reasonable, the question of the types of solutions and there nature dramatically changes. Now the whole solution can be based on the very basic idea of a dense rank - if the thing is the same use the current rank, if another thing increment the rank. Any developer using any client should have no problem seeing this as nothing more than a foreach loop. Assuming val 'a' is of interest in D4 this can be done as simply: var TempTable:=table of{id:Integer,val:String,Grp:Integer}{}; var I:Integer:=1; foreach var LItem in TG1 do begin if LItem.val<>'a' then I:=I+1; if LItem.val='a' then insert row{LItem.id id,LItem.val val,I Grp} into TempTable; end; select TempTable; id val Grp -- --- --- 1 a 1 2 a 1 3 a 1 5 a 1 11 a 3 13 a 3 29 a 6 We have completely eliminated any need of subqueries or ranking functions. In fact we have eliminated any need of any type of query. We can, of course, create a procedure to return sections for any val. And number the sections consecutively in case that is helpful. A D4 procedure would be: create operator Sections_for_a_val(aTable:typeof(TG1),aVal:String): table{id:Integer,val:String,Grp:Integer} begin result:=table of typeof(result){}; var I:Integer:=1; foreach row in aTable do begin if val<>aVal then I:=I+1; if val=aVal then insert row{id id,val val,I Grp} into result; end; result:= result join ( ToTable(ToList(cursor(result over{Grp} order by {Grp}))) {Grp,sequence+1 GrpSeq} ) {id,val,GrpSeq Grp}; end; select Sections_for_a_val(TG1,'a'); id val Grp -- --- --- 1 a 1 2 a 1 3 a 1 5 a 1 11 a 2 13 a 2 29 a 3 select Sections_for_a_val(TG1,'b'); id val Grp -- --- --- 7 b 1 9 b 1 17 b 2 19 b 2 23 b 2 31 b 3 37 b 3 Summarizing is the same as in the sql solutions. select Sections_for_a_val(TG1,'a') group by {Grp} add{Min(id) start_section,Max(id) end_section,Count() num_rows}; Grp start_section end_section num_rows --- ------------- ----------- -------- 1 1 5 4 2 11 13 2 3 29 29 1 Finally, least I have neglected sql, here is yet another subquery approach (in D4) to dense ranks that takes the sql server 2005 row_number() function as its starting point. var A:= SQLQuery('select id,val,row_number()over(order by id) RowID from TG1'); select A add { Count( (A rename {RowID RowID1} where RowID1<=RowID) add{ (A adorn{key{RowID}})[RowID1-1].val = (A adorn{key{RowID}})[RowID1].val Testval} where not Testval ) + 1 DenseRank }; id val RowID DenseRank -- --- ----- --------- 1 a 1 1 2 a 2 1 3 a 3 1 5 a 4 1 7 b 5 2 9 b 6 2 11 a 7 3 13 a 8 3 17 b 9 4 19 b 10 4 23 b 11 4 29 a 12 5 31 b 13 6 37 b 14 6 There is certainly nothing 'wrong' with a report. But it is perhaps wrong-headed to try to make one when it is not really the object of the exercise. steve
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Showing posts with label sql dense rank for identifying consecutive runs. Show all posts
Showing posts with label sql dense rank for identifying consecutive runs. Show all posts
Sunday, June 24, 2007
Sql - Using a dense rank for identifying sections
Subscribe to:
Posts (Atom)