Sorting a delimited string by its numerical string parts. This is a possible solution to the problem presented in the article: 'T-SQL Sorting Challenge' By: Itzik Ben-Gan http://www.sqlmag.com/Article/ArticleID/100156/100156.html "You are given a table called t1 with a character string column called val. Each string in the val column holds a dot separated list of integers. Your task is to write a T-SQL solution that sorts the strings based on the integer values constituting the string segments. Note that the number of integers in each string may vary, and is only limited by the column type VARCHAR(500). Extra points if your solution will also support negative integers." So the problem is how to construct a sort expression that represents the positive and negative integers of the ids. This solution uses Dataphor with the data being stored in Sql Server 2005. Sample data The data is populated thru dataphor and persisted in the sql server northwind db. The table t1 in the article is table IZ here. The article uses an sql identity for the primary key id, here it is explicitly declared. create table IZ { id:Integer, val:String tags {Storage.Length='75'}, key{id} }; //Positive value strings. insert row{1 id,'100' val} into IZ; insert row{2 id,'7.4.250' val} into IZ; insert row{3 id,'22.40.5.60.4.100.300.478.19710212' val} into IZ; insert row{4 id,'22.40.5.60.4.99.300.478.19710212' val} into IZ; insert row{5 id,'22.40.5.60.4.99.300.478.9999999' val} into IZ; insert row{6 id,'10.30.40.50.20.30.40' val} into IZ; insert row{7 id,'7.4.250' val} into IZ; //Add negative values. insert row{8 id,'-1' val} into IZ; insert row{9 id,'-2' val} into IZ; insert row{10 id,'-11' val} into IZ; insert row{11 id,'-22' val} into IZ; insert row{12 id,'-123' val} into IZ; insert row{13 id,'-321' val} into IZ; insert row{14 id,'22.40.5.60.4.-100.300.478.19710212' val} into IZ; insert row{15 id,'22.40.5.60.4.-99.300.478.19710212' val} into IZ; Go directly to dataphor solution Go directly to a solution using dataphor and sql server t-sql Go directly to a solution using the Rac utility on sql server 2000 Go directly to a solution using the Rac utility on sql server 2005 ( Rac is a system of stored procedures and functions for sql server designed to simplify solving various data manipulation problems including dynamic crosstabs, complex running sums and ranking, string manipulations etc. )
Stepping thru the logic of the solution Use the dataphor Split operator to split the val string for each id into individual strings starting at Index 1 and going to the number of parts delimited by the period ('.'). Note that the string is converted to an integer. So we're dealing with numbers and not strings. select (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times //times is equivalent to an sql CROSS JOIN. (Numbers where N<10) //A table with a single column N, an integer from 0 to 800. where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} order by {id,Index} ; id Index StrNum -- ----- -------- 1 1 100 <- id 1 has only a single value. 2 1 7 <- id 2 has 3 values. 2 2 4 2 3 250 3 1 22 <- id 3 has 9 values. 3 2 40 3 3 5 3 4 60 3 5 4 3 6 100 3 7 300 3 8 478 3 9 19710212 4 1 22 . . . Now lets look at the same data but within each Index and within each Index ordered by the string as a number (StrNum). Remember all ordering is ascending. select (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} order by {Index,StrNum}; id Index StrNum -- ----- -------- 13 1 -321 <- id 13 has lowest overall number for 1st string part (Index 1). 12 1 -123 11 1 -22 10 1 -11 9 1 -2 8 1 -1 2 1 7 7 1 7 6 1 10 3 1 22 <- ids 3,4,5,14,15 have the same value (22) for Index 1. 4 1 22 5 1 22 14 1 22 15 1 22 1 1 100 <- id 1 has highest overall number for 1st string part (Index 1). . . . 3 8 478 <- Index 8 has the same string value for the 5 ids that have an 8th string part. 4 8 478 5 8 478 14 8 478 15 8 478 5 9 9999999 3 9 19710212 4 9 19710212 14 9 19710212 15 9 19710212 The strings as integers are nicely sorted within each Index over the ids. How can we represent the same ordering within each Index independent of the positive and negative numbers (and strings that indicate positive and negative numbers)? What about with a ranking. So lets rank the numbers within each Index. The combination of the dataphor ToTable, ToList and cursor operators will generate a rank (a column named sequence) that follows the specified cursor ordering. We order the cursor by Index,StrNum to get an ascending rank within each Index based on the StrNum values. The rank is column RowNum. select ToTable( ToList( cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum} order by {Index,StrNum}; Index id StrNum RowNum ----- -- -------- ------ 1 13 -321 1 <- id 13 has lowest value (-321) so id 1 gets lowest rank (1) within Index 1. 1 12 -123 2 1 11 -22 3 1 10 -11 4 1 9 -2 5 1 8 -1 6 1 2 7 7 <- ids 2 and 7 get a different rank for the duplicate values of 7 :( 1 7 7 8 1 6 10 9 1 3 22 10 <- all 5 ids get a different rank for the duplicate values of 22 :( 1 4 22 11 1 5 22 12 1 14 22 13 1 15 22 14 1 1 100 15 <- id 1 has highest value (100) so id 1 gets highest rank (15) within Index 1. . . . . 8 3 478 56 8 4 478 57 8 5 478 58 8 14 478 59 8 15 478 60 9 5 9999999 61 9 3 19710212 62 <- all 4 ids get a different rank for the duplicate values of 19710212 :( 9 4 19710212 63 9 14 19710212 64 9 15 19710212 65 ( Note that this rank is equilvant to the sql ranking function ROW_NUMER(). The rank could be obtained in sql using: ROW_NUMBER()OVER(ORDER BY [Index],StrNum) AS RowNum But the rank we want is really based on the sql RANK() function which accounts for duplicate/ties (of StrNum) by giving them the same rank. Therefore it's necessary in dataphor to use a join to append the correct ranks to the table. In sql the join isn't necessary, RANK() can be used directly on the table:ie. RANK()OVER(ORDER BY [Index],StrNum) AS Rank (See t-sql solution) For more on dataphor and sql ranking see: 'The Sql ranking OVERture' http://beyondsql.blogspot.com/2008/04/sql-ranking-overture.html ) If you haven't guessed it by now ☺ the idea is to create a string for each id based on the ranks which will be used to order the ids. But we have a problem because for duplicate values of a number we're getting different ranks. We want the 'same' rank for duplicate values since the same integer cannot be used to distinguish among the ids. We can remedy the different ranks for duplicate values by simply choosing the minimum rank (RowNum) for the value and assigning this rank to all ids. Also note that the ranks continue to ascend over the Indexs. This is ok because any numbers representing the ranks are ok if they correctly maintain the ordering of integer strings values within the Index. select ToTable( ToList( cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum} group by {Index,StrNum} add{Min(RowNum) Rank} order by {Index,StrNum}; Index StrNum Rank ----- -------- ---- 1 -321 1 1 -123 2 1 -22 3 1 -11 4 1 -2 5 1 -1 6 1 7 7 <- a rank of 7 can be assigned to the two ids (2,7) with a value of 7 for Index 1. 1 10 9 1 22 10 <- a rank of 10 can be assigned to all ids with a value of 22 for Index 1. 1 100 15 . . . 8 478 56 9 9999999 61 9 19710212 62 <- a rank of 62 can be assigned to all ids with a value of 19710212 for Index 9. What we now have is a table of unique Index/StrNum combinations with a unique rank for each combination. It's only necessary to join this table to the table of split strings (IZ) for all ids by Index and StrNum to properly assign the correct(ed) ranks. (As mentioned above this is the same rank that would be obtained using the sql RANK() function and ordering by Index,StrNum. And note that using the sql RANK() would eliminate the need to do a join in dataphor. Imagine dataphor with native sql like ranking operations ☺ ) Because the objective is to create a string to sort the ids we can't just use the numeric rank, we have to modify it for string ordering. Given two ranks of 7 and 11 if they are strings, '7' and '11', an ascending sort would have '11' come '7': '11' '7' This is the very problem the article is addressing! So we have to modify the the strings to have '7' come '11' ☺. We can modify the '7' by left padding it with '0'. So when we sort ascending we'll have the correct representation of of the true numeric order of the values: '07' '11' How much to pad a rank, how many '0's to insert, is the string length of the maximum rank generated. Because the ranks in dataphor keep ascending regardless of the cursor ordering, the maximum rank (ignoring duplicates) is the count of rows in the table. You could even make an educated guess based on the amount of data and use that ☺. Left padding the string rank (RankStr) based on a maximum length of 2 we now have all the data to finally construct a sorting column for the ids. var UnpackedStrings:= //This variable holds all the split data and will be used in the select query. ToTable(ToList(cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum}; var LengthofMaxRank:=2; select (UnpackedStrings {id,Index,StrNum}) join //Join the unique ranks to all split data. This is a natural join (on Index/StrNum). //Create a left padded string (RankStr) from the numeric rank. (UnpackedStrings group by {Index,StrNum} add{Min(RowNum) Rank}) {id,Index,StrNum,Rank,PadLeft(ToString(Rank),LengthofMaxRank,'0') RankStr} order by {Index,Rank}; id Index StrNum Rank RankStr -- ----- -------- ---- ------- 13 1 -321 1 01 <- ranks 1-9 are left padded in the rank string (RankStr). 12 1 -123 2 02 11 1 -22 3 03 10 1 -11 4 04 9 1 -2 5 05 8 1 -1 6 06 2 1 7 7 07 7 1 7 7 07 6 1 10 9 09 3 1 22 10 10 4 1 22 10 10 5 1 22 10 10 14 1 22 10 10 15 1 22 10 10 1 1 100 15 15 . . . . . 3 8 478 56 56 4 8 478 56 56 5 8 478 56 56 14 8 478 56 56 15 8 478 56 56 5 9 9999999 61 61 3 9 19710212 62 62 4 9 19710212 62 62 14 9 19710212 62 62 15 9 19710212 62 62 The id sort column can now be formed by concatenating, using the Concat operator, the RankStr within each id in the order of the rank (either Rank, Index or RankStr). This is easy to see by ordering the above data (table) by id,Rank. The ascending order of Index, Rank and RankStr all reflect where an id lies in value (RankStr) relative to the other ids. The sort expression will be column SortStr. id Index StrNum Rank RankStr -- ----- -------- ---- ------- 1 1 100 15 15 2 1 7 7 07 2 2 4 16 16 2 3 250 30 30 3 1 22 10 10 3 2 40 19 19 3 3 5 24 24 3 4 60 33 33 3 5 4 38 38 3 6 100 49 49 3 7 300 51 51 3 8 478 56 56 3 9 19710212 62 62 The complete dataphor solution var UnpackedStrings:= //Variable that holds all split data and ranks the split strings //numerically within each Index ordered by the numeric string value. ToTable(ToList(cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum}; var LengthofMaxRank:= Length(Count(UnpackedStrings).ToString()); //A string length used to left pad the rank strings for //a correct string sort (will be 2 here). select IZ join //Natural join of input table IZ to sort column expression by id. ( //Join unique ranks to all split strings. ( (UnpackedStrings {id,Index,StrNum}) join // Adjust ranks to be unique for each Index/StrNum, duplicate values should get same rank. (UnpackedStrings group by {Index,StrNum} add{Min(RowNum) Rank}) //Left pad rank string with '0' (RankStr) for sorting string correctly. Add empty //string ('') to be used as a delimiter in concatenating the ranks. {id,Index,StrNum,Rank,PadLeft(ToString(Rank),LengthofMaxRank,'0') RankStr,'' Del} ) adorn {key{id,Rank}} //Sql uses physical hints, in dataphor you use logical ones. //Form the sorting expression SortStr to sort ids by concatenating the string ranks //in the order of any of the ranking columns or Index. group by {id} add{Concat(RankStr,Del order by {id,Rank}) SortStr} ) order by {SortStr}; //The object of the exercise, sort the ids by SortStr to get the correct //numerical order of val. id val SortStr -- ---------------------------------- ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 2 7.4.250 071630 7 7.4.250 071630 6 10.30.40.50.20.30.40 09182932434650 14 22.40.5.60.4.-100.300.478.19710212 101924333844515662 15 22.40.5.60.4.-99.300.478.19710212 101924333845515662 5 22.40.5.60.4.99.300.478.9999999 101924333847515661 4 22.40.5.60.4.99.300.478.19710212 101924333847515662 3 22.40.5.60.4.100.300.478.19710212 101924333849515662 1 100 15 Solution using the sql server RANK() function with a pass thru query The solution can be made more compact using the sql RANK() function since dataphor doesn't have a direct equivalent ranking operation. Since sql server can only access persisted tables (and views) and not dataphor expressions (we can't just stick in any dataphor table expression in a pass thru query) we'll create a persisted table to hold all the split data. create table IZSqlRanks //The table will be created in sql server. { id:Integer, Index:Integer, StrNum:Integer, key{id,Index} }; Do the splitting in dataphor (because it knows the difference between a string and a list ☺ ) and then 'assign' the resulting table to the persisted table IZSqlRanks. Relational databases support this kind of assignment for all variables including tables. And all tables in dataphor are variables. (To my sql friends this makes all the difference in the world ☺ ) IZSqlRanks:= (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum}; //Use a t-sql passthru query to take advantage of the sql RANK() function. The //resulting table will be treated like any other table (expression) in dataphor. //Left pad the string rank (RankStr) for sorting purposes. We're using a total //string length of 2 here so single digit ranks will be padded with a leading '0'. select IZ join ( SQLQuery('SELECT id,[Index],StrNum,RANK()OVER(ORDER BY [Index],StrNum) AS Rank FROM IZSqlRanks') //Use the sql result as if it was a native dataphor expression. {id,Index,StrNum,Rank,PadLeft(ToString(Rank),2,'0') RankStr,'' Del} adorn {key{id,Rank}} //Metadata (a key) pertaining to the table expression. //This key will be efficiently used by the Concat operation. //Concatenate the rank strings for each id to be used as the sort order for ids. group by {id} add{Concat(RankStr,Del order by {id,Rank}) SortStr} ) order by {SortStr}; id val SortStr -- ---------------------------------- ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 2 7.4.250 071630 7 7.4.250 071630 6 10.30.40.50.20.30.40 09182932434650 14 22.40.5.60.4.-100.300.478.19710212 101924333844515662 15 22.40.5.60.4.-99.300.478.19710212 101924333845515662 5 22.40.5.60.4.99.300.478.9999999 101924333847515661 4 22.40.5.60.4.99.300.478.19710212 101924333847515662 3 22.40.5.60.4.100.300.478.19710212 101924333849515662 1 100 15 Solving the problem on sql server 2000 with the Rac utility The Rac solution follows the same logic as the dataphor and sql methods. The 1st Rac execute creates the Index column, from 1 to N, for every string part (delimited integer) an id has. The 2nd Rac execute creates a rank over the ids based on the integers (string parts) within each Index. The rank Rac generates is equivalent to the sql DENSE_RANK() function. This rank, like RANK(), gives duplicate values the same rank but, unlike RANK(), does not take the duplicate ranks into account when generating the next rank. RANK() skips ahead based on ties while DENSE_RANK() consecutively numbers the ranks. Both types of ranks give the same correct sort result for the ids. The 3rd Rac execute concatenates the left padded rank strings for each id and returns the ids sorted by them, correctly :) Note that Rac is called recursively twice. Exec Rac @split='[position]', -- Rac splits each val string from left to right based on a period ('.'). @rows='id & [position]',-- Rac keeps the position each new string part starts in. @pvtcol='val', -- the target column of the split operation. @from='IZ', -- use a counter to generate the Index column (from 1-N) that indicates the individual string part in a val. @separator='.',@rowcounters='id{[Index]}',@counterdatatype='int', @defaults1='y',@rowbreak='n',@racheck='y',@shell='n', @select=' SELECT 1*id AS id,[Index],CAST([1] AS INT) AS StrNum INTO #J1 FROM rac Exec Rac @transform=~_dummy_~, @rows=~[Index] & StrNum & id~, @pvtcol=~Report Mode~, @from=~#J1~, @defaults1=~y~,@rowbreak=~n~,@racheck=~y~,@shell=~n~, /* use a Rac counter to rank the integer string parts within each Index (column Rank).*/ @rowindicators=~StrNum{Rank}_overtable_~,@counterdatatype=~int~, @select= ~if object_id(~~##J2~~) is not null drop table dbo.t1 SELECT 1*id AS id,1*StrNum AS StrNum,1*[Index] AS [Index],1*Rank AS Rank, /* left pad single digit string ranks for proper character sorting.*/ REPLICATE(~~0~~,2-DATALENGTH(CAST(Rank AS VARCHAR(2)))) + CAST(Rank AS VARCHAR(2)) AS RankStr INTO ##J2 FROM rac~ /* concatenate the string ranks (RankStr) within each id into column SortStr.*/ Exec Rac @transform=~Max(RankStr) as RankStr~, @rows=~id~, @pvtcol=~Rank~, @from=~##J2~, @defaults1=~y~,@racheck=~y~,@shell=~n~,@cutpvt=~y~, @concatenate=~RankStr~,@separator=~~,@stringname=~SortStr~, /* return the ids sorted by the concatenated string ranks.*/ @select=~SELECT IZ.id,val,SortStr FROM IZ JOIN rac ON IZ.id=rac.id ORDER BY SortStr DROP TABLE ##J2~' id val SortStr ----------- ------------------------------------ ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 7 7.4.250 071116 2 7.4.250 071116 6 10.30.40.50.20.30.40 08121517202326 14 22.40.5.60.4.-100.300.478.19710212 091314181921272830 15 22.40.5.60.4.-99.300.478.19710212 091314181922272830 5 22.40.5.60.4.99.300.478.9999999 091314181924272829 4 22.40.5.60.4.99.300.478.19710212 091314181924272830 3 22.40.5.60.4.100.300.478.19710212 091314181925272830 1 100 10 More on Rac @ www.rac4sql.net Solving the problem on sql server 2005 with the Rac utility The Rac solution in S2k5 is similar to the one on S2K (go there for more info). But it's simpler since the split string parts (integers) can be ranked directly using a dense rank function not available in S2k. This eliminates a recursive call to Rac so here it's executed twice instead of three times (in S2k). Exec Rac @split='[position]', @rows='id & [position]', @pvtcol='val', @from='IZ', @separator='.',@rowcounters='id{[Index]}',@counterdatatype='int', @defaults1='y',@rowbreak='n',@racheck='y',@shell='n', -- the DENSE_RANK() function, available in S2k5, is used to rank the integer string parts -- and is left padded. @select=' SELECT id,[Index],StrNum,Rank, REPLICATE(~0~,2-DATALENGTH(CAST(Rank AS VARCHAR(2)))) + CAST(Rank AS VARCHAR(2)) AS RankStr INTO #J1 FROM (SELECT 1*id AS id,[Index],CAST([1] AS INT) AS StrNum, DENSE_RANK()OVER(ORDER BY [Index],CAST([1] AS INT)) AS Rank FROM rac ) AS A Exec Rac @transform=~Max(RankStr) as RankStr~, @rows=~id~, @pvtcol=~Rank~, @from=~#J1~, @defaults1=~y~,@racheck=~y~,@shell=~n~,@cutpvt=~y~, @concatenate=~RankStr~,@separator=~~,@stringname=~SortStr~, @select=~SELECT IZ.id,val,SortStr FROM IZ JOIN rac ON IZ.id=rac.id ORDER BY SortStr~' id val SortStr ----------- ------------------------------------ ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 7 7.4.250 071116 2 7.4.250 071116 6 10.30.40.50.20.30.40 08121517202326 14 22.40.5.60.4.-100.300.478.19710212 091314181921272830 15 22.40.5.60.4.-99.300.478.19710212 091314181922272830 5 22.40.5.60.4.99.300.478.9999999 091314181924272829 4 22.40.5.60.4.99.300.478.19710212 091314181924272830 3 22.40.5.60.4.100.300.478.19710212 091314181925272830 1 100 10 More on Rac @ www.rac4sql.net