Is not the concept of a 'coordinate' a clear and powerful idea? The entire world is mapped by the coordinates of latitude and longitude. You can pinpoint any place in the world given its coordinates. A primary key on a table is a coordinate. The key identifies preciously the row in the table. If you can create the coordinates of any information you have created the ability to preciously identify anything about it. Especially when the information is data. A frequent problem is finding a particular occurrence of a 'some thing' in a string. The 'some thing' is frequently a number. The problem becomes finding the Nth occurrence of a number, any number. For example, given the string: 'ASF--124 RW!==RXT34146$==PFS773044Y -JPFK*5435DAD ' extract the 3rd occurrance of a number, ie. '773044'. Often the solution to this is to build a forrest of string functions. But it is too easy to get lost in such a forrest:) But, if we can map the string with coordinates the problem and solution become clear, precise and simple. So how do we create coordinates and what form do they take? Well we can transpose the string to rows and create a set of coordinates for each row, ie. each character. The rows/characters correspond to the string as we read it from left to right. Once the string is transposed we'll create 2 coordinates. The first one is simply an indicator for whether or not the character is a digit. This makes sense since conceptually we only care if a character is a digit (part of a number) or is a non-digit, not part of a number. We're not concerned about individual characteristics of any character, right?:) Now the 2nd coordinate, well that's going to be a bit of magic. And this bit of magic is going to take the form of a 'rank'. I'm now going to put RAC to use to develop a very particular but easy rank. I'll be using: MS Sql Server 2005 sp1 RAC v2.34 beta For details on RAC go here. First some test data: create table RT (rowid int primary key,messystr varchar(30)) insert RT values(1,'12-%= 312 ') insert RT values(2,'YTW$152&OP8393&K734680UF104') insert RT values(3,'624/!/53RF274UT56') insert RT values(4,'342') insert RT values(5,'ORDER') insert RT values(6,'\\8AApEFS30902$1') insert RT values(7,'4#7/&91A56G7VW37@&&&&') select * from RT rowid messystr ----------- ------------------------------ 1 12-%= 312 2 YTW$152&OP8393&K734680UF104 3 624/!/53RF274UT56 4 342 5 ORDER 6 \\8AApEFS30902$1 7 4#7/&91A56G7VW37@&&&& We're going to start by splitting each string character by character per rowid. We're also going to create the first coordinate, a new column, that's 1 if the character is a digit and 2 for everything else. This is column 'type'. Column pos is the position (from left to right) of the character ('str1') in 'messystr'. Column 'rd' is generated by Rac and is an ascending integer over the whole table that follows the sort order by 'rowid' and 'pos'. The result is saved in table ##t1. Exec Rac @split='[position]', @rows='rowid & [position]', @pvtcol='messystr',@rank='str', @splitby='1', -- Splits a string by N consecutive positions. @from='RT',@defaults1='y',@rowbreak='n',@racheck='y', @select='select rd,1*rowid as rowid,1*position as pos,str1, case when ascii(str1) between 48 and 57 then 1 else 2 end as type into ##t1 from rac order by rd' Here is the result for 'rowids' 1 and 6. select * from ##t1 where rowid in (1,6) rd rowid pos str1 type ----------- ----------- ----------- ---- ----------- 1 1 1 1 1 2 1 2 2 1 3 1 3 - 2 4 1 4 % 2 5 1 5 = 2 6 1 6 2 7 1 7 3 1 8 1 8 1 1 9 1 9 2 1 10 1 10 2 63 6 1 \ 2 64 6 2 \ 2 65 6 3 8 1 66 6 4 A 2 67 6 5 A 2 68 6 6 p 2 69 6 7 E 2 70 6 8 F 2 71 6 9 S 2 72 6 10 3 1 73 6 11 0 1 74 6 12 9 1 75 6 13 0 1 76 6 14 2 1 77 6 15 $ 2 78 6 16 1 1 Now for our 2nd coordinate, a rank. The 'type' varies within 'rowid' ordered by 'rd'. We want a rank that increments only when 'type' changes within 'rowid' and consequently stays the same within 'type'. Within a 'rowid' the rank can range from 1 to N. When the 'rowid' changes the rank resets to 1. This kind of rank is obtained using the Rac @rowindicators parameter. Exec Rac @transform='_dummy_', @rows='rowid & type & pos & str1', @rowsort='rd', @pvtcol='Sql*Plus',@defaultexceptions='dumy', @from='##t1',@defaults1='y',@rowbreak='n',@racheck='y', @rowindicators='type{typerank}',@counterdatatype='int' Here is the result for rowid's 1 and 6. rowid type pos str1 typerank ------ ---- ---- ---- ----------- 1 1 1 1 1 1st occurrence @1 1 1 2 2 1 1st occurrence @1 1 2 3 - 2 1 2 4 % 2 1 2 5 = 2 1 2 6 2 1 1 7 3 3 1 1 8 1 3 1 1 9 2 3 1 2 10 4 6 2 1 \ 1 6 2 2 \ 1 6 1 3 8 2 1st occurrence @2 6 2 4 A 3 6 2 5 A 3 6 2 6 p 3 6 2 7 E 3 6 2 8 F 3 6 2 9 S 3 6 1 10 3 4 6 1 11 0 4 6 1 12 9 4 6 1 13 0 4 6 1 14 2 4 6 2 15 $ 5 6 1 16 1 6 With 'type' and 'typerank' we now have the coordinates of any occurrence of a number. If you examine the table closely you should be able to come up with the formulation of it:) Ok here it is: The Nth occurrence of a number has coordinates: 'type'=1 and 'typerank' in ((2*N)-1 , 2*N) 1st occurrence (2*1)-1 , 2*1 : in (1,2) 2nd occurrence (2*2)-1 , 2*2 : in (3,4) 3rd occurrence (2*3)-1 , 2*3 : in (5,6) 4th occurrrence (2*4)-1 , 2*4 : in (7,8) Note that 'typerank' is an 'or' not an 'and'. You are guaranteed to have a number in one but not both:) For example, for the 1st occurrence the number will have a 'typerank' of 1 if it is not proceeded by non-digits (see 'rowid' 1 above). If non-digits come first the 1st occurrence is at 'typerank' 2 (see 'rowid' 6 above). The same logic applies to any Nth occurrence. So now we can put a variable filter in RAC that accepts the computed coordinates for 'typerank' for any particular occurrence we want/pass to it (a form of macro substitution for stored procedure parameter arguments). The filter is contained in the @wherecounters parameter. For example, we'll get the rows for the 2nd occurrence of a number. declare @N int,@N1 int,@N2 int -- Only the value of @N is needed. set @N=2 set @N1=(@N*2)-1 set @N2=@N*2 -- Exec Rac @transform='_dummy_',@user1=@N1,@user2=@N2, @rows='rowid & type & pos & str1', @rowsort='rd', @pvtcol='Sql*Plus',@defaultexceptions='dumy', @from='##t1',@defaults1='y',@rowbreak='n',@racheck='y', @rowindicators='type{typerank}',@counterdatatype='int', -- -- Use coordinates to filter rows. @N1 and @N2 -- will be substituted for @user1 and @user2. -- @wherecounters='type=1 and typerank in(@user1,@user2)', @select='select rd,1*rowid as rowid,1*pos as pos,str1 into ##t2 from rac order by rd' select * from ##t2 rd rowid pos str1 ----------- ----------- ----------- ---- 7 1 7 3 8 1 8 1 9 1 9 2 21 2 11 8 22 2 12 3 23 2 13 9 24 2 14 3 44 3 7 5 45 3 8 3 72 6 10 3 73 6 11 0 74 6 12 9 75 6 13 0 76 6 14 2 81 7 3 7 Now we only have to concatenate the rows within 'rowid' to return the number. This is very simple with Rac. Exec Rac @transform='Max(str1) as str', @rows='rowid', @pvtcol='pos', @from='##t2', @defaults1='y',@racheck='y', @concatenate='str',@stringname='number', @separator='',@cutpvt='y' rowid number ------ ------ 1 312 2 8393 3 53 6 30902 7 7 And there you have it. Of course the whole process can be streamlined. So are you going to get coordinated or are you going to continue to go around in circles? :) To paraphrase Gordon Geiko from 'Wall Street': 'Abstraction is good, it clarifies and purifies...' It allows you to solve a problem, an end instead of solving a means and then an end. There is something to be said about qualitatively different solutions. Precisely what I am unsure, but I think it something very good. We need more abstraction in data management. That is why I am an advocate of Dataphor and the D4 language. That Rac doesn't have a lot of company is not a good thing, it's a bad thing. That a company like MS has a vested interest in not fostering abstraction is a bad thing. The industry continues to flounder in its own sea of code and contradictions. That is not a good thing. Find your way to show another way. (Added 12/20/2008 Note to see a much easier solution to this problem using Dataphor see: 'Extracting numbers from a string with Dataphor')
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Saturday, September 02, 2006
RAC - Are you Coordinated?
Subscribe to:
Post Comments (Atom)
2 comments:
This is my first post I'd like to thank you for such a great quality forum!
Just thought this would be a nice way to introduce myself!
Sincerely,
Edwyn Sammy
if you're ever bored check out my site!
[url=http://www.partyopedia.com/articles/sesame-street-party-supplies.html]sesame street Party Supplies[/url].
top [url=http://www.001casino.com/]casino bonus[/url] hinder the latest [url=http://www.casinolasvegass.com/]online casinos[/url] free no set aside bonus at the best [url=http://www.baywatchcasino.com/]no lay gratuity
[/url].
Post a Comment