Beyond Sql: RAC - Are you Coordinated?

Is not the concept of a 'coordinate' a clear and powerful
idea? The entire world is mapped by the coordinates of
latitude and longitude. You can pinpoint any place in the
world given its coordinates. A primary key on a table is
a coordinate. The key identifies preciously the row in
the table. If you can create the coordinates of any 
information you have created the ability to preciously
identify anything about it. Especially when the information
is data.

A frequent problem is finding a particular occurrence of a
'some thing' in a string. The 'some thing' is frequently
a number. The problem becomes finding the Nth occurrence
of a number, any number. For example, given the string:

'ASF--124 RW!==RXT34146$==PFS773044Y -JPFK*5435DAD '

extract the 3rd occurrance of a number, ie. '773044'.

Often the solution to this is to build a forrest of string
functions. But it is too easy to get lost in such a
forrest:) But, if we can map the string with coordinates
the problem and solution become clear, precise and
simple. So how do we create coordinates and what
form do they take? Well we can transpose the string to
rows and create a set of coordinates for each row, 
ie. each character. The rows/characters correspond to
the string as we read it from left to right. Once the
string is transposed we'll create 2 coordinates. The
first one is simply an indicator for whether or not
the character is a digit. This makes sense since
conceptually we only care if a character is a digit
(part of a number) or is a non-digit, not part of
a number. We're not concerned about individual
characteristics of any character, right?:) Now the
2nd coordinate, well that's going to be a bit of magic.
And this bit of magic is going to take the form of
a 'rank'. I'm now going to put RAC to use to develop
a very particular but easy rank.

I'll be using:
MS Sql Server 2005 sp1
RAC v2.34 beta
For details on RAC go here.

First some test data:

create table RT (rowid int primary key,messystr varchar(30))
insert RT values(1,'12-%= 312 ')
insert RT values(2,'YTW$152&OP8393&K734680UF104')
insert RT values(3,'624/!/53RF274UT56')
insert RT values(4,'342')
insert RT values(5,'ORDER')
insert RT values(6,'\\8AApEFS30902$1')
insert RT values(7,'4#7/&91A56G7VW37@&&&&')

select * from RT

rowid       messystr                       
----------- ------------------------------ 
1           12-%= 312 
2           YTW$152&OP8393&K734680UF104
3           624/!/53RF274UT56
4           342
5           ORDER
6           \\8AApEFS30902$1
7           4#7/&91A56G7VW37@&&&&

We're going to start by splitting each string character
by character per rowid. We're also going to create the
first coordinate, a new column, that's 1 if the character
is a digit and 2 for everything else. This is column 'type'.
Column pos is the position (from left to right) of the
character ('str1') in 'messystr'. Column 'rd' is generated
by Rac and is an ascending integer over the whole table
that follows the sort order by 'rowid' and 'pos'. The
result is saved in table ##t1.

Exec Rac
@split='[position]',
@rows='rowid & [position]', 
@pvtcol='messystr',@rank='str',
@splitby='1', -- Splits a string by N consecutive positions.
@from='RT',@defaults1='y',@rowbreak='n',@racheck='y',
@select='select rd,1*rowid as rowid,1*position as pos,str1,
         case when ascii(str1) between 48 and 57 then 1 else 2 end as type
         into ##t1
           from rac
             order by rd'

Here is the result for 'rowids' 1 and 6.

select * from ##t1 where rowid in (1,6)

rd          rowid       pos         str1 type        
----------- ----------- ----------- ---- ----------- 
1           1           1           1    1
2           1           2           2    1
3           1           3           -    2
4           1           4           %    2
5           1           5           =    2
6           1           6                2
7           1           7           3    1
8           1           8           1    1
9           1           9           2    1
10          1           10               2
63          6           1           \    2
64          6           2           \    2
65          6           3           8    1
66          6           4           A    2
67          6           5           A    2
68          6           6           p    2
69          6           7           E    2
70          6           8           F    2
71          6           9           S    2
72          6           10          3    1
73          6           11          0    1
74          6           12          9    1
75          6           13          0    1
76          6           14          2    1
77          6           15          $    2
78          6           16          1    1

Now for our 2nd coordinate, a rank. The 'type' varies
within 'rowid' ordered by 'rd'. We want a rank that
increments only when 'type' changes within 'rowid'
and consequently stays the same within 'type'. Within
a 'rowid' the rank can range from 1 to N. When the
'rowid' changes the rank resets to 1. This kind of
rank is obtained using the Rac @rowindicators parameter.

Exec Rac
@transform='_dummy_',
@rows='rowid & type & pos & str1',
@rowsort='rd',
@pvtcol='Sql*Plus',@defaultexceptions='dumy',
@from='##t1',@defaults1='y',@rowbreak='n',@racheck='y',
@rowindicators='type{typerank}',@counterdatatype='int'

Here is the result for rowid's 1 and 6.

rowid  type pos  str1 typerank    
------ ---- ---- ---- ----------- 
1      1    1    1    1   1st occurrence @1
1      1    2    2    1   1st occurrence @1
1      2    3    -    2
1      2    4    %    2
1      2    5    =    2
1      2    6         2
1      1    7    3    3
1      1    8    1    3
1      1    9    2    3
1      2    10        4
6      2    1    \    1
6      2    2    \    1
6      1    3    8    2    1st occurrence @2
6      2    4    A    3
6      2    5    A    3
6      2    6    p    3
6      2    7    E    3
6      2    8    F    3
6      2    9    S    3
6      1    10   3    4
6      1    11   0    4
6      1    12   9    4
6      1    13   0    4
6      1    14   2    4
6      2    15   $    5
6      1    16   1    6

With 'type' and 'typerank' we now have the coordinates
of any occurrence of a number. If you examine the
table closely you should be able to come up with the
formulation of it:) Ok here it is:

The Nth occurrence of a number has coordinates:

'type'=1 and

'typerank' in ((2*N)-1 , 2*N)
 
1st occurrence
(2*1)-1 , 2*1 : in (1,2)
 
2nd occurrence
(2*2)-1 , 2*2 : in (3,4)
 
3rd occurrence
(2*3)-1 , 2*3 : in (5,6)
 
4th occurrrence
(2*4)-1 , 2*4 : in (7,8)

Note that 'typerank' is an 'or' not an 'and'. You are
guaranteed to have a number in one but not both:)
For example, for the 1st occurrence the number will
have a 'typerank' of 1 if it is not proceeded by 
non-digits (see 'rowid' 1 above). If non-digits come
first the 1st occurrence is at 'typerank' 2 (see 'rowid'
6 above). The same logic applies to any Nth occurrence.

So now we can put a variable filter in RAC that accepts
the computed coordinates for 'typerank' for any particular
occurrence we want/pass to it (a form of macro substitution
for stored procedure parameter arguments). The filter
is contained in the @wherecounters parameter.

For example, we'll get the rows for the 2nd occurrence
of a number.

declare @N int,@N1 int,@N2 int
-- Only the value of @N is needed. 
set @N=2
set @N1=(@N*2)-1
set @N2=@N*2
--
Exec Rac
@transform='_dummy_',@user1=@N1,@user2=@N2, 
@rows='rowid & type & pos & str1',
@rowsort='rd',
@pvtcol='Sql*Plus',@defaultexceptions='dumy',
@from='##t1',@defaults1='y',@rowbreak='n',@racheck='y',
@rowindicators='type{typerank}',@counterdatatype='int',
--
-- Use coordinates to filter rows. @N1 and @N2
-- will be substituted for @user1 and @user2.
--
@wherecounters='type=1 and typerank in(@user1,@user2)',
@select='select rd,1*rowid as rowid,1*pos as pos,str1
         into ##t2
         from rac
          order by rd'

select * from ##t2

rd          rowid       pos         str1 
----------- ----------- ----------- ---- 
7           1           7           3
8           1           8           1
9           1           9           2
21          2           11          8
22          2           12          3
23          2           13          9
24          2           14          3
44          3           7           5
45          3           8           3
72          6           10          3
73          6           11          0
74          6           12          9
75          6           13          0
76          6           14          2
81          7           3           7

Now we only have to concatenate the rows within 'rowid'
to return the number. This is very simple with Rac.

Exec Rac
@transform='Max(str1) as str',
@rows='rowid',
@pvtcol='pos',
@from='##t2',
@defaults1='y',@racheck='y',
@concatenate='str',@stringname='number',
@separator='',@cutpvt='y'

rowid  number 
------ ------ 
1      312
2      8393
3      53
6      30902
7      7

And there you have it. Of course the whole process can 
be streamlined.

So are you going to get coordinated or are you going to
continue to go around in circles? :)


To paraphrase Gordon Geiko from 'Wall Street':
'Abstraction is good, it clarifies and purifies...'
It allows you to solve a problem, an end instead of
solving a means and then an end. There is something
to be said about qualitatively different solutions.
Precisely what I am unsure, but I think it something
very good.

We need more abstraction in data management. That is
why I am an advocate of Dataphor and the D4 language.
That Rac doesn't have a lot of company is not a good
thing, it's a bad thing. That a company like MS has
a vested interest in not fostering abstraction is a
bad thing. The industry continues to flounder in its
own sea of code and contradictions. That is not a good thing.

Find your way to show another way.

(Added 12/20/2008
 Note to see a much easier solution to this problem using Dataphor see:
'Extracting numbers from a string with Dataphor')
2 comments:

Anonymous said...: This is my first post I'd like to thank you for such a great quality forum!
Just thought this would be a nice way to introduce myself!

Sincerely,
Edwyn Sammy
if you're ever bored check out my site!
[url=http://www.partyopedia.com/articles/sesame-street-party-supplies.html]sesame street Party Supplies[/url].; Sat Jan 09, 07:24:00 AM
Anonymous said...: top [url=http://www.001casino.com/]casino bonus[/url] hinder the latest [url=http://www.casinolasvegass.com/]online casinos[/url] free no set aside bonus at the best [url=http://www.baywatchcasino.com/]no lay gratuity
[/url].; Tue Jan 29, 07:39:00 AM
Saturday, September 02, 2006

RAC - Are you Coordinated?

2 comments:

Beyond Sql

Blog Archive

Links

About Me