Beyond Sql

Thursday, January 22, 2009

Multiple cascade paths to the same table

This example is based on the thread:

microsoft.public.sqlserver.programming
Jan 21, 2009
'JET Referential Integrity superior than SQL'
http://tinyurl.com/d58ubl 

Sql Server can't deal with more than 1 cascade to a table. Some nonsense about
cycles. I guess if two paths are involved they call it a bi-cycle ☺ It simply is
a poor system for relationally modeling a business problem. Simply replace it 
with Dataphor and use sql server to store the data. Sql server doesn't screw
up data storage ☺ Here's how simple this example is with Dataphor (and sql server
as the backend).

The tables will be persisted in sql server but Dataphor will (mercifully) perform all
the integrity checks.

create table users
{
id:Integer,
username:String tags{Storage.Length='50'},
key{id},
key{username}
};

create table table2
{
id:Integer,
user1:String tags{Storage.Length='50'}, 
user2:String tags{Storage.Length='50'}, 
user3:String tags{Storage.Length='50'} ,
key{id},
reference user1username {user1} references users {username}
update cascade delete cascade,
reference user2username {user2} references users {username}
update cascade delete cascade,
reference user3username {user3} references users {username}
update cascade delete cascade
};

users:= table{
              row{1 id,'john' username},
              row{2,'steve'},
              row{3,'judy'},
              row{4,'larry'},
              row{5,'bill'},
              row{6,'rita'}};

This row would violate the user3username constraint.
insert row{12 id,'john' user1,'rita' user2, 'stan' user3} into table2;              
//Error: The table users does not have a row with username "stan".

table2:=table{
              row{10 id,'john' user1,'steve' user2, 'judy' user3},
              row{11,'larry','rita', 'bill'},
              row{13,'larry','steve', 'steve'},
              row{14,'judy','steve', 'john'},
              row{15,'bill','judy','rita'},
              row{16,'steve','bill', 'judy'},
              row{17,'john','bill','john'}};
              
select table2;
id user1 user2 user3 
-- ----- ----- ----- 
10 john  steve judy  
11 larry rita  bill  
13 larry steve steve 
14 judy  steve john  
15 bill  judy  rita  
16 steve bill  judy  
17 john  bill  john  

Check the cascading updates. In table table2 all entries
of 'john' are replaced with 'paul'.

update users set {username:='paul'} where username='john';

select table2;
id user1 user2 user3 
-- ----- ----- ----- 
10 paul  steve judy  
11 larry rita  bill  
13 larry steve steve 
14 judy  steve paul  
15 bill  judy  rita  
16 steve bill  judy  
17 paul  bill  paul  

Check the cascading deletes. Any row in table2 with an entry
of 'steve' is deleted.

delete users where username='steve';

select table2;
id user1 user2 user3 
-- ----- ----- ----- 
11 larry rita  bill  
15 bill  judy  rita  
17 paul  bill  paul  

Dataphor is open source, get a copy here:
http://databaseconsultinggroup.com/downloads/

Saturday, December 20, 2008

Extracting numbers from a string

This example is based on the question asked in the thread:

microsoft.public.sqlserver.programming
Dec 18, 2008
Pulling a number out of an nvarchar field
http://tinyurl.com/3quyap

The OP was interested in pulling out the 1st occurrence of a number in a
string. The string has numbers and letters. So if the string is 'XY40A3' 
we want the number 40 which is the 1st of two numbers in the string.

This is very easy to do in Dataphor. Just as a table of numbers comes in
handy for solving many different problems so does a list of letters. 
The idea here is to treat each letter in the string as a 'delimiter'. We
then split the string using the list of letters as the delimiters so
what results are the number(s) in the string.

We can store the list (the letters of the alphabet) in a temporary table
for easy access.

create session table AlphaTable
 { AlphaList:list(String),
   key{ }
 };
AlphaTable:=
       table
            {
             row
             { {'A','B','C','D','E','F','G','H','I','J','K','L','M','N',
                'O','P','Q','R','S','T','U','V','W','X','Y','Z'} AlphaList}
             };      

For example if we split a string, transform it to a table  and remove any
blanks we'll just have numbers left. If we order by sequence (Index) it will
show the numbers as they occur from left to right. 

select ToTable('XY40A3'.Split(AlphaTable[].AlphaList),'StrNum','Index' )
                where StrNum>' ' 
                  order by {Index};                  

StrNum Index 
------ ----- 
40     2     
3      3     
           
Not only would it be easy to get the 1st number but we can get the
occurrence of any number easily. The 1st occurrence is just a special
case of the general problem of getting the Nth occurrence in a string.

By using ToTable(ToList(cursor by Index (that follows the order of
numbers from left to right in the string) we can create a consecutive rank
from 1 to N over the table of numbers that will allow direct access to the 
Nth number (if it exists).
       
select
 ToTable
        (  
         ToList
               (
                cursor
                      (
                        ( 
                         ToTable('XY40A3RAD853'.Split(AlphaTable[].AlphaList),'StrNum','Index' )
                         where StrNum>' ' 
                         {ToInteger(StrNum) Num,Index}
                        )
                         order by {Index}
                      )   
               )   
        )
         {Num,Index,sequence+1 NthIndex} ;
         
Num Index NthIndex 
--- ----- -------- 
3   3     2        
40  2     1        
853 6     3               
        
Here is the an operator for the Nth occurrence that takes into account
lower case letters and returns a -1 if the Nth occurrence doesn't exist.
        
create session operator NthNum(AStr:String,N:Integer):Integer
begin
var T1:= 
  ToTable
        (  
         ToList
               (
                cursor
                      (
                        ( 
                         ToTable(Upper(AStr).Split(AlphaTable[].AlphaList),'StrNum','Index' )
                         where StrNum>' ' 
                         {ToInteger(StrNum) Num,Index}
                        )
                         order by {Index}
                      )   
               )   
        )
         {Num,Index,sequence+1 NthIndex};
result:=IfNil((T1 adorn{key{NthIndex}})[N].Num,-1);
end;

select NthNum('SF346fs47sGs759 ',1); //returns 346
select NthNum('SF346fs47sGs759 ',2); //returns 37
select NthNum('SF346fs47sGs759 ',3); //returns 759  
select NthNum('SF346fs47sGs759 ',4); //returns -1

Here a table of strings in stored in an Sql Server database from which
we can extract the 1st occurrence of a number.

create table FooStrings
{
 keycol:Integer,
 datacol:String nil,
 key{keycol}
};
FooStrings:=
 table
  {
 row{1 keycol, 'XYZ40AB' datacol},
 row{2, 'WX32A'},
 row{3, '27 blah'},
 row{4, 'A87BNC30'},
 row{5, 'XY40A3'},
 row{6, 'TWFD'},
 row{7, 'XYA53GH5JGV934'},
 row{8, '7'},
 row{9, nil}
  };
  
select FooStrings add{NthNum(IfNil(datacol,' '),1) MyNumber}
                   with {IgnoreUnsupported = 'true'}
                    order by {keycol};
                      
keycol datacol        MyNumber 
------ -------------- -------- 
1      XYZ40AB        40       
2      WX32A          32       
3      27 blah        27       
4      A87BNC30       87       
5      XY40A3         40       
6      TWFD           -1       
7      XYA53GH5JGV934 53       
8      7              7        
9      <No Value>     -1   


Think how easy the operator can be modified if it's desired to return
the minimun or maximum number etc.

If you would like to see a coherent and easy to follow all t-sql solution
that doesn't cobble together string functions into spaghetti code see:
RAC - Are you Coordinated?

But I think you'll agree the much preferred solution is in Dataphor ☺

Saturday, December 06, 2008

Sql server dynamic crosstabs by Jeff Moden

This is the RAC version of Jeff Modens fine article on dynamic crosstabs
for sql server at:

Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs 2008/12/03 
www.sqlservercentral.com/articles/cross+tab/65048/

The data is generated by the method outlined in the article. The table was 
populated with 1 millions rows

- Basic crosstab. 2 secs in QA for S2005.
Exec Rac
@transform='Sum(SomeMoney) as SumMony',
@rows='Someletters2',
@pvtcol='(left(datename(mm,DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)),3)+~ ~+
          cast(year(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)) as char(4))) as mthyr',
@pvtsort='month(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0))', -- Sort pivot expression by an integer. 
@from='##JBMTest',
@WHERE='SomeDate>=~Jan  1 2008 12:00AM~ AND SomeDate<~Jul  1 2008 12:00AM~',
@rowtotalsposition='end',@racheck='y',@shell='n'

Someletters2 Funct   Jan 2008   Feb 2008   Mar 2008   Apr 2008   May 2008   Jun 2008   Totals
------------ ------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
AA           SumMony 685.67     763.64     656.13     575.93     879.13     192.13     3752.63
AB           SumMony 927.06     928.98     280.20     632.43     560.99     785.50     4115.16
AC           SumMony 791.09     555.18     916.71     273.23     187.48     508.31     3232.00
AD           SumMony 250.04     341.58     426.53     645.56     670.13     422.86     2756.70
AE           SumMony 809.14     487.21     295.33     625.92     716.12     527.19     3460.91
.
.
ZY           SumMony 776.32     682.98     326.17     677.69     546.87     926.54     3936.57
ZZ           SumMony 532.75     500.92     277.93     636.40     607.05     553.43     3108.48
Totals       SumMony 433997.25  383211.70  411913.12  411878.29  425431.07  409809.47  2476240.90

Here some additional bells and whistles are thrown in:)
 
-- Executed in 26 secs in QA for S2005.
Exec Rac
-- The same transformed repeated twice for different purposes. 
@transform='Sum(SomeMoney) as SumMony & Sum(SomeMoney) as [% row]',
@rows='Someletters2',
@pvtcol='(left(datename(mm,DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)),3)+~ ~+
          cast(year(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)) as char(4))) as mthyr',
@pvtsort='month(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0))',
@from='##JBMTest',
@WHERE='SomeDate>=~Jan  1 2008 12:00AM~ AND SomeDate<~Jul  1 2008 12:00AM~',
@rowtotalsposition='end',@racheck='y',@pformat='_pvtcols_',@shell='n',@translabel='Summary',
-- Display min and max sum for each Someletters along with pivot (date) it occurred.
-- The min and max are displayed in separate rows. Default is same row for all rowfunctions.
@rowfunctions='min(SumMony) & max(SumMony)',@rowfunctionslabel='Min/Max',@displayrowfunctions='m',
-- Running sum of pivot columns for each row from left to right. The pivot sum is followed by the 
-- run in each cell.
@colruns='SumMony',
-- The percentage of the pivot sum/[row total] displayed in a separate row.
@cpercents='[% row] %only' -- a different transform alias to force a separate row.
                           -- We could display in same row as sum (and column runs).


Someletters2 Summary Min/Max                 Jan 2008            Feb 2008            Mar 2008             Apr 2008             May 2008             Jun 2008             Totals
------------ ------- ----------------------- ------------------- ------------------- -------------------- -------------------- -------------------- -------------------- ----------
AA           SumMony min(192.13,Jun 2008)    685.67/685.67       763.64/1449.31      656.13/2105.44       575.93/2681.37       879.13/3560.50       192.13/3752.63       3752.63
                     max(879.13,May 2008)                                                                                                                                 
             % row                           18.3%               20.3%               17.5%                15.3%                23.4%                5.1%                 -
AB           SumMony min(280.20,Mar 2008)    927.06/927.06       928.98/1856.04      280.20/2136.24       632.43/2768.67       560.99/3329.66       785.50/4115.16       4115.16
                     max(928.98,Feb 2008)                                                                                                                                 
             % row                           22.5%               22.6%               6.8%                 15.4%                13.6%                19.1%                -
AC           SumMony min(187.48,May 2008)    791.09/791.09       555.18/1346.27      916.71/2262.98       273.23/2536.21       187.48/2723.69       508.31/3232.00       3232.00
                     max(916.71,Mar 2008)                                                                                                                                 
             % row                           24.5%               17.2%               28.4%                8.5%                 5.8%                 15.7%                -
AD           SumMony min(250.04,Jan 2008)    250.04/250.04       341.58/591.62       426.53/1018.15       645.56/1663.71       670.13/2333.84       422.86/2756.70       2756.70
                     max(670.13,May 2008)                                                                                                                                 
             % row                           9.1%                12.4%               15.5%                23.4%                24.3%                15.3%                -
AE           SumMony min(295.33,Mar 2008)    809.14/809.14       487.21/1296.35      295.33/1591.68       625.92/2217.60       716.12/2933.72       527.19/3460.91       3460.91
                     max(809.14,Jan 2008)                                                                                                                                 
             % row                           23.4%               14.1%               8.5%                 18.1%                20.7%                15.2%                -
AF           SumMony min(406.49,May 2008)    788.30/788.30       415.40/1203.70      605.56/1809.26       613.81/2423.07       406.49/2829.56       520.40/3349.96       3349.96
                     max(788.30,Jan 2008)                                                                                                                                 
.
.
ZY           SumMony min(326.17,Mar 2008)    776.32/776.32       682.98/1459.30      326.17/1785.47       677.69/2463.16       546.87/3010.03       926.54/3936.57       3936.57
                     max(926.54,Jun 2008)                                                                                                                                 
             % row                           19.7%               17.3%               8.3%                 17.2%                13.9%                23.5%                -
ZZ           SumMony min(277.93,Mar 2008)    532.75/532.75       500.92/1033.67      277.93/1311.60       636.40/1948.00       607.05/2555.05       553.43/3108.48       3108.48
                     max(636.40,Apr 2008)                                                                                                                                 
             % row                           17.1%               16.1%               8.9%                 20.5%                19.5%                17.8%                -
Totals       SumMony min(383211.70,Feb 2008) 433997.25/433997.25 383211.70/817208.95 411913.12/1229122.07 411878.29/1641000.36 425431.07/2066431.43 409809.47/2476240.90 2476240.90
                     max(433997.25,Jan 2008)                                                                                                                              
             % row                           17.5%               15.5%               16.6%                16.6%                17.2%                16.5%                -

Tuesday, November 25, 2008

Linq to Sql as an opportunity to sql users

Instead of a dead end, Linq presents opportunities for sql experts.

My comment orginally appeared in the www.sqlblog.com post:

'Continuing Industry Consolidation and Other Trends.' 1/16/08
by Kevin Kline
http://tinyurl.com/5opxlz

I think the Sun acquisition of MySql was a checkmate move. It was Oracle
that attempted to slow down MySql with their acquisition of the InnoDB
transaction engine. Now they have been paid back in spades. Now it's Sun
vs. MS in a new competition for the minds and hearts of developers. The
broad view is LINQ/net and Sql Server vs. java and MySql. This is not
about databases per se but a new war based on the object/entity model
inspired by LINQ. I don't see IBM and Oracle in this war. They will have
to be content to battle amongst themselves in the old database wars.
(I'll come back to LINQ:)

As for 'Checking out the Competition' of course I too applaud Linchi.
But honestly we're first 'now' recognizing the advantages of eyes wide
open?:) This attitude should be a given for a professional. Perhaps
better late than never, perhaps an airing of cheerleading vs.reality
checking:) For most ansi features that MS has pre-annouced all one has
to do to study them is read about them in Oracle or DB2 documentation
(ie. OVER). And as Linchi pointed out it often goes the opposite
direction for other types of features. This attitude contributes to the
great divides that are common in the industry. 

And now we come to the 'dead-end routes' like LINQ. I take the opposite
view you do. There's a compelling case to be made (I've commented on
this matter on this site) that if there is a deadweight MS sees sql as
it. LINQ is not just a piece of new technology, it's not just a local
variable, it's a global one. LINQ is both an affirmation of the runtime
environment for 'all' application development using an object model and
a rejection of the sql interface. MS can live with the physical file
structure (the idea of relational data, rows and columns) but they don't
want to live with the sql interface for application development. MS
explains this global move in terms of solving the historic impedance
mismatch between two completely different environments. And they have
picked their winner and at the same time the loser. The rows and columns
abstraction now ends at physical storage. The object abstraction and
LINQ will take up everything else. Sql server is now something quite
different than it used to be. Almost all developmental changes in server
will be based on servicing objects and quite probably at the expense of
features associated with a furtherance of the relational model. Look at
all the work on partitioned views in S2008. This lies at the heart of
how LINQ will translate entity updates. LINQ is still in its enfancy. I
would expect it to appear to many just like sql did when it was intially
introduced in the early eighties. It will take time to get the matured
version. What is truely ironic is I see no real argument in the sql
community that LINQ represents a great opportunity for sql developers.
MS is inventing a declarative language in the spirit of sql. Don't
people see an opportunity to jump in and at least influence the design
of the language? Or get involved in the LINQ translation/optimizations
to sql. Over time as MS integrates LINQ deeper into server (returning
entities) I can assure you the current translations will change:) Sql
was most certainly not an implementation of the relational model. So sql
folks shouldn't get hung up over this. The relational model would
require the same strong typed runtime as net but MS is certainly not
going there. But they are going to a place that sql skills can be used.
And now Sun is going to go along with them. It's actually a small world
if your eyes are open:)

Linq to Sql should be important to sql users

My plea for the sql server community to look closely at Linq.

My comment orginally appeared in the www.sqlblog.com post:

'PASS presentation: What would you like to hear about?' 2/23/08
by James Luetkehoelter
http://tinyurl.com/5lmj4m

Your eccentric. Such people tend to be really bright. So that's kewl.
Your passionate, so your highly motivated. That's very good. If you're
'over the top' that means your not afraid of taking risks! That's best
of all. If all this is true you can take on the burden of tackling LINQ
and the entity framework. Now I'm not talking about the 'how' of it. I'm
sure there will be plenty of people presenting point and click slides.
What I am talking about is the 'why' and 'what' of it. LINQ/EFM dwarfs
everything else in terms of importance in S2008. It's a reflection of a
movement in a different direction from what everyone is used to. It's
also a reflection of a change in attitude. When I look for sql folks
tackling the subject what do I find? Frighteningly little or nothing!
Now let me say if you're willing to make the case that sql folks can
simply ignore the subject, have at it:) If you even want to make the
case that it simply represents an isolated piece of technology go for
it. Some sql folk are waxing nostalgic about past 4GL/xbase languages
when discussing LINQ. So it may be that many sql folks think it's quite
the same idea and wonder why it's structure is different and possibly
inferior to what was. Well LINQ is different, it's comparing apples and
oranges, and it can't possibly be the same. But how can folks really get
a grasp of such a comparison if they don't really understand the 'what'
of LINQ. Trying to explain 'what' it is isn't easy especially to those
sql folks who are not familiar with net or any other contemporary
strongly typed object oriented runtime environment. I think MS is
finding out that it's a challenge. Even trickier is to explain the 'why'
of it. The motivation for it. Surely it didn't come out of thin air:)
And the 'why' involves taking risks. You may frighten or alienate
people:) The 'why' cuts to the heart of what most folks believe in. LINQ
is a departure. It will have significant implications for what
development is and will be like in the future. It will take a very
unique person to put all these ideas together so they'll be really
understood. Interestingly, all the answers about LINQ are right on the
MS website. All that's required is to dig around and back thru the years
to find them. With over eight years of development this shouldn't be
surprising:) But how many sql folks have bothered to do this homework?
From the looks of it very, very few. Presenting concepts is much harder
than presenting code. It takes a very special kind of guy to connect
the dots here:)
I'd be happy to share with you an extensive MS list of articles/notes
thru the years about the subject.

Linq to Sql vs. older 4GL attempts

Some thoughts on comparing Linq to older reporting languages.

My comment orginally appeared in the www.sqlblog.com post:

'LINQ - Lessons that could have been learned from languages like Progress' 2/25/08
by Greg Low 
http://tinyurl.com/56powf

>it's still very much an add-on rather than being fully baked into the language.

I remain somewhat perplexed by just what you mean by this. By definition
LINQ is 'burned in'. This is the achievement that MS has accomplished.
To imply that it's not fully baked in is like saying a woman is somewhat
pregnant. It is or it isn't, you are or you are not:) Either a
table/entity or projection can be represented as a variable or it
cannot. That all these things 'can' be a variable of some type
referenced in the language is the whole point of it no? Your use of
'add-on' seems to imply that LINQ is something external to the language
much like an external 4GL like Progress. I don't think this can be
further from the truth. In your interview with Pablo Castro he referred
to Progress as an example of a language having 'direct line of sight
from language to database'. Wasn't he struggling here to convey the idea
to sql folks of the fundamentally different natures of the two? To bring
the 4GL idea into contemporary languages one shouldn't expect they are
going to be similar. And you seem to be implying that LINQ is not as
'natural' as Progress. How could it be? If you first have to define a
query variable (table) you certainly can't expect to start your variable
with a Select/Find. You define and then perform an action right? In
t-sql 'Select @MyNumber' only makes sense if your first declare
@MyNumber. Is LINQ any different? And in the sense that 'declare
@MyNumber int' is burned into t-sql, isn't 'var customers = from c in
db.Customers' burned into C#?

I do think sql users should listen to your interview with Pablo. It is
proving difficult for MS folks to convey just what they have done to
those outside of net. What is worse, misunderstanding it or ignoring it?:)
Shouldn't sql folks fully understand why MS thinks it's such an
improvement over sql? So far I think either they don't or simply don't
care.

Linq to Sql: the 'what' and 'why' of it

The following comments concern 'what' is Linq (to Sql)/EF and the 'why'
of it, what motivated MS to develop it. What does MS mean by 'beyond 
'relational'? I also explore in what ways Linq, sql and the relational
model are related to each other. How these technologies are connected
to each other is quite a fascinating picture:)
My following 5 comments orginally appeared on www.sqlblog.com
in the post:

'Beyond Relational ???' 10/29/07
by Paul Nielsen 
http://tinyurl.com/686z6h

Comment #1

There is the association of relational to mathemetics (set theory). So
people criticize sql based on this point of view. Sql allows duplicates
rows, doesn't require a table to have a key, dependencies based on
ordinal position, is a poorly designed language etc. etc. These things
really are critical but the real problem is the prevailing idea that
relational is just a question of mathemetics. If it's just mathemetics
then allowing duplicate rows is perceived as 'close enough'. All the
objections from the set theory point view are not perceived as
compelling enough to really question the validity of sql. IMO the real
holes of sql have nothing to do with mathemetics. Rather it's the
foundation, the computer science if you will, that set theory and
relational algebra are embedded in. This point of view is unfortunately
not prevalent in IT. What the hell do I mean by the computer science of
the relational model? Well first, the set theory that relational depends
on is not some special kind of set theory. There is only one set theory.
In the same way there is only one computer science, there is no special
kind of computer science. But sql has invented such a special computer
science and this is the biggest flaw. What am I talking about?:)
Consider this, here is a table variable:

DECLARE @MyTableVar table(
   EmpID int NOT NULL primary key,
   OldVacationHours int,
   NewVacationHours int,
   ModifiedDate datetime);

Here is a server table:

create MyTable
      EmpID int NOT NULL primary,
      OldVacationHours int,
      NewVacationHours int,
      ModifiedDate datetime);
 
Here's the key question. If @MyTableVar really is a variable then what
is MyTable? In other words, @MyTableVar is to variable as MyTable is to
?????. If MyTable is persisted in the database what is it persisted as?
What computer science term describes it? Well whatever the hell it is (a
constant?) it certainly isn't a variable. And if it isn't a variable
then end of ballgame, end of relational model. And what of @MyTableVar?
Bol says 'A table variable behaves like a local variable.' and at the
same time says 'Assignment operation between table variables is not
supported.'. When is a door not a door?..when it's ajar:) Who the hell
ever heard of a variable that doesn't support assignment? Who ever heard
of a variable that doesn't support comparison? No one. Whatever
@MyTableVar really is it sure as hell ain't a variable. In a relational
db I should be able to assign the table @MyTableVar, all its rows, to
MyTable: 
      
MyTable=@MyTableVar

And I should be able to compare them.

if MyTable=@MyTableVar
 then print 'All rows in MyTable are in @MyTableVar and all rows in   
  @MyTableVar are in MyTable'
   else print 'Nope they're not equal'      
   
A relational db demands a table be a variable just like an integer
variable. Sql simply does not support basic computer science for tables.
Whatever a table is in sql it doesn't have a 'type' because computer
science is computer science and a variable must be typed. The only way
sql can recognize a table is by its name, not its type. This is why sql
doesn't support relational division and why dynamic sql must be used so
much. A table as a variable is a completely different animal than a
table in sql. This is why the expressive power of a relational db is
orders of magnitude greater than an sql db. Sql views and constraints
are redefined relationally. The 'types' in Dates work:   
Databases, Types and the Relational Model, The Third Manifesto' 2007
is about the central importance of variables of a particular type (a
table as one of many types) in a relational db. What a table as a
variable means and its significance. It is really a basic computer
science book. Ripping out the mathematics of relational theory (at least
trying to copy it), ie. the syntax to join, union tables, without the
computer science of relational has done all the damage. MS can't change
sql server because they are caught in an crazy computer science. The
difference in computer science between sql and net is the impedance
mismatch they're trying address. But I'm afraid they still don't get the
idea of a table as a variable. This is different than a table as a
class. The anonymous type in Linq is distinctly different
than a table type. So MS is doing the same thing IBM did forty years ago
with the sql System-R. The damage is the difference between a pickup
game in a playground and organized sports. You can draw up plays in the
dirt but they don't quite work the same as those run in a stadium. We're
still doing application development in the playground. Sometimes it
works, sometimes it doesn't but we're not basing it on the science of
any model. Sql is not a model of anything, it's an invention all its own. 
Close enough is only for horsehoes:) Maybe my blog will make more sense now:)

Comment #2

Wherever you see the word 'relational' just substitute 'fog'. As in fog of war:) 
> But when you have guys like Don Chamberlin (co-inventor of SQL and  
> co-designer of XQuery) on your staff, I guess you can afford to 
> boast your XML prowess.
He is revered in the sql world and reviled in the relational one. He was
a lead designer of System-R, the prototype of all sql database systems.
Those guys created a query language based on Codds description of basic
relational operators like projection, union and join. But they did NOT
implement the relational model Codd described. They just ripped out
these constructs without regard for their meaningfulness in the entire
relational model. So what you have today is nothing like the relational
model as it was envisioned. (IT successfully marginalizes the huge
difference and those that point it out:) And now comes 'beyond
relational'. What does this phrase really mean to MS? They are more than
willing to tell us. Aside from Jim Grays article/presentation, everyone
should read the articles on this site, the 'Comega language':
http://research.microsoft.com/Comega/
Especially this article:
'Unifying Tables, Objects and Documents'
http://tinyurl.com/yq7c4f

Here you'll find history repeating itself. MS, just like IBM did with
System-R, has extracted relational operators out of the relational model
and put them in an imperative object environment without any regard to
relational theory. The great irony is that the extensions that MS added
to net to realize projections of columns and xml within net is the
foundation for a true relational model! But the compiler generated
anonymous type of Linq while a variable is a different beast than the
explicit variable that is a table type in the relational model. It's the
relational variable that supports assignment and comparison as opposed
to the Linq variable that's no where near as smart:) But each supports a
'variable' which is a major step up from sql. Had MS any idea of the
friggin true relational model they would make a different kind of
history. Talk about dumbing down. Talk about of only academic interest.
Talk about relational fog (I should add that Alphora (Dataphor)
recognized the ability of the object imperative environment to support
the D relational language and implemented it. And it works:) 
Here is what Anders Hejlsberg, MS VS guru, and now the head of database
technology has to say about the disconnect:
InfoWorld
Interview of Microsoft Distinguished Engineer Anders Hejlsberg
'Microsoft's Hejlsberg touts .Net, C-Omega technologies'
June 10, 2005
http://www.infoworld.com/article/05/06/10/HNhejlsberg_1.html

"So what we're looking at is really trying to much more deeply integrate
the capabilities of query languages and data into the C# programming
language. And I don't specifically mean SQL, and I emphatically don't
mean just take SQL and slap it into C# and have SQL in there. But rather
try to understand what is it expressively that you can do in SQL and add
those same capabilities to C#."

Anders Hejlsberg is microsofts version of Don Chamberlin at IBM. So what
they have done is replace one flawed implementation of sql with another.
And this is how they achieve efficiency in application development. Now
that is unfriggin believable:) Well there's no free lunches. And I await
to be enlightened on just how this environment will replace the concept
of the logical relational model in solving business problems. I would
say the real meaning of beyond relational is sideways.

Comment #3

Consider the MS whitepaper:
'Microsoft SQL Server 2008 and Microsoft Data Platform Development'
http://www.microsoft.com/sql/techinfo/whitepapers/sql_2008_dp.mspx

Does anyone find it the least bit odd that an sql server technical article
is all about VS, LINQ and the entity framework? At the expense of the logical
relational model and the sql language.

What MS means by 'beyond relational' is 'forget relational already':)
Looking at sql server as if it was somehow an embodiment of relational
theory is every bit a form of dumbing down as some silly utterance by
some poor nitwit at MS. There never was and never will be any 'intent'
by MS to offer a 'relational' database. Sql servers only intent now is
to be responsive to its biggest customer, visual studio. And that team
is as knowledgeable in relational databases as the server team. Not. Why
does the community still view sql server thru an imaginary lense? Did
you ever hear of somewhat pregnant? If you open the dumbing down door be
prepared to greet all those who come thru:)

Comment #4

There is no longer a great divide, a debate, an impedance mismatch. MS 
has issued their own Emancipation Proclamation. And as a result they no
longer support the relation model as it is know to developers today.  
'A Call to Arms'
by Jim Gray, Microsoft
Mark Compton, Consultant
April 2005
http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=293

This paper is an invitation to embrace a new model. It's just as much
'A Farewell to Arms', an emancipation from the relational model which 
they are leaving behind. What does sql server look like in this new model?

'Interview with David Campbell'
General Manager of Strategy, Infrastructure and Architecture of Microsoft SQL Server.
May 14, 2007
http://tinyurl.com/6maseb
Campbell says:
"I believe the next major advance in Information Technology will come 
from addressing the gap between people and information."

That gap is the relational logical model itself.

Campbell continues:
"The focus will move to the data itself rather than on the machinery
used to manipulate it. We'll be less concerned with the plumbing and
more concerned with data quality, data protection, and information
production."
"Most of the data services provided by SQL Server will be driven from
a common data model. Whether you're creating a report, building an
information cube, or integrating data from another system, you will be
able to start from a common model of the key data entities such as
'customer', 'order', or 'prospect'."
"Finally, fewer and fewer people will miss, (or remember), the 'open
databases' sp_configure option..."

The class replaces the table as the basic unit of work.  VS replaces
QA/SSMS as the interface for application development. There is no
concept of relational anything in this object world. Sql constructs are
independent of application development. The language of the relational
model is replaced with the language of entities. There is no concept of
a dba.
MS is no longer in the database wars as we know it. They are trading 3rd
place in that world for 1st place in another. And they now have the
freedom to talk about this new world. It just sounds silly to those who
have not left the old one.
Ironically some were hoping for a new sub-language to further AD. Perhaps
the lesson here is to be careful of what you wish for. I too was hoping
they'd enter a new world but not the one they have chosen.

Comment #5

> should we be concerned staying in the DB world long with the fear 
> that we become obsolete one day?
Although I'm not an expert I can understand where you're coming from. It
would be nice to get a clear and concise answer to where MS is going and
what you should do about it. But there is no Oracle when it comes to MS.
There is no one position paper, no one person that clearly lays out
their five year plan and what it means to you. The experts here have
enormous importance and influence in the db community. But they also
have an enormous investment. How far can they be reasonably expected to
go without putting themselves in an awkward position should they take a
position that is not currently in line with company thinking? In the end
it's a question of connecting the dots. You get a dot here a dot there.
You have to do your homework. Study what they say and write and study
what they offer. Sql server pros shouldn't neglect what's going on in VS
and it's impact. If you study the company and the various technologies
enough you should be able to draw your own picture. Think of it as the
MS X-files:)

Sunday, November 23, 2008

Listing Missing Table Item

The following example is based on the post:

microsoft.public.sqlserver.programming
Friday, November 21, 2008 11:56 AM
"T-SQL Comparing a String of Items to a Table of Items; Listing Missing Table Items"
http://tinyurl.com/5dvc6o

Here's what the future of application development will hopefully be like
using a relational system. To program relationally you not only have to
think in "terms of sets" (the declarative thing:) but you have to think
"in terms of type" (the relation thing). Sql essentially doesn't concern
itself with variables and types. They are essentially viewed as foreign
thingies found in net. But in a relational system they are every bit as 
significant as understanding how to write an outer join. The following
example illustrates the relational approach using the Dataphor system
(which can play nicely with Sql Server ☺)

create session table Products {
  product_key:Integer,
  part_no: String,
  options: list(String), //options is defined as a list type, not a string.
  price: Money,
  key{product_key}
  };

All tables, like numbers, are variables with an explicit 'table' type, the
column names and their datatypes. And like integers they can be 'assigned'
values. Tables are composed of 'rows'. For each row options is input as a
'list' of type string ({'<string>','<string>','<string>'..}).

Products:=
table {
      row{11 product_key, 'ASP-20J' part_no, {'Wiz Bang', 'Hopla Enhancer'} options, $10.00 price},
      row{12, 'ASP-20R', {'Widget Holder','Wiz Bang',  'Hopla Enhancer'}, $12.00}
      }; 

create session table Options {
  option_key: Integer,
  product_key: Integer, 
  option: String,
  key{option_key},
  reference OptiontoProduct {product_key} references Products {product_key}
  };

Options:=
table {
       row{5 option_key, 11 product_key, 'Wiz Bang' option},
       row{6, 11, 'Hopla Enhancer'},
       row{7, 12, 'Wiz Bang'},
       row{8, 12, 'Hopla Enhancer'}
      };
       
Here are easy ways to find the missing product from the Options table.

Using row in table. First a table of numbers (Numbers) with a single
column N (from 0-10K) is used to create a row for each element in
the options list bringing along the other Products columns. The row
is constructed from this table to see if it's not in options. 

select 
   (
    Products 
       times  //Like an sql cross join. 
         Numbers
           where N<options.Count()
              //create a row for each element in the list.
             {product_key,options[N] option} 
                                           
    )     
       where not (
                  row{product_key product_key,option option}
                  in
                 (Options {product_key,option})
                 ) ;
                
Using relational division. Because tables (table expressions) are
variables one table can be tested to see if it's contained in another.

select 
   (
    Products times Numbers
         where N<options.Count()
          {product_key,options[N] option}
    )     
       where not (  
               table{row{product_key product_key,option option}}
                <=
               (Options {product_key,option})
               ) ;                 
               
Using a left join. Test whether the table on the right has a matching row.

select 
   (
    Products times Numbers
         where N<options.Count()
          {product_key,options[N] option}
    )     
      left join Options
                include rowexists  //A special Boolean column for whether there is a match.
                  where not rowexists 
                    {product_key,option} ;
      
Instead of inputting a list directly, a delimited string can be converted to
a list when input.

Products:=
table{
      row{11 product_key, 'ASP-20J' part_no, ('Wiz Bang, Hopla Enhancer '.Split()) options, $10.00 price},
      row{12, 'ASP-20R', (' Widget Holder ,Wiz Bang,  Hopla Enhancer'.Split()), $12.00}
      }; 
Options:=
table{
      row{5 option_key, 11 product_key, 'Wiz Bang' option},
      row{6, 11, 'Hopla Enhancer'},
      row{7, 12, 'Wiz Bang'},
      row{8, 12, 'Hopla Enhancer'}
     };

The queries are the same as above except for trimming each element of the list.

select 
   (
    Products times Numbers
         where N<options.Count()
          {product_key,options[N].Trim() option}
    )     
     where not (
                row{product_key product_key,option option}
                in
               (Options {product_key,option})
                ) ;
                
select 
   (
    Products times Numbers
         where N<options.Count()
          {product_key,options[N].Trim() option}
    )     
     where not (  
               table{row{product_key product_key,option option}}
                <=
               (Options {product_key,option})
               ) ;                 
               
select 
   (
    Products times Numbers
         where N<options.Count()
          {product_key,options[N].Trim() option}
    )     
      left join Options
                include rowexists 
                  where not rowexists 
                    {product_key,option};
                    
All queries produce a result of:

product_key option        
----------- ------------- 
12          Widget Holder 

Types eliminate violations of normal forms much like education eliminates ignorance ☺

Dataphor is a RAD tool, R(elational) A(ccelerator) D(evelopment).
Visit dataphor at:
www.dataphor.org

Friday, November 14, 2008

S(ecure) Q(uery) L(anguage)?

Concerning the thread:
microsoft.public.sqlserver.programming
Nov 10, 2008
"cascading deletes"
http://tinyurl.com/6nwjmd

For some reason I couldn't get my reply to post thru OE (it got thru
via google though). Perhaps there's an MS filter for metaphors ☺
In any event any mature adult should be able to handle it. So here's
my reply with a touch of creative writing ☺

'Most women believe men think with their tool. And it's just as true in
IT. Users model business problems in terms of the abilities of their
db. The idea that modeling exists independent of ones db is a myth.
It's not a question of seepage but of flooding. Modeling business
problems in terms of the available sql server constructs is messy
precisely because their immature and superficial to the task. The
result is you turn away from the db and towards the do-it-myself
model. You roll around in your own layer because you can't get layered
by the db. It's ridiculous to write a join procedurally but when it
comes to modeling it's perfectly acceptable to roll your own. Because
the model equivalent of the join is so lacking and messy. The genie
isn't going back in the sql server bottle. It's simply to far gone. 
That's why I advocate Dataphor. There the genie is in the join as well
as the modeling. Use Dataphor and put your tool back where your head
and shoulders are. You can still use sql server. But you aren't going
to get tooled by it :)'

www.dataphor.org
www.beyondsql.blogspot.com

Geoff Schaller wrote:
> Andre.

> I vote with Hugo here. We manage everything from code, not from TSQL in
> SSMS or via some other mechanism so we generally have to code everything
> (and that is not as difficult or as expansive as it sounds). Whilst
> cascading referential integrity is "nice" from a simplicity point of
> view, we've found that the act of deleting something (say an invoice) is
> almost never a simple thing. There is reversal of stock levels,
> rebalancing totals and if others are running reports when you thought
> you wanted to do the delete, it gets messy.

> The other thing is that we quite often have to delete the child entries
> individually or prevent the parent from being deleted because a child or
> two cannot be. Writing all that logic into a trigger and enforcing the
> rollback is quite complex. I find code an easier way to manage and
> maintain this.

To add insult to injury my reply to a post on SQLServerCentral was hacked (edited).

SQLServerCentral 
11/10/2008
'Arrays in Stroed Prcoedure'
http://tinyurl.com/5th5n4

My reply, as shown there under the name rog pike, was edited to read:  

'An array is a 'type', something the archaic computer science of sql knows
nothing about. You have to move to a 'real' relational system to find a
list/array type. You'll find such adult computer science in Dataphor.'

My orginal reply was a follows:

'Arrays are in sql server in the same sense as having sex by yourself which may account for
the shortsightedness of so many sql mavens. An array is a 'type', something the archaic computer
science of sql knows nothing about. You have to move to a 'real' relational system to find
a list/array type. You'll find such adult computer science in Dataphor.'

Is the site for mature adults or for the whole family? ☺ Just how much protection
does sql and its users need? Is this a security or, better yet, an insecurity problem? ☺

Finally, I'll repeat here what I posted in the above thread:

'Apparently someone complained/reported something I wrote as being objectionable. They
got their wish as it was magically extracted from the text. What was yanked, besides
my chain, was a metaphor, albeit a vivid one, to drive a salient point home. Now I
write for adults, I don't do child-speak very well. Nor do I have a predilection 
to only write drone-on-speak. So, if I can, I won't hesitate to use an adult metaphor
to amplify a point in an industry that is usually tone deaf. God forbid IT encourage
ability in something other than code or pixels. So if you are an adult, with a surname
other than anonymous, please explain just what you found R or X rated. Mature adults
usually confront conflicts thru the front door not the back one.'

Monday, September 08, 2008

Sorting a delimited string numerically

Sorting a delimited string by its numerical string parts.

This is a possible solution to the problem presented in the article:   

'T-SQL Sorting Challenge'
By: Itzik Ben-Gan 
http://www.sqlmag.com/Article/ArticleID/100156/100156.html

"You are given a table called t1 with a character string column called val. 
Each string in the val column holds a dot separated list of integers. Your
task is to write a T-SQL solution that sorts the strings based on the integer
values constituting the string segments. Note that the number of integers in
each string may vary, and is only limited by the column type VARCHAR(500).
Extra points if your solution will also support negative integers."

So the problem is how to construct a sort expression that represents the
positive and negative integers of the ids.

This solution uses Dataphor with the data being stored in Sql Server 2005.

Sample data

The data is populated thru dataphor and persisted in the sql server northwind
db. The table t1 in the article is table IZ here. The article uses an sql
identity for the primary key id, here it is explicitly declared.

create table IZ
{
 id:Integer,
 val:String tags {Storage.Length='75'},
 key{id}
};
//Positive value strings.
insert row{1 id,'100' val} into IZ;
insert row{2 id,'7.4.250' val} into IZ;
insert row{3 id,'22.40.5.60.4.100.300.478.19710212' val} into IZ;
insert row{4 id,'22.40.5.60.4.99.300.478.19710212' val} into IZ;
insert row{5 id,'22.40.5.60.4.99.300.478.9999999' val} into IZ;
insert row{6 id,'10.30.40.50.20.30.40' val} into IZ;
insert row{7 id,'7.4.250' val} into IZ;
//Add negative values.
insert row{8 id,'-1' val} into IZ;
insert row{9 id,'-2' val} into IZ;
insert row{10 id,'-11' val} into IZ;
insert row{11 id,'-22' val} into IZ;
insert row{12 id,'-123' val} into IZ;
insert row{13 id,'-321' val} into IZ;
insert row{14 id,'22.40.5.60.4.-100.300.478.19710212' val} into IZ;
insert row{15 id,'22.40.5.60.4.-99.300.478.19710212' val} into IZ;

Go directly to dataphor solution
Go directly to a solution using dataphor and sql server t-sql
Go directly to a solution using the Rac utility on sql server 2000
Go directly to a solution using the Rac utility on sql server 2005
(
 Rac is a system of stored procedures and functions for sql server designed
 to simplify solving various data manipulation problems including dynamic
 crosstabs, complex running sums and ranking, string manipulations etc.
)

Stepping thru the logic of the solution

Use the dataphor Split operator to split the val string for each id into
individual strings starting at Index 1 and going to the number of parts
delimited by the period ('.'). Note that the string is converted to an
integer. So we're dealing with numbers and not strings.

select
 (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
    times                //times is equivalent to an sql CROSS JOIN.
    (Numbers where N<10) //A table with a single column N, an integer from 0 to 800.
      where N<StrListCnt
       {id,N+1 Index,ToInteger(StrList[N]) StrNum}
         order by {id,Index} ;      
         
id Index StrNum   
-- ----- -------- 
1  1     100  <- id 1 has only a single value.
2  1     7    <- id 2 has 3 values.
2  2     4        
2  3     250      
3  1     22   <- id 3 has 9 values.
3  2     40       
3  3     5        
3  4     60       
3  5     4        
3  6     100      
3  7     300      
3  8     478      
3  9     19710212     
4  1     22  
.  .     .

Now lets look at the same data but within each Index and within each Index
ordered by the string as a number (StrNum). Remember all ordering is ascending.

select
 (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
    times
    (Numbers where N<10)
      where N<StrListCnt
       {id,N+1 Index,ToInteger(StrList[N]) StrNum}
         order by {Index,StrNum};
         
id Index StrNum   
-- ----- -------- 
13 1     -321  <- id 13 has lowest overall number for 1st string part (Index 1).
12 1     -123     
11 1     -22      
10 1     -11      
9  1     -2       
8  1     -1       
2  1     7        
7  1     7        
6  1     10       
3  1     22    <- ids 3,4,5,14,15 have the same value (22) for Index 1.   
4  1     22      
5  1     22       
14 1     22       
15 1     22       
1  1     100   <- id 1 has highest overall number for 1st string part (Index 1).  
.  .     .
3  8     478   <- Index 8 has the same string value for the 5 ids that have an 8th string part.
4  8     478      
5  8     478      
14 8     478      
15 8     478      
5  9     9999999  
3  9     19710212 
4  9     19710212 
14 9     19710212 
15 9     19710212 

The strings as integers are nicely sorted within each Index over the ids. 
How can we represent the same ordering within each Index independent of
the positive and negative numbers (and strings that indicate positive and
negative numbers)? What about with a ranking. So lets rank the numbers
within each Index. The combination of the dataphor ToTable, ToList and
cursor operators will generate a rank (a column named sequence) that 
follows the specified cursor ordering. We order the cursor by Index,StrNum
to get an ascending rank within each Index based on the StrNum values. 
The rank is column RowNum.

select
ToTable(
        ToList(
               cursor(
                       (
                        (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
                          times
                           (Numbers where N<10)
                             where N<StrListCnt
                              {id,N+1 Index,ToInteger(StrList[N]) StrNum}
                        )   
                                order by {Index,StrNum}))) 
                                 {Index,id,StrNum,sequence+1 RowNum}
                                   order by {Index,StrNum};

Index id StrNum   RowNum 
----- -- -------- ------ 
1     13 -321     1  <- id 13 has lowest value (-321) so id 1 gets lowest rank (1) within Index 1.    
1     12 -123     2      
1     11 -22      3      
1     10 -11      4      
1     9  -2       5      
1     8  -1       6      
1     2  7        7  <- ids 2 and 7 get a different rank for the duplicate values of 7 :(    
1     7  7        8      
1     6  10       9      
1     3  22       10 <- all 5 ids get a different rank for the duplicate values of 22 :(  
1     4  22       11     
1     5  22       12     
1     14 22       13     
1     15 22       14     
1     1  100      15 <- id 1 has highest value (100) so id 1 gets highest rank (15) within Index 1.
.     .  .        .
8     3  478      56     
8     4  478      57     
8     5  478      58     
8     14 478      59     
8     15 478      60     
9     5  9999999  61     
9     3  19710212 62 <- all 4 ids get a different rank for the duplicate values of 19710212 :(    
9     4  19710212 63     
9     14 19710212 64     
9     15 19710212 65     

(
 Note that this rank is equilvant to the sql ranking function ROW_NUMER().
 The rank could be obtained in sql using:
 ROW_NUMBER()OVER(ORDER BY [Index],StrNum) AS RowNum
 But the rank we want is really based on the sql RANK() function which
 accounts for duplicate/ties (of StrNum) by giving them the same rank. 
 Therefore it's necessary in dataphor to use a join to append the correct
 ranks to the table. In sql the join isn't necessary, RANK() can be used
 directly on the table:ie.
 RANK()OVER(ORDER BY [Index],StrNum) AS Rank
 (See t-sql solution)
 For more on dataphor and sql ranking see:
 'The Sql ranking OVERture' 
 http://beyondsql.blogspot.com/2008/04/sql-ranking-overture.html
)

If you haven't guessed it by now ☺ the idea is to create a string for each
id based on the ranks which will be used to order the ids. But we have a 
problem because for duplicate values of a number we're getting different
ranks. We want the 'same' rank for duplicate values since the same integer
cannot be used to distinguish among the ids. 

We can remedy the different ranks for duplicate values by simply choosing
the minimum rank (RowNum) for the value and assigning this rank to all ids.
Also note that the ranks continue to ascend over the Indexs. This is ok
because any numbers representing the ranks are ok if they correctly maintain
the ordering of integer strings values within the Index.

select
ToTable(
        ToList(
               cursor(
                       (
                        (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
                          times
                           (Numbers where N<10)
                             where N<StrListCnt
                              {id,N+1 Index,ToInteger(StrList[N]) StrNum}
                        )   
                                order by {Index,StrNum}))) 
                                 {Index,id,StrNum,sequence+1 RowNum}
                                   group by {Index,StrNum} add{Min(RowNum) Rank} 
                                     order by {Index,StrNum};

Index StrNum   Rank 
----- -------- ---- 
1     -321     1    
1     -123     2    
1     -22      3    
1     -11      4    
1     -2       5    
1     -1       6    
1     7        7  <- a rank of 7 can be assigned to the two ids (2,7) with a value of 7 for Index 1.  
1     10       9    
1     22       10 <- a rank of 10 can be assigned to all ids with a value of 22 for Index 1. 
1     100      15   
.     .        .
8     478      56   
9     9999999  61   
9     19710212 62 <- a rank of 62 can be assigned to all ids with a value of 19710212 for Index 9.

What we now have is a table of unique Index/StrNum combinations with a
unique rank for each combination. It's only necessary to join this table
to the table of split strings (IZ) for all ids by Index and StrNum to properly
assign the correct(ed) ranks. (As mentioned above this is the same rank that
would be obtained using the sql RANK() function and ordering by Index,StrNum.
And note that using the sql RANK() would eliminate the need to do a join in 
dataphor. Imagine dataphor with native sql like ranking operations ☺ ) 

Because the objective is to create a string to sort the ids we can't just
use the numeric rank, we have to modify it for string ordering. Given two
ranks of 7 and 11 if they are strings, '7' and '11', an ascending sort
would have '11' come '7':

'11'
'7'

This is the very problem the article is addressing! So we have to modify the
the strings to have '7' come '11' ☺. We can modify the '7' by left padding it
with '0'. So when we sort ascending we'll have the correct representation of
of the true numeric order of the values:

'07'
'11'

How much to pad a rank, how many '0's to insert, is the string length of 
the maximum rank generated. Because the ranks in dataphor keep ascending
regardless of the cursor ordering, the maximum rank (ignoring duplicates)
is the count of rows in the table. You could even make an educated guess
based on the amount of data and use that ☺. 

Left padding the string rank (RankStr) based on a maximum length of 2 we now
have all the data to finally construct a sorting column for the ids.

var UnpackedStrings:= //This variable holds all the split data and will be used in the select query. 
ToTable(ToList(cursor(
                       (
                        (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
                          times
                            (Numbers where N<10)
                               where N<StrListCnt
                             {id,N+1 Index,ToInteger(StrList[N]) StrNum}
                        )   
                           order by {Index,StrNum}))) 
                          {Index,id,StrNum,sequence+1 RowNum};
var LengthofMaxRank:=2;
select
      (UnpackedStrings {id,Index,StrNum})
         join //Join the unique ranks to all split data. This is a natural join (on Index/StrNum).
              //Create a left padded string (RankStr) from the numeric rank.
          (UnpackedStrings group by {Index,StrNum} add{Min(RowNum) Rank})
            {id,Index,StrNum,Rank,PadLeft(ToString(Rank),LengthofMaxRank,'0') RankStr}
              order by {Index,Rank};
            
id Index StrNum   Rank RankStr 
-- ----- -------- ---- ------- 
13 1     -321     1    01     <- ranks 1-9 are left padded in the rank string (RankStr).    
12 1     -123     2    02      
11 1     -22      3    03      
10 1     -11      4    04      
9  1     -2       5    05      
8  1     -1       6    06      
2  1     7        7    07      
7  1     7        7    07      
6  1     10       9    09      
3  1     22       10   10      
4  1     22       10   10      
5  1     22       10   10      
14 1     22       10   10      
15 1     22       10   10      
1  1     100      15   15      
.  .     .        .    .
3  8     478      56   56      
4  8     478      56   56      
5  8     478      56   56      
14 8     478      56   56      
15 8     478      56   56      
5  9     9999999  61   61      
3  9     19710212 62   62      
4  9     19710212 62   62      
14 9     19710212 62   62      
15 9     19710212 62   62    

The id sort column can now be formed by concatenating, using the Concat 
operator, the RankStr within each id in the order of the rank (either Rank,
Index or RankStr). This is easy to see by ordering the above data (table) 
by id,Rank. The ascending order of Index, Rank and RankStr all reflect 
where an id lies in value (RankStr) relative to the other ids. The sort
expression will be column SortStr.

id Index StrNum   Rank RankStr 
-- ----- -------- ---- ------- 
1  1     100      15   15      
2  1     7        7    07      
2  2     4        16   16      
2  3     250      30   30      
3  1     22       10   10      
3  2     40       19   19      
3  3     5        24   24      
3  4     60       33   33      
3  5     4        38   38      
3  6     100      49   49      
3  7     300      51   51      
3  8     478      56   56      
3  9     19710212 62   62    

The complete dataphor solution

var UnpackedStrings:= //Variable that holds all split data and ranks the split strings
                      //numerically within each Index ordered by the numeric string value.
ToTable(ToList(cursor(
                       (
                        (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
                          times
                            (Numbers where N<10)
                               where N<StrListCnt
                             {id,N+1 Index,ToInteger(StrList[N]) StrNum}
                        )   
                           order by {Index,StrNum}))) 
                          {Index,id,StrNum,sequence+1 RowNum};
var LengthofMaxRank:=
  Length(Count(UnpackedStrings).ToString()); //A string length used to left pad the rank strings for
                                             //a correct string sort (will be 2 here). 
select
    IZ
     join   //Natural join of input table IZ to sort column expression by id.
         (
            //Join unique ranks to all split strings.
          (
           (UnpackedStrings {id,Index,StrNum})
             join
                 // Adjust ranks to be unique for each Index/StrNum, duplicate values should get same rank.
              (UnpackedStrings group by {Index,StrNum} add{Min(RowNum) Rank})
                 //Left pad rank string with '0' (RankStr) for sorting string correctly. Add empty
                 //string ('') to be used as a delimiter in concatenating the ranks.
               {id,Index,StrNum,Rank,PadLeft(ToString(Rank),LengthofMaxRank,'0') RankStr,'' Del}
          ) adorn {key{id,Rank}} //Sql uses physical hints, in dataphor you use logical ones.
                //Form the sorting expression SortStr to sort ids by concatenating the string ranks
                //in the order of any of the ranking columns or Index.
              group by {id} add{Concat(RankStr,Del order by {id,Rank}) SortStr}
         ) 
          order by {SortStr}; //The object of the exercise, sort the ids by SortStr to get the correct
                              //numerical order of val.

id val                                SortStr            
-- ---------------------------------- ------------------ 
13 -321                               01                 
12 -123                               02                 
11 -22                                03                 
10 -11                                04                 
9  -2                                 05                 
8  -1                                 06                 
2  7.4.250                            071630             
7  7.4.250                            071630             
6  10.30.40.50.20.30.40               09182932434650     
14 22.40.5.60.4.-100.300.478.19710212 101924333844515662 
15 22.40.5.60.4.-99.300.478.19710212  101924333845515662 
5  22.40.5.60.4.99.300.478.9999999    101924333847515661 
4  22.40.5.60.4.99.300.478.19710212   101924333847515662 
3  22.40.5.60.4.100.300.478.19710212  101924333849515662 
1  100                                15              

Solution using the sql server RANK() function with a pass thru query

The solution can be made more compact using the sql RANK() function since
dataphor doesn't have a direct equivalent ranking operation.

Since sql server can only access persisted tables (and views) and not
dataphor expressions (we can't just stick in any dataphor table expression
in a pass thru query) we'll create a persisted table to hold all the 
split data.

create table IZSqlRanks  //The table will be created in sql server.
{
 id:Integer,
 Index:Integer,
 StrNum:Integer,
 key{id,Index}
}; 

Do the splitting in dataphor (because it knows the difference between a
string and a list ☺ ) and then 'assign' the resulting table to the
persisted table IZSqlRanks. Relational databases support this kind of
assignment for all variables including tables. And all tables in dataphor
are variables. (To my sql friends this makes all the difference in the world ☺ )

IZSqlRanks:=
 (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt})
    times
    (Numbers where N<10)
      where N<StrListCnt
       {id,N+1 Index,ToInteger(StrList[N]) StrNum};
//Use a t-sql passthru query to take advantage of the sql RANK() function. The
//resulting table will be treated like any other table (expression) in dataphor.
//Left pad the string rank (RankStr) for sorting purposes. We're using a total
//string length of 2 here so single digit ranks will be padded with a leading '0'. 
select 
 IZ
  join
     (
       SQLQuery('SELECT id,[Index],StrNum,RANK()OVER(ORDER BY [Index],StrNum) AS Rank
                 FROM IZSqlRanks') 
     //Use the sql result as if it was a native dataphor expression.
        {id,Index,StrNum,Rank,PadLeft(ToString(Rank),2,'0') RankStr,'' Del} 
          adorn {key{id,Rank}} //Metadata (a key) pertaining to the table expression.
                               //This key will be efficiently used by the Concat operation.
     //Concatenate the rank strings for each id to be used as the sort order for ids.
          group by {id} add{Concat(RankStr,Del order by {id,Rank}) SortStr}
      )       
        order by {SortStr};     
        
id val                                SortStr            
-- ---------------------------------- ------------------ 
13 -321                               01                 
12 -123                               02                 
11 -22                                03                 
10 -11                                04                 
9  -2                                 05                 
8  -1                                 06                 
2  7.4.250                            071630             
7  7.4.250                            071630             
6  10.30.40.50.20.30.40               09182932434650     
14 22.40.5.60.4.-100.300.478.19710212 101924333844515662 
15 22.40.5.60.4.-99.300.478.19710212  101924333845515662 
5  22.40.5.60.4.99.300.478.9999999    101924333847515661 
4  22.40.5.60.4.99.300.478.19710212   101924333847515662 
3  22.40.5.60.4.100.300.478.19710212  101924333849515662 
1  100                                15                        

Solving the problem on sql server 2000 with the Rac utility

The Rac solution follows the same logic as the dataphor and sql methods.
The 1st Rac execute creates the Index column, from 1 to N, for every string
part (delimited integer) an id has. The 2nd Rac execute creates a rank over
the ids based on the integers (string parts) within each Index. The rank
Rac generates is equivalent to the sql DENSE_RANK() function. This rank,
like RANK(), gives duplicate values the same rank but, unlike RANK(), does
not take the duplicate ranks into account when generating the next rank.
RANK() skips ahead based on ties while DENSE_RANK() consecutively numbers
the ranks. Both types of ranks give the same correct sort result for the ids.
The 3rd Rac execute concatenates the left padded rank strings for each id
and returns the ids sorted by them, correctly :) Note that Rac is called
recursively twice.

Exec Rac 
@split='[position]',    -- Rac splits each val string from left to right based on a period ('.').
@rows='id & [position]',-- Rac keeps the position each new string part starts in.
@pvtcol='val',          -- the target column of the split operation.
@from='IZ',    
-- use a counter to generate the Index column (from 1-N) that indicates the individual string part in a val.  
@separator='.',@rowcounters='id{[Index]}',@counterdatatype='int',
@defaults1='y',@rowbreak='n',@racheck='y',@shell='n',
@select='
        SELECT 1*id AS id,[Index],CAST([1] AS INT) AS StrNum
               INTO #J1      
                 FROM rac
 Exec Rac
 @transform=~_dummy_~,
 @rows=~[Index] & StrNum & id~,
 @pvtcol=~Report Mode~,
 @from=~#J1~,
 @defaults1=~y~,@rowbreak=~n~,@racheck=~y~,@shell=~n~,
  /* use a Rac counter to rank the integer string parts within each Index (column Rank).*/
 @rowindicators=~StrNum{Rank}_overtable_~,@counterdatatype=~int~,
        @select=
          ~if object_id(~~##J2~~) is not null 
           drop table dbo.t1  
           SELECT 1*id AS id,1*StrNum AS StrNum,1*[Index] AS [Index],1*Rank AS Rank,
                  /* left pad single digit string ranks for proper character sorting.*/
                  REPLICATE(~~0~~,2-DATALENGTH(CAST(Rank AS VARCHAR(2)))) + CAST(Rank AS VARCHAR(2)) AS RankStr
                  INTO ##J2
                   FROM rac~
             /* concatenate the string ranks (RankStr) within each id into column SortStr.*/     
      Exec Rac
      @transform=~Max(RankStr) as RankStr~,
      @rows=~id~,
      @pvtcol=~Rank~,
      @from=~##J2~,
      @defaults1=~y~,@racheck=~y~,@shell=~n~,@cutpvt=~y~,
      @concatenate=~RankStr~,@separator=~~,@stringname=~SortStr~,
       /* return the ids sorted by the concatenated string ranks.*/
      @select=~SELECT IZ.id,val,SortStr  
               FROM IZ JOIN rac ON IZ.id=rac.id 
               ORDER BY SortStr
               DROP TABLE ##J2~'

id          val                                  SortStr
----------- ------------------------------------ ------------------
13          -321                                 01
12          -123                                 02
11          -22                                  03
10          -11                                  04
9           -2                                   05
8           -1                                   06
7           7.4.250                              071116
2           7.4.250                              071116
6           10.30.40.50.20.30.40                 08121517202326
14          22.40.5.60.4.-100.300.478.19710212   091314181921272830
15          22.40.5.60.4.-99.300.478.19710212    091314181922272830
5           22.40.5.60.4.99.300.478.9999999      091314181924272829
4           22.40.5.60.4.99.300.478.19710212     091314181924272830
3           22.40.5.60.4.100.300.478.19710212    091314181925272830
1           100                                  10

More on Rac @
www.rac4sql.net

Solving the problem on sql server 2005 with the Rac utility

The Rac solution in S2k5 is similar to the one on S2K (go there for more info).
But it's simpler since the split string parts (integers) can be ranked
directly using a dense rank function not available in S2k. This eliminates
a recursive call to Rac so here it's executed twice instead of three times (in S2k).

Exec Rac 
@split='[position]',   
@rows='id & [position]',
@pvtcol='val',
@from='IZ',
@separator='.',@rowcounters='id{[Index]}',@counterdatatype='int',
@defaults1='y',@rowbreak='n',@racheck='y',@shell='n',
 -- the DENSE_RANK() function, available in S2k5, is used to rank the integer string parts
 -- and is left padded.
@select='
         SELECT id,[Index],StrNum,Rank,
                REPLICATE(~0~,2-DATALENGTH(CAST(Rank AS VARCHAR(2)))) + CAST(Rank AS VARCHAR(2)) AS RankStr
                INTO #J1     
         FROM 
              (SELECT 1*id AS id,[Index],CAST([1] AS INT) AS StrNum,
                      DENSE_RANK()OVER(ORDER BY [Index],CAST([1] AS INT)) AS Rank
               FROM rac ) AS A
     Exec Rac
     @transform=~Max(RankStr) as RankStr~,
     @rows=~id~,
     @pvtcol=~Rank~,
     @from=~#J1~,
     @defaults1=~y~,@racheck=~y~,@shell=~n~,@cutpvt=~y~,
     @concatenate=~RankStr~,@separator=~~,@stringname=~SortStr~,
     @select=~SELECT IZ.id,val,SortStr  
               FROM IZ JOIN rac ON IZ.id=rac.id 
                ORDER BY SortStr~'

id          val                                  SortStr
----------- ------------------------------------ ------------------
13          -321                                 01
12          -123                                 02
11          -22                                  03
10          -11                                  04
9           -2                                   05
8           -1                                   06
7           7.4.250                              071116
2           7.4.250                              071116
6           10.30.40.50.20.30.40                 08121517202326
14          22.40.5.60.4.-100.300.478.19710212   091314181921272830
15          22.40.5.60.4.-99.300.478.19710212    091314181922272830
5           22.40.5.60.4.99.300.478.9999999      091314181924272829
4           22.40.5.60.4.99.300.478.19710212     091314181924272830
3           22.40.5.60.4.100.300.478.19710212    091314181925272830
1           100                                  10

More on Rac @
www.rac4sql.net

Thursday, January 22, 2009

Multiple cascade paths to the same table

Saturday, December 20, 2008

Extracting numbers from a string

Saturday, December 06, 2008

Sql server dynamic crosstabs by Jeff Moden

Tuesday, November 25, 2008

Linq to Sql as an opportunity to sql users

Linq to Sql should be important to sql users

Linq to Sql vs. older 4GL attempts

Linq to Sql: the 'what' and 'why' of it

Sunday, November 23, 2008

Listing Missing Table Item

Friday, November 14, 2008

S(ecure) Q(uery) L(anguage)?

Monday, September 08, 2008

Sorting a delimited string numerically

Beyond Sql

Blog Archive

Links

About Me