Greetings, To paraphrase 'V' from the movie 'V for Vendetta' if as an application developer or even as an sql programmer you feel that everything is fine and dandy in S2005 and even .net you need read no further. But if you feel there is something askew, something that does not make alot of sense in sql land, then I suggest you read on. 'Think in terms of sets' in sql is empty of any real meaning except don't use declarative statements. This phrase should be replace with 'think in terms of application development'. As a critic of sql I can only go so far. So from time to time I'm going to walk the walk instead of just talk the talk. I'm going to show you how to go about application development in a new way. A way that makes sense. I'm going to introduce you to Dataphor from: www.alphora.com In broad strokes Dataphor is an application development system based on the MS .net technologies. It is therefore a strongly typed system. It offers a table type of a very specific nature, a relational table type. To support this type there are a host of 'relational' operators. The table type and its corresponding operators are the advancement over and beyond current sql. All the types, be they table, strings, numbers, boolean can be accessed and manipulated within a Pascal like declarative language. And within the language all types can logically be treated as 'variables'. This development environment is referred to as 'D4'. The availability of a 'relational' system is a fairly recent thing. What transpired in the interim was the perpetuation of a grand myth. Developers were led to believe that sql was relational! Led by vendoraptors like IBM and Oracle and more recently MS and industry punitwits like Joe Celko, the illusion of sql working as if it were relational was and still is the industry norm. Finally with Dataphor developers can 'see' the differences for themselves. Of course there are many myths surrounding sql and I hope to get to most of them in time -:) So lets look at question raised in the post: Tuesday, May 22, 2007 8:02 AM microsoft.public.sqlserver.programming 'parsing string into sequence of numbers and letters?' http://tinyurl.com/yo28he This question elucidates many themes still prevalent today. 1. The Abbott & Costello syndrome. Where is the appropriate place for a data operation, the client or the server. Every failure for sql to support an operation has been portrayed as the inappropriateness for that operation to done on the server. A glaring example of this is list/string operations. A favorite tactic is too call an operation 'formatting' and argue that it belongs on the client. This none-sense has gone on forever and disappears in the strong typed environment of D4. The artifical dichotomy between client/server is a major hindrance to application development. It is now possible for developers to see this for themselves. 2. There is no logical manifestation of the concept of a 'key' in sql. Like so many concepts sql convinced users that a key was synonymous with an 'index'. Keys are of such fundamental importance that it is actually problematic to even call S2005 (or Oracle,DB2) a database! With D4 developers can see all the 'logical' advantages they gain with their use. All the none-sense about 'faking' arrays in sql goes out the window in D4. As you will see there is no reason to fake something and create more of a mess when the concept of an 'array' is seen as a component of a key. It is often written that sql is a 'data retrieval language'. It is more meaningful to say it is a 'fast' retrieval language. That is why there is so much emphasis on indexes. But performance comes at a cost such as integrity and manipulation. Up till now this cost has been covered up with all kinds of none-sense. 3. D4 offers consistency of operations across types. Such consistency bring clarity to application development no found in sql. What we want to do here is take a string and break it apart creating separate columns for letters and digits. The only twist is that there is an explicit ordering - letters for the 1st column, digits for the 2nd and letters for the 3rd column. If a string does not have letters-digits-letters we want to fill the remaining column(s) with a placeholder (we're using a blank here but a null (nil in D4) or anything else could be used). Table VMI will reside in the pubs database in S2005. RowID will be the clustered primary key. Additional indexes can be created directly in S2005 or thru D4. All logical operations concerning 'keys' will be handled by the Dataphor server. Essentially all data operations are handled by D4. This includes all data integrity and manipulation. And both are a qualitative advance over sql. S2005 is simply the data respository and D4 will take advantage of its (hopefully) fast data access paths. The D4 server 'chunks' data operations. What this means is D4 will determine when it is appropriate to hand over query requests (as well as intermediate steps) to S2005 and when to handle them itself. This whole process is basically transparent to the user. The physical implementation details needn't concern the user. create table VMI { RowID:Integer, InputStr:String static tags {Storage.Length = '12'}, key{RowID} }; insert table { row{1 RowID,'AA99BB' InputStr}, row{2,'WWW8888PPPPP'}, row{3,'77DD'}, row{4,'EE66'}, row{5,'G'}, row{6,'HHHH5J'}, row{7,'44'}, row{8,'22KK'} } into VMI; //The data is now in S2005 and available to D4. //select VMI; Operator Op1VMI() uses string operators to chunk together like characters within a RowID. As an operator it is available to be used anywhere. An operator can return any kind of supported type in Dataphor. That means it can return: nothing, a table, a row (multiple scalar values), a scalar, a cursor, a list. The result may be virtual or it may persist in a table. Whatever it can return, it can also use as arguments (parameters). Since 'relational' is synonymous with 'type', tables/rows can be arguments since we are dealing with structure based on type, ie for a table its type is the structure of the table the column names and their scalar types. This a fundamental difference from sql which equates a table with its file name. This is also why in Dataphor we are dealing with tables as relational variables. A table is just another type of variable, but a variable nonetheless just like an integer. Also note that there is no arbitrary breakup of operators. There are no 'procedures' as opposed to 'functions' as in sql server. There are no limitations as what can be done in an operator as there are in sql server. All the sql limitations are based on design flaws that developers have to work around. This is another example of the general pattern of 'inconsistency' in sql:). If your wondering why not define a view instead of an operator I can understand. Views in Dataphor are not the same relatively simpleton macros they are in sql. They are much more highly evolved in a relational system and serve many different functions not available in sql. There is no reason for a 'relational' view here, again all in good time -:). create operator Op1VMI(): table{RowID:Integer,StrChar:String,InputStr:String,Rank:Integer, Cnt:Integer,PadStr:String,sequence:Integer} begin result:=table of typeof(result){}; result:= ToTable ( ToList ( cursor ( ( //Each character in a string can be addressed by its indexed position. (VMI add {Length(InputStr) LenStr}) times numbers where num<=(LenStr-1) add {InputStr[num] StrChar} remove {LenStr} ) //Here we're concatenating repetitions of characters into a single string. //We could have simply used the Concat operator but there is a rich //number of string operators available in D4 (and you can add your own). group by {RowID,StrChar} add{Max(InputStr)InputStr,Min(num)Rank,Count()Cnt} add {PadRight(StrChar,Cnt,StrChar) PadStr} order by {RowID,Rank} //Using ToTable(ToList(cursor creates a table with a //sequence (rank) column in the order of the cursor declaration. //The sequence column is similar the sql-99 row_number() function. //In fact could have used a passthru query to sql server and used //row_number(). The returned table would then be available for //manipulation within Dataphor. All in good time -:). )//cursor )//ToList );//ToTable end; Invoking the operator makes its result (a table) available, just like a table. For examples: select Op1VMI() order by {sequence}; Table LVMI is similar in concept to a #tmp table in S2005. We're going to store both the delimited string (DelStr) and the delimited string as a 'list' (StrAsList). You need only a dose of common sense to understand the difference between the 'string' TYPE and the 'list' TYPE. In D4 a type is supported by a plethora of dedicated operators. This includes operations for strings and lists not available in sql. These operators also serve to guarantee the integrity of the type. The vacuum left by their absence in sql gives raise to a special type of programming, commonly referred to as a 'kludge'-:) (Wasn't LV1 the planet in Alien/Aliens?) create session table LVMI { RowID:Integer, InputStr:String, DelStr:String, //There is no real need to store the delimited //string. In fact it defeats the purpose of the //'list'. But we do it to illustrate the difference //between a delimited string and a list. StrAsList:list(String), key{RowID} } ; In sql there is always much adieu made of violating normal form by storing delimited strings. The whole argument disappears with a 'list' type. As I stated tables and table expressions are just variables. We can therefore define a batch using variables just as if we were crunching numbers. As you learn the 'relational' query language constructs in Dataphor one of the first new things you will learn is that 'select' in sql is a fundamental part of a query expression. And that is just non-sense. This is just how sql was designed 40 years ago. I was as ridiculous then as it is now. There is no reason for this and in the relational world it is eviscerated from expressions and used only where it makes sense, as part of a 'statement' to retrieve data. An operator that returns a table is treated exactly as a table is. It is just common sense! And there is no silly 'select' to muddle things up! *** Start of batch using variables to represent table expressions *** var B1:= //This code creates the column PRank which is like the sql row_number() //partitioned by RowID. Subtracting OffSet from sequence within each //RowID will produce a rank from 1 to N within each RowID. The join //is what is known as a 'natural' join. The D4 server simply joins //on the same column names in the two table expressions (RowID). Of //course D4 allows the join declared using 'by' columns (like 'on' in S2005). Op1VMI() join ( Op1VMI() group by {RowID} add{Min(sequence) OffSet} redefine { OffSet:= OffSet-1 } ) add{sequence-OffSet PRank} ; /* Again B1 is a variable that holds a table based on Op1VMI() to which we have added a new column PRank. */ //select B1 order by {sequence}; Uncomment to see B1 table. /* 'Add' is how constants/expressions are included in table expressions. D4 is a relational algebra and if you remember your high school class you know, atleast tacitly, how it works. You simply group terms together and 'add' new terms, glue them together and add still more terms. On the other hand, no one really knows how sql works. It's not an algebra nor a calculus. It's a 'that just the way it is' language. It is essentially faith based. Given enough time and permutations of constructs one gains faith that somehow a mess will produce a result -:) Forty years ago IBM produced this bastard child. Larry Ellison piled more stupidity on it and made a fortune. Not even the history of sql makes any sense. */ var B2:= ( B1 join ( B1 add { (((PRank mod 2) =1 ) and (IsLetter(PadStr)=true)) ColDef1} /* There are those that argue that the boolean type has no place in a relational language. Of course these same people are quick to point out the concept of 'relation division'. This is another form of non-sense. The group by Any is an 'iterated or' over rows of a table. It returns true if any ColDef1 for a row is true and false otherwise. It is being used here to change (shift) the PRank column depending on whether the string is digit/letter. */ group by {RowID} add{Any(ColDef1) TestCols} ) redefine{PRank:=PRank+ToInteger(not TestCols)} ) adorn {key{RowID,PRank}} /* Adorn is a way for the user to communicate with D4 on a logical level. Here we are telling D4 to create a specific key in addition to the ones it infers. In sql you have resultsets that are essentially like spread sheets. They have no keys as sql is concerned with performance and therefore indexes. In D4 there are only tables regardless of how they are derived. And tables to be tables must have key(s). The reason sql has a 'distinct' construct is to cover up this embarrasement. There is no 'distinct' in D4 and the system guarantees that each row in a table is unique by inferring key(s). The idea in sql circles that data and meta-data don't mix is non-sense. */ //This outer join will ensure all RowIDs have 3 PRanks. right join ( (VMI over{RowID}) //This results in only unique values of RowID. times //'times' is like sql cross join. //Valid expressions within a query include declaring a table. //Here we are filling in non-existent PRanks for each RowID //so each RowID has 3 strings to concatenate. table{row{1 PRank},row{2},row{3}} ) with {IgnoreUnsupported = 'true'} {RowID,PRank,InputStr,IfNil(PadStr,' ') PadStr}; //select B2 order by{RowID,PRank}; Uncomment to see B2 table. /* var B3 is where we finally form a delimited (comma) string (DelStr). The Concat operator is an aggregate that can be used within a group by. Concat has an order by that is used to form the string within RowID in a specific order. We then insert table B3 into table LVMI. Since a 'list' is a totally different animal (type) than a 'string' we have to form a list from DelStr before we can insert it (as a list) into LVMI (StrAsList). A literal list of type string is represented as: {'A','B','C'} To convert the delimited string DelStr to a list type we use the Split operator, ie. DelStr.Split() StrAsList This says create a list by splitting DelStr (comma is the default delimiter) and name the LIST 'StrAsList'. Simple yes?-:) Note that unlike sql there is no required ordering of names in an insert statement to match the table name ordering. In D4 only the column names of the inserted table need be found in the table their being inserted into. */ var B3:= ( B2 add{',' Del} adorn {key{RowID,PRank}} ) group by {RowID} add {Concat(PadStr,Del order by {RowID,PRank}) DelStr,Max(InputStr) InputStr} redefine{DelStr:=Replace(DelStr,' ','')}; //select B3; Uncomment to see table B3. (See below to see the table.) insert (B3 {InputStr,RowID,DelStr,DelStr.Split() StrAsList}) into LVMI; //*** End of batch *** Each element of a list is accessible by an index just as each element of a string is. Did someone say 'array'? -:) There is no need to simulate arrays in D4. Types such a tables,strings,lists and cursors imply an array structure and it is only common sense that there is a simple logical addressing method for each type. Lists and strings are addressed using a 0 based index. Below we return the three items of the list 'StrAsList' for each row. If you've made it this far you may have figured out that in D4 returned columns are specified and can be manipulated 'after' specifying the table name. Of course this is the opposite of sql. Which makes more sense? -:) select LVMI {RowID,DelStr,StrAsList[0] Col1,StrAsList[1] Col2,StrAsList[2] Col3}; RowID DelStr Col1 Col2 Col3 ----- -------------- ---- ---- ----- 1 AA,99,BB AA 99 BB 2 WWW,8888,PPPPP WWW 8888 PPPPP 3 ,77,DD 77 DD 4 EE,66, EE 66 5 G,, G 6 HHHH,5,J HHHH 5 J 7 ,44, 44 8 ,22,KK 22 KK Just as you can address the items of a list, you can directly address a row of table via a key. Using RowID as the key the first row and the elements of the list can be produced simply by: select LVMI[1] {RowID,StrAsList[0] Col1,StrAsList[1] Col2,StrAsList[2] Col3}; RowID Col1 Col2 Col3 ----- ---- ---- ---- 1 AA 99 BB And there you have it. Bye for now, Steve More intermediate results to see what's happening -:) Operator (table) Op1VMI(). select Op1VMI() order by {sequence}; RowID StrChar InputStr Rank Cnt PadStr sequence ----- ------- ------------ ---- --- ------ -------- 1 A AA99BB 0 2 AA 0 1 9 AA99BB 2 2 99 1 1 B AA99BB 4 2 BB 2 . 3 7 77DD 0 2 77 6 3 D 77DD 2 2 DD 7 Table B1. select B1 order by {sequence}; PRank (for partitioned rank) repeates from 1 to N within each RowID. RowID StrChar InputStr Rank Cnt PadStr sequence OffSet PRank ----- ------- ------------ ---- --- ------ -------- ------ ----- 1 A AA99BB 0 2 AA 0 -1 1 1 9 AA99BB 2 2 99 1 -1 2 1 B AA99BB 4 2 BB 2 -1 3 . 3 7 77DD 0 2 77 6 5 1 3 D 77DD 2 2 DD 7 5 2 4 E EE66 0 2 EE 8 7 1 4 6 EE66 2 2 66 9 7 2 5 G G 0 1 G 10 9 1 . 7 4 44 0 2 44 14 13 1 8 2 22KK 0 2 22 15 14 1 8 K 22KK 2 2 KK 16 14 2 Looking at immediate result using ColDef1 in B2. Only when ColDef1 evaluates to false for all rows within a RowID is it necessary to shift up RRank by 1 for each item. Of course this is only when the 1st characters in the string are digits (RowIDs 3,7 and 8). But we give a general solution. RowID StrChar InputStr Rank Cnt PadStr sequence OffSet PRank ColDef1 ----- ------- ------------ ---- --- ------ -------- ------ ----- ------- 1 A AA99BB 0 2 AA 0 -1 1 True 1 9 AA99BB 2 2 99 1 -1 2 False 1 B AA99BB 4 2 BB 2 -1 3 True . 3 7 77DD 0 2 77 6 5 1 False 3 D 77DD 2 2 DD 7 5 2 False 4 E EE66 0 2 EE 8 7 1 True 4 6 EE66 2 2 66 9 7 2 False 5 G G 0 1 G 10 9 1 True 6 H HHHH5J 0 4 HHHH 11 10 1 True 6 5 HHHH5J 4 1 5 12 10 2 False 6 J HHHH5J 5 1 J 13 10 3 True 7 4 44 0 2 44 14 13 1 False 8 2 22KK 0 2 22 15 14 1 False 8 K 22KK 2 2 KK 16 14 2 False Intermediate results for B2 looking at correction of PRank for all RowIDs where TestCols is false (meaning all ColDef1 are false within the RowID. The PRanks within RowIDs 3,7,8 are incremented by 1 (see above). RowID StrChar InputStr Rank Cnt PadStr sequence OffSet TestCols PRank ----- ------- ------------ ---- --- ------ -------- ------ -------- ----- 1 A AA99BB 0 2 AA 0 -1 True 1 1 9 AA99BB 2 2 99 1 -1 True 2 1 B AA99BB 4 2 BB 2 -1 True 3 . 3 7 77DD 0 2 77 6 5 False 2 3 D 77DD 2 2 DD 7 5 False 3 4 E EE66 0 2 EE 8 7 True 1 4 6 EE66 2 2 66 9 7 True 2 5 G G 0 1 G 10 9 True 1 6 H HHHH5J 0 4 HHHH 11 10 True 1 6 5 HHHH5J 4 1 5 12 10 True 2 6 J HHHH5J 5 1 J 13 10 True 3 7 4 44 0 2 44 14 13 False 2 8 2 22KK 0 2 22 15 14 False 2 8 K 22KK 2 2 KK 16 14 False 3 Table B2. Each RowID has 3 strings (PadStr) with the correct PRank. For missing PRanks a blank is assigned to PadStr. select B2 order by{RowID,PRank}; RowID PRank InputStr PadStr ----- ----- ------------ ------ 1 1 AA99BB AA 1 2 AA99BB 99 1 3 AA99BB BB . 3 1 <No Value> 3 2 77DD 77 3 3 77DD DD 4 1 EE66 EE 4 2 EE66 66 4 3 <No Value> 5 1 G G 5 2 <No Value> 5 3 <No Value> . 7 1 <No Value> 7 2 44 44 7 3 <No Value> 8 1 <No Value> 8 2 22KK 22 8 3 22KK KK Table B3. This is where PadStr from B2 is concatenated to form a delimited string. Each DelStr is guaranteed to have 3 elements. RowID InputStr DelStr ----- ------------ -------------- 1 AA99BB AA,99,BB 2 WWW8888PPPPP WWW,8888,PPPPP 3 77DD ,77,DD 4 EE66 EE,66, 5 G G,, 6 HHHH5J HHHH,5,J 7 44 ,44, 8 22KK ,22,KK
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Tuesday, May 29, 2007
Sql Mythology
Subscribe to:
Post Comments (Atom)
3 comments:
Considering the following DDL and your sample data:
CREATE TABLE VMI (
RowID int PRIMARY KEY,
InputStr varchar(12) NOT NULL
)
INSERT INTO VMI VALUES (1,'AA99BB')
INSERT INTO VMI VALUES (2,'WWW8888PPPPP')
INSERT INTO VMI VALUES (3,'77DD')
INSERT INTO VMI VALUES (4,'EE66')
INSERT INTO VMI VALUES (5,'G')
INSERT INTO VMI VALUES (6,'HHHH5J')
INSERT INTO VMI VALUES (7,'44')
INSERT INTO VMI VALUES (8,'22KK')
We need an auxiliary Numbers table:
SELECT IDENTITY(int) N INTO Numbers FROM syscolumns
ALTER TABLE Numbers ADD PRIMARY KEY (N)
And then we can provide the expected result in a single SQL query:
SELECT V.InputStr,
COALESCE(SUBSTRING(V.InputStr,1,A.N),'') AS C1,
COALESCE(SUBSTRING(V.InputStr,COALESCE(A.N,0)+1,
COALESCE(B.N,100)-COALESCE(A.N,0)),'') AS C2,
COALESCE(SUBSTRING(V.InputStr,B.N+1,100),'') AS C3
FROM VMI V LEFT JOIN (
SELECT InputStr, N FROM VMI, Numbers
WHERE SUBSTRING(InputStr,N,2) LIKE '[A-Z][0-9]'
) A ON V.InputStr=A.InputStr
LEFT JOIN (
SELECT InputStr, N FROM VMI, Numbers
WHERE SUBSTRING(InputStr,N,2) LIKE '[0-9][A-Z]'
) B ON V.InputStr=B.InputStr
InputStr C1 C2 C3
------------ ------------ ------------ ------------
AA99BB AA 99 BB
WWW8888PPPPP WWW 8888 PPPPP
77DD 77 DD
EE66 EE 66
G G
HHHH5J HHHH 5 J
44 44
22KK 22 KK
(8 row(s) affected)
Razvan
Two comments: 1) It looks like the proposed alternate solution in Oracle by razvan socol does not take into account the requirement that the first column always consist of letters, the second column digits, and the third column letters again. 2) It should be noted that if this is a correct alternate expression, then an analogous D4 expression could be provided.
Regards,
Bryn
Hello Bryn,
You are correct about the sql solution. RowID 5 incorrectly places 'G' into the 2nd (number) column instead of the first. But more importantly, I'm trying to show D4 in broad terms as opposed to 'slick' solutions. Of course there's always the possiblity that I'm being too broad -:)
best,
steve
Post a Comment