LiT: = lists into tables:) I'm going to talk about a bunch of kewl stuff as it relates to LiT. So get relaxed and come take a ride on the D(4) train:) As I showed here the D4 language makes it very easy to split strings. Now I want to show you some more kewl stuff about working with strings, lists and tables using the Split operator. Lets start with this string delimited by the pipe ('') character: 'FFBB HH HH D AA' Notice that blanks can appear at the beginning, end and within a string part. The Spit operator takes a string and splits it into a 'list' of strings based on one or more delimiters. Conceptually this means that the above string is transformed into: 'FF',' BB',' HH HH ',' D ',' AA' in other words a comma delimited list of strings. And with a list you can access and operate on the individual strings parts. The concept of a list is conceptually close to what concept in a database? Yep, you guessed it - a table! Each string in the list fits nicely with the idea of a row in a table. Make sense? And once the list is in a table, well we can do just about anything with it as you'll see:) So lets create a table from our example string using Split. The basic construction of Split using the method style is: 'somestring'.Split({delimiters}) With our example string, which uses the pipe as a delimiter, we have: 'FFBB HH HH D AA'.Split({''}) The Split operator uses a thingie called an 'indexer' to access the individual parts of the list, like this: 'some string'.Split({delimiters})[index] where index is a digit for the desired element of the list. So if we want the 1st string/element in the list we use: 'FFBB HH HH D AA'.Split({''})[0] We're using 0 because the indexer is 0 based, ie. 1st element 0, 2nd element 1 etc. Real easy. Ok we've got enough info now to select the 1st string: select 'FFBB HH HH D AA'.Split({''})[0] FF Ok for the moment lets leave the idea of getting individual parts of the list with the indexer and think of getting 'the whole list' with one swipe of the magical D4 wand. And that wand is the D4 ToTable. Since Split has done the heavy lifting of creating the list, all we have to do to get LiT is this: select ToTable( 'FFBB HH HH D AA'.Split({''})) Value ------- D HH HH AA BB FF and we are Lit (list in table) :) When you transform a list to a table the column by default is 'Value'. You can name the column anything you want by including a column name in Split. I'll call the column 'STRING': select ToTable( 'FF GG HH HH D AA'.Split({''}),'STRING') STRING ------- D HH HH AA BB FF Ok we got the list in a table but you see that the strings have the same blanks as they had originally. Since we are LiT we can trim the leading and trailing blanks of each row (string) using : {Trim(STRING) STRING} This means trim each row and retain the same column name. select ToTable ( 'FFBB HH HH D AA'.Split({''}),'STRING' ) {Trim(STRING) STRING} STRING ------ AA BB D FF HH HH Ok at this point your head is probably ready to explode since you can see that while I've made all this fuss about LiT, the rows of the table does not match what the order of strings should be! In other words, if Split creates a list shouldn't the table reflect the strings like: STRING ------ FF BB HH HH D AA If this has occurred you I'm doing my job:) As I said above we can use what's called an indexer to address the elements of the list in order. And we'll do that soon. But, it's meaningful to get LiT even though the elements are not in the proper sequence. Just understand that without supplying specific information to Split about the individual elements (the indexer thingie []) we can only get a table whose rows reflect the ordering of the string elements themselves. Here we add a system generated auto-incrementing column INDEX using the GetNextGenerator operator that reflects the ascending order of STRING. (The INDEX column is similar to the identity function used in MS Sql Server). You can see that INDEX reflects the original (untrimmed) strings. select (ToTable ( 'FFBB HH HH D AA'.Split({''}),'STRING' )) add {GetNextGenerator('INDEX') INDEX,Trim(STRING) STRING1} STRING INDEX STRING1 ------- ----- ------- D 1 D HH HH 2 HH HH AA 3 AA BB 4 BB FF 5 FF Here we get the trimmed strings and then add the auto-incrementing column. select ( (ToTable ( 'FFBB HH HH D AA'.Split({''}),'STRING' )) {Trim(STRING) STRING} ) add { GetNextGenerator('INDEX') INDEX } STRING INDEX ------ ----- AA 1 BB 2 D 3 FF 4 HH HH 5 The INDEX now reflects the ascending order of the trimmed strings. No matter how you cut it, splitting a string in toto into a table is not going to give you the order as they appear in the original string. As I said even though this is true it still has value and significance. Now I'm going to show you why. Here are 2 tables created on MS Sql Server 2005. We're going to access them with Dataphor using D4. create table str1 (ID1 int primary key,STRING1 varchar(50)) insert str1 values(1,'AABB CC') insert str1 values(2,'FFGGHH MM J KK') insert str1 values(3,'SSTTUUVV') insert str1 values(4,'AAYYYYPQQRDD') insert str1 values(5,'D10B20A30') create table str2 (ID2 int primary key,STRING2 varchar(50)) insert str2 values(1,'AA BB CC') insert str2 values(2,'FF GG HH MM J KK') insert str2 values(3,'SSTTUUXX') insert str2 values(4,'AAPPPPWRQRDD') insert str2 values(5,'D10B20') Using D4: select str1 ID1 STRING1 --- ------------------ 1 AABB CC 2 FFGGHH MM J KK 3 SSTTUUVV 4 AAYYYYPQQRDD 5 D10B20A30 select str2 ID2 STRING2 --- ---------------------- 1 AA BB CC 2 FF GG HH MM J KK 3 SSTTUUXX 4 AAPPPPWRQRDD 5 D10B20 Lets suppose we want a query that compares by ID STRING1 and STRING2 as lists. We want the ID's where the lists are the same. In other words, we want the ID's where all elements in STRING1 and STRING2 are the same (equal). So we're looking for ID's 1 and 2 where the trimmed elements are the same. You'll recall that we LiT a string using: select ToTable ( 'FFBB HH HH D AA'.Split({''}),'STRING' ) {Trim(STRING) STRING} All we have to do to LiT our strings is use STRING1 and STRING2 in Split. And once we have LiT both strings we can wave a magic wand. We can directly test to see if the 2 tables are equal! We simply join by ID's and then test the tables for equality in where. select str1 join str2 by ID1=ID2 where ((ToTable(STRING1.Split({''}),'STRING')) {Trim(STRING) STRING}) = ((ToTable(STRING2.Split({''}),'STRING')) {Trim(STRING) STRING}) ID1 STRING1 ID2 STRING2 --- ------------------ --- ---------------------- 1 AABB CC 1 AA BB CC 2 FFGGHH MM J KK 2 FF GG HH MM J KK Comparing tables is just a logical extension of comparing integers or strings or just about anything else. Tables are equal if they have the same column(s) (STRING) and the elements (rows) are equal. Is that a magic wand or what:) Now it's fundamentally important to understand how this works. The key of each table is of course STRING. Each key also implies an order. So each row, ordered by the key (STRING), in each table is compared to the corresponding row in the other table just like you'd compare integers. If every row is equal in the 2 tables the comparison is true, otherwise it's false. D4 is thus able to logically address the rows in a table via the key. This is fundamentally different than sql which can't do this. The ability to use keys for logical addressing has big, big implications. It's one of the fundamental things that differentiates a relational language from sql! I'll get into it in future posts. Now back to our example. Note that for ID's 5 that have 3 and 2 rows respectively, the table comparison was no problem. I figure that if your still with me your interested and pretty sharp. So I'll give you the benefit of the doubt and assume a pretty important question has occured to you. What if the original string has duplicate elements? For example: 'AAATTCCAAA' Lets see what happens when we try to LiT this: select ToTable ( 'AAATTCCAAA'.Split({''}),'STRING' ) WoW! We get this from the Dataphor compiler: 'Internal Index Error: "Duplicate key violation." ' Yeah we're SOL but does it makes sense? Well all tables in D4 have to have a key. No if, ands or buts. The key is of course STRING and when the string is split into a table there are duplicates of 'AAA' and there can't be any duplicate rows. So there goes our table:( I'm sure you can guess the remedy:) But remember that D4 is a strongly typed language. A list is not the same as a table. We've been trying to convert a list to a table. Too different types, that is the point of having to convert. And sometimes it just won't work the way we want it to. Before we remedy this, things could get more dicey if we have this: select ToTable ( 'AAA TTCC AAA '.Split({''}),'STRING' ) STRING ------ AAA AAA CC TT And then trimmed the strings: select ToTable ( 'AAA TTCC AAA '.Split({''}),'STRING') {Trim(STRING) STRING} STRING ------ AAA CC TT An element 'AAA' was eliminated (no duplicates allowed) so we end up with 3 elements instead of 4. And if we compared this with 'AAATTCC' select TableDee add {'False Positive' Compare} where (ToTable ( 'AAA TTCC AAA '.Split({''}),'STRING') {Trim(STRING) STRING} ) = (ToTable ( 'AAATTCC'.Split({''}),'STRING') {Trim(STRING) STRING} ) Compare -------------- False Positive we'd get a false positive. The tables are equal but the lists are not! Phew:) (Don't worry about the 'TableDee' thingie, I'll get to that in due time:) But please come to the idea that table comparisons are a magic wand and once you start using them you'll wonder how you ever got along without them. But alas, you can loose some of that magic when working with lists so be careful:) (More on table comparisons here). So finally lets get to splitting the list by the individual elements, ie. using an indexer ([]). On the MS Server newsgroups it's common to see the phrase 'use a set oriented approach'. This means don't use procedural/imperative code. So no cursors, or loops:) Now I've already gone over splitting strings here. But I want to show you in a little more depth what is going on. Given a string: 'AAA TTCC AAA ' A basic query to get the elements of the list is: select ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX') where 'AAA TTCC AAA '.Split({''}).Count()>INDEX {INDEX+1 INDEX,'AAA TTCC AAA '.Split({''})[INDEX].Trim() STRING} INDEX STRING ----- ------ 1 AAA 2 TT 3 CC 4 AAA Note that with the inclusion of INDEX we are guaranteed not to eliminate duplicate STRINGs. The INDEX column is always unique and takes part in the key of the table. Now the statement: ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX') creates a table with column INDEX for the list of integers. 'AAA TTCC AAA '.Split({''}).Count()>INDEX filters on INDEX where the count of the elements in the list is greater than the INDEX value. There are 4 elements in the list so INDEXs 0, 1, 2, 3 satisfy this criteria. Remember lists are 0 based. So we end up with a table with 4 rows, ie: INDEX ----- 0 1 2 3 Then the statement: {INDEX+1 INDEX,'AAA TTCC AAA '.Split({''})[INDEX].Trim() STRING} is applied. This statement modifies the above table by redefing INDEX and adding the column STRING. The STRING column is each list element, in order, identified by the indexer ([INDEX]). Each element of the string is also trimmed. We added 1 to INDEX because.. well I wanted too because I like to start from 1 not 0:) So there you have it. Well...not quite. There's stuff going on here under the covers that you should know about and understand. How is it that Dataphor interprets: {0, 1, 2, 3, 4, 5, 6, 7, 8 } as a list of integers and not something else? To answer this let me show you formally how this table of integers is created in a block of D4 code. begin var Indexes : list(Integer); Indexes := { 0, 1, 2, 3, 4 , 5, 6, 7, 8 }; create session table Indexes from ToTable(Indexes,'INDEX'); end; A variable, Indexes, is assigned the type list(Integer), ie. an infinite list where the elements are of type integer. Once variable Indexes is defined as a type of list, the actual elements of the list (the integers) are assigned to it. Now the list can be converted into a table. The create session table Indexes from creates what is called a session table. It's like a permanent table but is only available to the Dataphor session that creates it. It's like an MS Sql Server ## temp table. select Indexes INDEX ----- 0 1 2 . 8 The answer to the question of how Dataphor knows that: ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX') is a list of integers is because the system has inferred it! That's why it isn't necessary to formally declare any variable with a type of list(Integer). In other words, a comma separated bunch of integers is logically a list of integers. The D4 compiler as an inference engine is a super big deal!:) The concept of logically inferring metadata (ie.keys, constraints) does not exist in an sql database. I'll get into this big deal in future posts. Ok, back to our example:) So now you know some more about the integer list in our query: select ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX') where 'AAA TTCC AAA '.Split({''}).Count()>INDEX {INDEX+1 INDEX,'AAA TTCC AAA '.Split({''})[INDEX].Trim() STRING} And what about the argument to Split? Well here is a formal code block for splitting a string (using the above Indexes table). begin var LList : list(String); var ExString : String; ExString := 'AAA TTCC AAA '; LList := ExString.Split( {''} ); create session table ExList from Indexes where LList.Count()>INDEX {INDEX+1 INDEX, LList[INDEX].Trim() STRING} ; end; select ExList INDEX STRING ----- ------ 1 AAA 2 TT 3 CC 4 AAA We don't have to explicitly declare that Split should result in a list of String elements. All this information is inferred by the compiler (thank you very much D4:) I'm rambled enough:) Now we're going to finally, finally split strings and generate a little report. For good measure we're going to insert another row in our string tables, str1 and str2, via D4: insert table { row { 6 ID1, 'AAA TTCC AAA ' STRING1 } } into str1 insert table { row { 6 ID2, 'TTCCAAA' STRING2 } } into str2 So now have: select str1 ID1 STRING1 --- ------------------ 1 AABB CC 2 FFGGHH MM J KK 3 SSTTUUVV 4 AAYYYYPQQRDD 5 D10B20A30 6 AAA TTCC AAA select str2 ID2 STRING2 --- ---------------------- 1 AA BB CC 2 FF GG HH MM J KK 3 SSTTUUXX 4 AAPPPPWRQRDD 5 D10B20 6 TTCCAAA We want to compare strings by ID and only where the lists have the same number/count of elements. And we only want the strings where there are differences for the same element (INDEX). So each row will include the ID, the corresponding different STRINGs, the INDEX for each different element and the individual different string parts. select ( ( ( (str1 join str2 by ID1=ID2) where STRING1.Split({''}).Count()=STRING2.Split({''}).Count() ) times // times is like the sql cross join. Indexes where STRING1.Split({''}).Count()>INDEX ) where // <> is the notation for not equal. STRING1.Split({''})[INDEX].Trim()<>STRING2.Split({''})[INDEX].Trim() ) { ID1 ID,STRING1,STRING2,INDEX+1 INDEX, STRING1.Split({''})[INDEX].Trim() DIF1, STRING2.Split({''})[INDEX].Trim() DIF2 } ID STRING1 STRING2 INDEX DIF1 DIF2 -- ---------------- ---------------- ----- ---- ---- 3 SSTTUUVV SSTTUUXX 4 VV XX 4 AAYYYYPQQRDD AAPPPPWRQRDD 2 YYYY PPPP 4 AAYYYYPQQRDD AAPPPPWRQRDD 3 PQ WR Coming from sql you should think of each set of matching parenthesis (indented) as a derived table. Unlike sql, a derived table in D4 does not start with a 'select' statement. In D4 it's just the opposite of sql. A table is derived and then you specify the columns you want. Which way makes more sense? In sql you pay for the meal before you even order it:) Also note in D4 you don't have to give the derived able a name (alias). In D4 things are much cleaner and more logical. I hope you found this interesting. And there's plenty more to come:)
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Friday, September 01, 2006
Dataphor - Get LiT!
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment