This example is based on the question asked in the thread:

microsoft.public.sqlserver.programming
Dec 18, 2008
Pulling a number out of an nvarchar field
http://tinyurl.com/3quyap

The OP was interested in pulling out the first occurrence of a number in a string. The string has both numbers and letters, so if the string is 'XY40A3' we want the number 40, the first of the two numbers in the string.

This is very easy to do in Dataphor. Just as a table of numbers comes in handy for solving many different problems, so does a list of letters. The idea here is to treat each letter in the string as a delimiter. We then split the string using the list of letters as the delimiters, so what results are the number(s) in the string. We can store the list (the letters of the alphabet) in a temporary table for easy access.

create session table AlphaTable
 {
  AlphaList : list(String),
  key { }
 };

AlphaTable :=
 table
  {
   row
    {
     {'A','B','C','D','E','F','G','H','I','J','K','L','M','N',
      'O','P','Q','R','S','T','U','V','W','X','Y','Z'} AlphaList
    }
  };

For example, if we split a string, transform it to a table and remove any blanks, we'll have just the numbers left. If we order by sequence (Index) it will show the numbers as they occur from left to right.

select ToTable('XY40A3'.Split(AlphaTable[].AlphaList), 'StrNum', 'Index')
        where StrNum > ' '
         order by {Index};

StrNum Index
------ -----
40     2
3      3

Not only is it easy to get the 1st number, we can get any occurrence just as easily. The 1st occurrence is just a special case of the general problem of getting the Nth occurrence in a string. By using ToTable(ToList(cursor ...)) with the cursor ordered by Index (which follows the order of the numbers from left to right in the string), we can create a consecutive rank from 1 to N over the table of numbers that allows direct access to the Nth number (if it exists).
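For readers without Dataphor handy, the letters-as-delimiters idea can be sketched in plain Python (an illustrative equivalent of the D4 query above, not part of the original thread; the function name numbers_in is my own):

```python
import re

def numbers_in(s: str) -> list[int]:
    """Return the numbers in s in left-to-right order, treating every
    letter (and any whitespace) as a delimiter, like the AlphaList split."""
    pieces = re.split(r"[A-Za-z\s]+", s)
    return [int(p) for p in pieces if p]

print(numbers_in("XY40A3"))  # [40, 3]
```

The empty pieces filtered out by `if p` play the same role as the `where StrNum > ' '` restriction in the D4 version.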
select ToTable(ToList(cursor(
        (ToTable('XY40A3RAD853'.Split(AlphaTable[].AlphaList), 'StrNum', 'Index')
          where StrNum > ' '
           {ToInteger(StrNum) Num, Index})
            order by {Index})))
             {Num, Index, sequence + 1 NthIndex};

Num Index NthIndex
--- ----- --------
3   3     2
40  2     1
853 6     3

Here is an operator for the Nth occurrence that takes lower-case letters into account and returns -1 if the Nth occurrence doesn't exist.

create session operator NthNum(AStr : String, N : Integer) : Integer
begin
 var T1 :=
  ToTable(ToList(cursor(
   (ToTable(Upper(AStr).Split(AlphaTable[].AlphaList), 'StrNum', 'Index')
     where StrNum > ' '
      {ToInteger(StrNum) Num, Index})
       order by {Index})))
        {Num, Index, sequence + 1 NthIndex};
 result := IfNil((T1 adorn {key{NthIndex}})[N].Num, -1);
end;

select NthNum('SF346fs47sGs759 ', 1); //returns 346
select NthNum('SF346fs47sGs759 ', 2); //returns 47
select NthNum('SF346fs47sGs759 ', 3); //returns 759
select NthNum('SF346fs47sGs759 ', 4); //returns -1

Here a table of strings is stored in an Sql Server database from which we can extract the 1st occurrence of a number.

create table FooStrings
 {
  keycol : Integer,
  datacol : String nil,
  key {keycol}
 };

FooStrings :=
 table
  {
   row {1 keycol, 'XYZ40AB' datacol},
   row {2, 'WX32A'},
   row {3, '27 blah'},
   row {4, 'A87BNC30'},
   row {5, 'XY40A3'},
   row {6, 'TWFD'},
   row {7, 'XYA53GH5JGV934'},
   row {8, '7'},
   row {9, nil}
  };

select FooStrings
        add {NthNum(IfNil(datacol, ' '), 1) MyNumber}
         with {IgnoreUnsupported = 'true'}
          order by {keycol};

keycol datacol        MyNumber
------ -------------- --------
1      XYZ40AB        40
2      WX32A          32
3      27 blah        27
4      A87BNC30       87
5      XY40A3         40
6      TWFD           -1
7      XYA53GH5JGV934 53
8      7              7
9      <No Value>     -1

Think how easily the operator could be modified to return, say, the minimum or maximum number instead. If you would like to see a coherent and easy to follow all t-sql solution that doesn't cobble string functions together into spaghetti code see: RAC - Are you Coordinated? But I think you'll agree the much preferred solution is in Dataphor ☺
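The NthNum contract maps naturally to other languages too; here is a minimal Python sketch of the same idea (1-based N, -1 when the Nth number doesn't exist; the name nth_num is my own, not from the original post):

```python
import re

def nth_num(s: str, n: int) -> int:
    """Return the Nth (1-based) number in the string, or -1 if it doesn't exist."""
    # Any run of non-digit characters acts as a delimiter (handles
    # upper case, lower case, and spaces alike).
    nums = [int(p) for p in re.split(r"[^0-9]+", s) if p]
    return nums[n - 1] if 1 <= n <= len(nums) else -1

print(nth_num("SF346fs47sGs759 ", 1))  # 346
print(nth_num("SF346fs47sGs759 ", 4))  # -1
```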
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
Saturday, December 20, 2008
Extracting numbers from a string
Saturday, December 06, 2008
Sql server dynamic crosstabs by Jeff Moden
This is the RAC version of Jeff Moden's fine article on dynamic crosstabs for sql server at:

Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
2008/12/03
www.sqlservercentral.com/articles/cross+tab/65048/

The data is generated by the method outlined in the article. The table was populated with 1 million rows.

Basic crosstab. Executed in 2 secs in QA for S2005.

Exec Rac
@transform='Sum(SomeMoney) as SumMony',
@rows='Someletters2',
@pvtcol='(left(datename(mm,DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)),3)+~ ~+
         cast(year(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)) as char(4))) as mthyr',
-- Sort pivot expression by an integer.
@pvtsort='month(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0))',
@from='##JBMTest',
@WHERE='SomeDate>=~Jan 1 2008 12:00AM~ AND SomeDate<~Jul 1 2008 12:00AM~',
@rowtotalsposition='end',@racheck='y',@shell='n'

Someletters2 Funct   Jan 2008   Feb 2008   Mar 2008   Apr 2008   May 2008   Jun 2008   Totals
------------ ------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
AA           SumMony 685.67     763.64     656.13     575.93     879.13     192.13     3752.63
AB           SumMony 927.06     928.98     280.20     632.43     560.99     785.50     4115.16
AC           SumMony 791.09     555.18     916.71     273.23     187.48     508.31     3232.00
AD           SumMony 250.04     341.58     426.53     645.56     670.13     422.86     2756.70
AE           SumMony 809.14     487.21     295.33     625.92     716.12     527.19     3460.91
.
.
ZY           SumMony 776.32     682.98     326.17     677.69     546.87     926.54     3936.57
ZZ           SumMony 532.75     500.92     277.93     636.40     607.05     553.43     3108.48
Totals       SumMony 433997.25  383211.70  411913.12  411878.29  425431.07  409809.47  2476240.90

Here some additional bells and whistles are thrown in:)

Executed in 26 secs in QA for S2005.

Exec Rac
-- The same transform repeated twice for different purposes.
@transform='Sum(SomeMoney) as SumMony & Sum(SomeMoney) as [% row]',
@rows='Someletters2',
@pvtcol='(left(datename(mm,DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)),3)+~ ~+
         cast(year(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0)) as char(4))) as mthyr',
@pvtsort='month(DATEADD(mm,DATEDIFF(mm,0,SomeDate),0))',
@from='##JBMTest',
@WHERE='SomeDate>=~Jan 1 2008 12:00AM~ AND SomeDate<~Jul 1 2008 12:00AM~',
@rowtotalsposition='end',@racheck='y',@pformat='_pvtcols_',@shell='n',@translabel='Summary',
-- Display the min and max sum for each Someletters2 along with the pivot (date) it occurred in.
-- The min and max are displayed in separate rows. Default is same row for all rowfunctions.
@rowfunctions='min(SumMony) & max(SumMony)',@rowfunctionslabel='Min/Max',@displayrowfunctions='m',
-- Running sum of pivot columns for each row from left to right. The pivot sum is followed by the
-- run in each cell.
@colruns='SumMony',
-- The percentage of the pivot sum/[row total] displayed in a separate row.
-- A different transform alias forces a separate row; we could display it in the
-- same row as the sum (and column runs).
@cpercents='[% row] %only'
Someletters2 Summary Min/Max                 Jan 2008            Feb 2008            Mar 2008             Apr 2008             May 2008             Jun 2008             Totals
------------ ------- ----------------------- ------------------- ------------------- -------------------- -------------------- -------------------- -------------------- ----------
AA           SumMony min(192.13,Jun 2008)    685.67/685.67       763.64/1449.31      656.13/2105.44       575.93/2681.37       879.13/3560.50       192.13/3752.63       3752.63
                     max(879.13,May 2008)
             % row                           18.3%               20.3%               17.5%                15.3%                23.4%                5.1%                 -
AB           SumMony min(280.20,Mar 2008)    927.06/927.06       928.98/1856.04      280.20/2136.24       632.43/2768.67       560.99/3329.66       785.50/4115.16       4115.16
                     max(928.98,Feb 2008)
             % row                           22.5%               22.6%               6.8%                 15.4%                13.6%                19.1%                -
AC           SumMony min(187.48,May 2008)    791.09/791.09       555.18/1346.27      916.71/2262.98       273.23/2536.21       187.48/2723.69       508.31/3232.00       3232.00
                     max(916.71,Mar 2008)
             % row                           24.5%               17.2%               28.4%                8.5%                 5.8%                 15.7%                -
AD           SumMony min(250.04,Jan 2008)    250.04/250.04       341.58/591.62       426.53/1018.15       645.56/1663.71       670.13/2333.84       422.86/2756.70       2756.70
                     max(670.13,May 2008)
             % row                           9.1%                12.4%               15.5%                23.4%                24.3%                15.3%                -
AE           SumMony min(295.33,Mar 2008)    809.14/809.14       487.21/1296.35      295.33/1591.68       625.92/2217.60       716.12/2933.72       527.19/3460.91       3460.91
                     max(809.14,Jan 2008)
             % row                           23.4%               14.1%               8.5%                 18.1%                20.7%                15.2%                -
AF           SumMony min(406.49,May 2008)    788.30/788.30       415.40/1203.70      605.56/1809.26       613.81/2423.07       406.49/2829.56       520.40/3349.96       3349.96
                     max(788.30,Jan 2008)
.
.
ZY           SumMony min(326.17,Mar 2008)    776.32/776.32       682.98/1459.30      326.17/1785.47       677.69/2463.16       546.87/3010.03       926.54/3936.57       3936.57
                     max(926.54,Jun 2008)
             % row                           19.7%               17.3%               8.3%                 17.2%                13.9%                23.5%                -
ZZ           SumMony min(277.93,Mar 2008)    532.75/532.75       500.92/1033.67      277.93/1311.60       636.40/1948.00       607.05/2555.05       553.43/3108.48       3108.48
                     max(636.40,Apr 2008)
             % row                           17.1%               16.1%               8.9%                 20.5%                19.5%                17.8%                -
Totals       SumMony min(383211.70,Feb 2008) 433997.25/433997.25 383211.70/817208.95 411913.12/1229122.07 411878.29/1641000.36 425431.07/2066431.43 409809.47/2476240.90 2476240.90
                     max(433997.25,Jan 2008)
             % row                           17.5%               15.5%               16.6%                16.6%                17.2%                16.5%                -
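Stripped of the dynamic SQL, the core of a crosstab like the one above is just a group-and-sum keyed by (row value, pivot value), emitted one output row per row value with a column per pivot value plus a row total. A rough Python sketch of that shape (my own miniature for illustration, not the RAC implementation):

```python
from collections import defaultdict

def crosstab(rows, row_key, pivot_key, measure):
    """Pivot a list of dicts: one output row per row_key value,
    one column per distinct pivot_key value, cells = sum(measure)."""
    cells = defaultdict(float)
    pivots = []
    for r in rows:
        p = r[pivot_key]
        if p not in pivots:          # preserve first-seen pivot order
            pivots.append(p)
        cells[(r[row_key], p)] += r[measure]
    out = []
    for k in sorted({r[row_key] for r in rows}):
        row = {row_key: k}
        for p in pivots:
            row[p] = cells.get((k, p), 0.0)
        row["Totals"] = sum(row[p] for p in pivots)
        out.append(row)
    return out

data = [
    {"letters": "AA", "mthyr": "Jan 2008", "money": 10.0},
    {"letters": "AA", "mthyr": "Feb 2008", "money": 5.0},
    {"letters": "AB", "mthyr": "Jan 2008", "money": 2.5},
]
print(crosstab(data, "letters", "mthyr", "money"))
```

The extras RAC layers on (running sums, per-row min/max, percentages of row totals) are further passes over the same cells.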
Tuesday, November 25, 2008
Linq to Sql as an opportunity to sql users
Instead of a dead end, Linq presents opportunities for sql experts. My comment originally appeared in the www.sqlblog.com post:

'Continuing Industry Consolidation and Other Trends.'
1/16/08 by Kevin Kline
http://tinyurl.com/5opxlz

I think the Sun acquisition of MySql was a checkmate move. It was Oracle that attempted to slow down MySql with their acquisition of the InnoDB transaction engine. Now they have been paid back in spades. Now it's Sun vs. MS in a new competition for the minds and hearts of developers. The broad view is LINQ/net and Sql Server vs. java and MySql. This is not about databases per se but a new war based on the object/entity model inspired by LINQ. I don't see IBM and Oracle in this war. They will have to be content to battle amongst themselves in the old database wars. (I'll come back to LINQ:)

As for 'Checking out the Competition', of course I too applaud Linchi. But honestly, we're only 'now' recognizing the advantages of eyes wide open?:) This attitude should be a given for a professional. Perhaps better late than never, perhaps an airing of cheerleading vs. reality checking:) For most ansi features that MS has pre-announced, all one has to do to study them is read about them in Oracle or DB2 documentation (ie. OVER). And as Linchi pointed out, it often goes the opposite direction for other types of features. This attitude contributes to the great divides that are common in the industry.

And now we come to the 'dead-end routes' like LINQ. I take the opposite view you do. There's a compelling case to be made (I've commented on this matter on this site) that if there is a deadweight, MS sees sql as it. LINQ is not just a piece of new technology; it's not just a local variable, it's a global one. LINQ is both an affirmation of the runtime environment for 'all' application development using an object model and a rejection of the sql interface.
MS can live with the physical file structure (the idea of relational data, rows and columns) but they don't want to live with the sql interface for application development. MS explains this global move in terms of solving the historic impedance mismatch between two completely different environments. And they have picked their winner and, at the same time, the loser. The rows and columns abstraction now ends at physical storage. The object abstraction and LINQ will take up everything else.

Sql server is now something quite different than it used to be. Almost all developmental changes in the server will be based on servicing objects, and quite probably at the expense of features associated with a furtherance of the relational model. Look at all the work on partitioned views in S2008. This lies at the heart of how LINQ will translate entity updates.

LINQ is still in its infancy. I would expect it to appear to many just like sql did when it was initially introduced in the early eighties. It will take time to get the matured version. What is truly ironic is that I see no real argument in the sql community that LINQ represents a great opportunity for sql developers. MS is inventing a declarative language in the spirit of sql. Don't people see an opportunity to jump in and at least influence the design of the language? Or get involved in the LINQ translation/optimizations to sql? Over time, as MS integrates LINQ deeper into the server (returning entities), I can assure you the current translations will change:)

Sql was most certainly not an implementation of the relational model, so sql folks shouldn't get hung up over this. The relational model would require the same strongly typed runtime as net, but MS is certainly not going there. They are, however, going to a place where sql skills can be used. And now Sun is going to go along with them. It's actually a small world if your eyes are open:)
Linq to Sql should be important to sql users
My plea for the sql server community to look closely at Linq. My comment originally appeared in the www.sqlblog.com post:

'PASS presentation: What would you like to hear about?'
2/23/08 by James Luetkehoelter
http://tinyurl.com/5lmj4m

You're eccentric. Such people tend to be really bright. So that's kewl. You're passionate, so you're highly motivated. That's very good. If you're 'over the top' that means you're not afraid of taking risks! That's best of all. If all this is true you can take on the burden of tackling LINQ and the entity framework.

Now I'm not talking about the 'how' of it. I'm sure there will be plenty of people presenting point and click slides. What I am talking about is the 'why' and 'what' of it. LINQ/EFM dwarfs everything else in terms of importance in S2008. It's a reflection of a movement in a different direction from what everyone is used to. It's also a reflection of a change in attitude. When I look for sql folks tackling the subject, what do I find? Frighteningly little or nothing!

Now let me say, if you're willing to make the case that sql folks can simply ignore the subject, have at it:) If you even want to make the case that it simply represents an isolated piece of technology, go for it. Some sql folk are waxing nostalgic about past 4GL/xBase languages when discussing LINQ. So it may be that many sql folks think it's quite the same idea and wonder why its structure is different and possibly inferior to what was. Well, LINQ is different, it's comparing apples and oranges, and it can't possibly be the same. But how can folks really get a grasp of such a comparison if they don't really understand the 'what' of LINQ? Trying to explain 'what' it is isn't easy, especially to those sql folks who are not familiar with net or any other contemporary strongly typed object oriented runtime environment. I think MS is finding out that it's a challenge. Even trickier is to explain the 'why' of it. The motivation for it.
Surely it didn't come out of thin air:) And the 'why' involves taking risks. You may frighten or alienate people:) The 'why' cuts to the heart of what most folks believe in. LINQ is a departure. It will have significant implications for what development is and will be like in the future. It will take a very unique person to put all these ideas together so they'll be really understood.

Interestingly, all the answers about LINQ are right on the MS website. All that's required is to dig around and back thru the years to find them. With over eight years of development this shouldn't be surprising:) But how many sql folks have bothered to do this homework? From the looks of it, very, very few. Presenting concepts is much harder than presenting code. It takes a very special kind of guy to connect the dots here:) I'd be happy to share with you an extensive MS list of articles/notes thru the years about the subject.
Linq to Sql vs. older 4GL attempts
Some thoughts on comparing Linq to older reporting languages. My comment originally appeared in the www.sqlblog.com post:

'LINQ - Lessons that could have been learned from languages like Progress'
2/25/08 by Greg Low
http://tinyurl.com/56powf

>it's still very much an add-on rather than being fully baked into the language.

I remain somewhat perplexed by just what you mean by this. By definition LINQ is 'burned in'. This is the achievement that MS has accomplished. To imply that it's not fully baked in is like saying a woman is somewhat pregnant. It is or it isn't, you are or you are not:) Either a table/entity or projection can be represented as a variable or it cannot. That all these things 'can' be a variable of some type referenced in the language is the whole point of it, no? Your use of 'add-on' seems to imply that LINQ is something external to the language, much like an external 4GL like Progress. I don't think anything could be further from the truth.

In your interview with Pablo Castro he referred to Progress as an example of a language having a 'direct line of sight from language to database'. Wasn't he struggling here to convey to sql folks the fundamentally different natures of the two? To bring the 4GL idea into contemporary languages one shouldn't expect they are going to be similar. And you seem to be implying that LINQ is not as 'natural' as Progress. How could it be? If you first have to define a query variable (table) you certainly can't expect to start your variable with a Select/Find. You define and then perform an action, right? In t-sql, 'Select @MyNumber' only makes sense if you first declare @MyNumber. Is LINQ any different? And in the sense that 'declare @MyNumber int' is burned into t-sql, isn't 'var customers = from c in db.Customers' burned into C#?

I do think sql users should listen to your interview with Pablo. It is proving difficult for MS folks to convey just what they have done to those outside of net.
What is worse, misunderstanding it or ignoring it?:) Shouldn't sql folks fully understand why MS thinks it's such an improvement over sql? So far I think either they don't or simply don't care.
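The 'declare first, query later' point above can be seen outside C# as well. In Python, for instance, a query expression is itself a value you bind to a variable before anything executes (an analogy of my own, not LINQ itself):

```python
# A "query" held in a variable before it runs -- loosely analogous to
# C#'s: var customers = from c in db.Customers where ... select c.Name;
customers = [{"name": "Acme", "city": "NY"},
             {"name": "Zed",  "city": "LA"}]

query = (c["name"] for c in customers if c["city"] == "NY")  # declared, not yet run

result = list(query)  # execution is deferred until here
print(result)  # ['Acme']
```

Like a LINQ query, the generator expression is a first-class value of the language, and nothing runs until the result is actually demanded.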
Linq to Sql: the 'what' and 'why' of it
The following comments concern 'what' Linq (to Sql)/EF is and the 'why' of it, what motivated MS to develop it. What does MS mean by 'beyond relational'? I also explore in what ways Linq, sql and the relational model are related to each other. How these technologies are connected to each other is quite a fascinating picture:) My following 5 comments originally appeared on www.sqlblog.com in the post:

'Beyond Relational ???'
10/29/07 by Paul Nielsen
http://tinyurl.com/686z6h

Comment #1

There is the association of relational to mathematics (set theory). So people criticize sql based on this point of view. Sql allows duplicate rows, doesn't require a table to have a key, has dependencies based on ordinal position, is a poorly designed language, etc. etc. These things really are critical, but the real problem is the prevailing idea that relational is just a question of mathematics. If it's just mathematics, then allowing duplicate rows is perceived as 'close enough'. All the objections from the set theory point of view are not perceived as compelling enough to really question the validity of sql.

IMO the real holes in sql have nothing to do with mathematics. Rather it's the foundation, the computer science if you will, that set theory and relational algebra are embedded in. This point of view is unfortunately not prevalent in IT. What the hell do I mean by the computer science of the relational model? Well first, the set theory that relational depends on is not some special kind of set theory. There is only one set theory. In the same way there is only one computer science; there is no special kind of computer science. But sql has invented such a special computer science, and this is its biggest flaw.
What am I talking about?:) Consider this, here is a table variable:

DECLARE @MyTableVar table(
    EmpID int NOT NULL primary key,
    OldVacationHours int,
    NewVacationHours int,
    ModifiedDate datetime);

Here is a server table:

create table MyTable(
    EmpID int NOT NULL primary key,
    OldVacationHours int,
    NewVacationHours int,
    ModifiedDate datetime);

Here's the key question. If @MyTableVar really is a variable, then what is MyTable? In other words, @MyTableVar is to variable as MyTable is to ?????. If MyTable is persisted in the database, what is it persisted as? What computer science term describes it? Well, whatever the hell it is (a constant?), it certainly isn't a variable. And if it isn't a variable, then end of ballgame, end of relational model.

And what of @MyTableVar? BOL says 'A table variable behaves like a local variable.' and at the same time says 'Assignment operation between table variables is not supported.'. When is a door not a door?..when it's ajar:) Who the hell ever heard of a variable that doesn't support assignment? Who ever heard of a variable that doesn't support comparison? No one. Whatever @MyTableVar really is, it sure as hell ain't a variable.

In a relational db I should be able to assign the table @MyTableVar, all its rows, to MyTable:

MyTable = @MyTableVar

And I should be able to compare them:

if MyTable = @MyTableVar then
  print 'All rows in MyTable are in @MyTableVar and all rows in @MyTableVar are in MyTable'
else
  print 'Nope they're not equal'

A relational db demands that a table be a variable just like an integer variable. Sql simply does not support basic computer science for tables. Whatever a table is in sql, it doesn't have a 'type', because computer science is computer science and a variable must be typed. The only way sql can recognize a table is by its name, not its type. This is why sql doesn't support relational division and why dynamic sql must be used so much. A table as a variable is a completely different animal than a table in sql.
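For what it's worth, the assignment and comparison being asked for fall out immediately once a table's value is modeled as a set of rows. A toy Python sketch (mine, purely illustrative of the point, not D4):

```python
# Treat a table's *value* as an immutable set of rows; such a value
# trivially supports assignment and comparison -- the two operations
# the post argues sql tables lack.
MyTableVar = frozenset({
    (1, 40, 45, "2008-12-18"),
    (2, 32, 36, "2008-12-19"),
})

MyTable = MyTableVar  # assignment: all rows at once

if MyTable == MyTableVar:  # comparison: equal as sets of rows
    print("All rows in MyTable are in MyTableVar and vice versa")
else:
    print("Nope they're not equal")
```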
This is why the expressive power of a relational db is orders of magnitude greater than an sql db. Sql views and constraints are redefined relationally. The 'types' in Date's work 'Databases, Types and the Relational Model: The Third Manifesto' (2007) is about the central importance of variables of a particular type (a table as one of many types) in a relational db: what a table as a variable means and its significance. It is really a basic computer science book.

Ripping out the mathematics of relational theory (or at least trying to copy it), ie. the syntax to join and union tables, without the computer science of relational, has done all the damage. MS can't change sql server because they are caught in a crazy computer science. The difference in computer science between sql and net is the impedance mismatch they're trying to address. But I'm afraid they still don't get the idea of a table as a variable. This is different than a table as a class. The anonymous type in Linq is distinctly different than a table type.

So MS is doing the same thing IBM did forty years ago with the sql System-R. The damage is the difference between a pickup game in a playground and organized sports. You can draw up plays in the dirt but they don't quite work the same as those run in a stadium. We're still doing application development in the playground. Sometimes it works, sometimes it doesn't, but we're not basing it on the science of any model. Sql is not a model of anything; it's an invention all its own. Close enough is only for horseshoes:) Maybe my blog will make more sense now:)

Comment #2

Wherever you see the word 'relational' just substitute 'fog'. As in fog of war:)

> But when you have guys like Don Chamberlin (co-inventor of SQL and
> co-designer of XQuery) on your staff, I guess you can afford to
> boast your XML prowess.

He is revered in the sql world and reviled in the relational one. He was a lead designer of System-R, the prototype of all sql database systems.
Those guys created a query language based on Codd's description of basic relational operators like projection, union and join. But they did NOT implement the relational model Codd described. They just ripped out these constructs without regard for their meaningfulness in the entire relational model. So what you have today is nothing like the relational model as it was envisioned. (IT successfully marginalizes the huge difference and those that point it out:)

And now comes 'beyond relational'. What does this phrase really mean to MS? They are more than willing to tell us. Aside from Jim Gray's article/presentation, everyone should read the articles on this site, the 'Comega language':

http://research.microsoft.com/Comega/

Especially this article:

'Unifying Tables, Objects and Documents'
http://tinyurl.com/yq7c4f

Here you'll find history repeating itself. MS, just like IBM did with System-R, has extracted relational operators out of the relational model and put them in an imperative object environment without any regard to relational theory. The great irony is that the extensions MS added to net to realize projections of columns and xml within net are the foundation for a true relational model! But the compiler-generated anonymous type of Linq, while a variable, is a different beast than the explicit variable that is a table type in the relational model. It's the relational variable that supports assignment and comparison, as opposed to the Linq variable, which is nowhere near as smart:) But each supports a 'variable', which is a major step up from sql. Had MS any idea of the friggin true relational model they would make a different kind of history. Talk about dumbing down. Talk about of only academic interest. Talk about relational fog. I should add that Alphora (Dataphor) recognized the ability of the object imperative environment to support the D relational language and implemented it.
And it works:) Here is what Anders Hejlsberg, MS VS guru and now the head of database technology, has to say about the disconnect:

InfoWorld interview of Microsoft Distinguished Engineer Anders Hejlsberg
'Microsoft's Hejlsberg touts .Net, C-Omega technologies'
June 10, 2005
http://www.infoworld.com/article/05/06/10/HNhejlsberg_1.html

"So what we're looking at is really trying to much more deeply integrate the capabilities of query languages and data into the C# programming language. And I don't specifically mean SQL, and I emphatically don't mean just take SQL and slap it into C# and have SQL in there. But rather try to understand what is it expressively that you can do in SQL and add those same capabilities to C#."

Anders Hejlsberg is Microsoft's version of Don Chamberlin at IBM. So what they have done is replace one flawed implementation of sql with another. And this is how they achieve efficiency in application development. Now that is unfriggin believable:) Well, there's no free lunch. And I await being enlightened on just how this environment will replace the concept of the logical relational model in solving business problems. I would say the real meaning of beyond relational is sideways.

Comment #3

Consider the MS whitepaper:

'Microsoft SQL Server 2008 and Microsoft Data Platform Development'
http://www.microsoft.com/sql/techinfo/whitepapers/sql_2008_dp.mspx

Does anyone find it the least bit odd that an sql server technical article is all about VS, LINQ and the entity framework? At the expense of the logical relational model and the sql language. What MS means by 'beyond relational' is 'forget relational already':) Looking at sql server as if it were somehow an embodiment of relational theory is every bit as much a form of dumbing down as some silly utterance by some poor nitwit at MS. There never was and never will be any 'intent' by MS to offer a 'relational' database. Sql server's only intent now is to be responsive to its biggest customer, visual studio.
And that team is as knowledgeable in relational databases as the server team. Not. Why does the community still view sql server thru an imaginary lens? Did you ever hear of somewhat pregnant? If you open the dumbing down door, be prepared to greet all those who come thru:)

Comment #4

There is no longer a great divide, a debate, an impedance mismatch. MS has issued their own Emancipation Proclamation. And as a result they no longer support the relational model as it is known to developers today.

'A Call to Arms'
by Jim Gray, Microsoft, and Mark Compton, Consultant
April 2005
http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=293

This paper is an invitation to embrace a new model. It's just as much 'A Farewell to Arms', an emancipation from the relational model which they are leaving behind. What does sql server look like in this new model?

'Interview with David Campbell'
General Manager of Strategy, Infrastructure and Architecture of Microsoft SQL Server
May 14, 2007
http://tinyurl.com/6maseb

Campbell says:

"I believe the next major advance in Information Technology will come from addressing the gap between people and information."

That gap is the relational logical model itself. Campbell continues:

"The focus will move to the data itself rather than on the machinery used to manipulate it. We'll be less concerned with the plumbing and more concerned with data quality, data protection, and information production."

"Most of the data services provided by SQL Server will be driven from a common data model. Whether you're creating a report, building an information cube, or integrating data from another system, you will be able to start from a common model of the key data entities such as 'customer', 'order', or 'prospect'."

"Finally, fewer and fewer people will miss, (or remember), the 'open databases' sp_configure option..."

The class replaces the table as the basic unit of work. VS replaces QA/SSMS as the interface for application development.
There is no concept of relational anything in this object world. Sql constructs are independent of application development. The language of the relational model is replaced with the language of entities. There is no concept of a dba. MS is no longer in the database wars as we know them. They are trading 3rd place in that world for 1st place in another. And they now have the freedom to talk about this new world. It just sounds silly to those who have not left the old one. Ironically some were hoping for a new sub-language to further AD. Perhaps the lesson here is to be careful of what you wish for. I too was hoping they'd enter a new world but not the one they have chosen. Comment #5 > should we be concerned staying in the DB world long with the fear > that we become obsolete one day? Although I'm not an expert I can understand where you're coming from. It would be nice to get a clear and concise answer to where MS is going and what you should do about it. But there is no Oracle when it comes to MS. There is no one position paper, no one person that clearly lays out their five year plan and what it means to you. The experts here have enormous importance and influence in the db community. But they also have an enormous investment. How far can they be reasonably expected to go without putting themselves in an awkward position should they take a position that is not currently in line with company thinking? In the end it's a question of connecting the dots. You get a dot here a dot there. You have to do your homework. Study what they say and write and study what they offer. Sql server pros shouldn't neglect what's going on in VS and its impact. If you study the company and the various technologies enough you should be able to draw your own picture. Think of it as the MS X-files:)
Sunday, November 23, 2008
Listing Missing Table Item
The following example is based on the post: microsoft.public.sqlserver.programming Friday, November 21, 2008 11:56 AM "T-SQL Comparing a String of Items to a Table of Items; Listing Missing Table Items" http://tinyurl.com/5dvc6o Here's what the future of application development will hopefully be like using a relational system. To program relationally you not only have to think in "terms of sets" (the declarative thing:) but you have to think "in terms of type" (the relation thing). Sql essentially doesn't concern itself with variables and types. They are viewed as foreign thingies found in .Net. But in a relational system they are every bit as significant as understanding how to write an outer join. The following example illustrates the relational approach using the Dataphor system (which can play nicely with Sql Server ☺) create session table Products { product_key:Integer, part_no: String, options: list(String), //options is defined as a list type, not a string. price: Money, key{product_key} }; All tables, like numbers, are variables with an explicit 'table' type: the column names and their datatypes. And like integers they can be 'assigned' values. Tables are composed of 'rows'. For each row options is input as a 'list' of type string ({'<string>','<string>','<string>'..}). Products:= table { row{11 product_key, 'ASP-20J' part_no, {'Wiz Bang', 'Hopla Enhancer'} options, $10.00 price}, row{12, 'ASP-20R', {'Widget Holder','Wiz Bang', 'Hopla Enhancer'}, $12.00} }; create session table Options { option_key: Integer, product_key: Integer, option: String, key{option_key}, reference OptiontoProduct {product_key} references Products {product_key} }; Options:= table { row{5 option_key, 11 product_key, 'Wiz Bang' option}, row{6, 11, 'Hopla Enhancer'}, row{7, 12, 'Wiz Bang'}, row{8, 12, 'Hopla Enhancer'} }; Here are easy ways to find the options missing from the Options table. Using row in table. 
First a table of numbers (Numbers) with a single column N (from 0-10K) is used to create a row for each element in the options list, bringing along the other Products columns. A row of {product_key, option} is then constructed from each resulting row and tested for membership in the Options table. select ( Products times //Like an sql cross join. Numbers where N<options.Count() //create a row for each element in the list. {product_key,options[N] option} ) where not ( row{product_key product_key,option option} in (Options {product_key,option}) ) ; Using relational division. Because tables (table expressions) are variables one table can be tested to see if it's contained in another. select ( Products times Numbers where N<options.Count() {product_key,options[N] option} ) where not ( table{row{product_key product_key,option option}} <= (Options {product_key,option}) ) ; Using a left join. Test whether the table on the right has a matching row. select ( Products times Numbers where N<options.Count() {product_key,options[N] option} ) left join Options include rowexists //A special Boolean column for whether there is a match. where not rowexists {product_key,option} ; Instead of inputting a list directly, a delimited string can be converted to a list when input. Products:= table{ row{11 product_key, 'ASP-20J' part_no, ('Wiz Bang, Hopla Enhancer '.Split()) options, $10.00 price}, row{12, 'ASP-20R', (' Widget Holder ,Wiz Bang, Hopla Enhancer'.Split()), $12.00} }; Options:= table{ row{5 option_key, 11 product_key, 'Wiz Bang' option}, row{6, 11, 'Hopla Enhancer'}, row{7, 12, 'Wiz Bang'}, row{8, 12, 'Hopla Enhancer'} }; The queries are the same as above except for trimming each element of the list. 
select ( Products times Numbers where N<options.Count() {product_key,options[N].Trim() option} ) where not ( row{product_key product_key,option option} in (Options {product_key,option}) ) ; select ( Products times Numbers where N<options.Count() {product_key,options[N].Trim() option} ) where not ( table{row{product_key product_key,option option}} <= (Options {product_key,option}) ) ; select ( Products times Numbers where N<options.Count() {product_key,options[N].Trim() option} ) left join Options include rowexists where not rowexists {product_key,option}; All queries produce a result of: product_key option ----------- ------------- 12 Widget Holder Types eliminate violations of normal forms much like education eliminates ignorance ☺ Dataphor is a RAD tool, R(elational) A(ccelerator) D(evelopment). Visit dataphor at: www.dataphor.org
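For readers without Dataphor handy, the split-and-anti-join idea above is easy to mimic in plain Python. This is only a sketch of the logic, not Dataphor: the sample data mirrors the Products and Options tables, with a set of (product_key, option) pairs standing in for Options, and the function name is my own invention.

```python
# A sketch of the logic only -- Dataphor's set operators are simulated
# with plain Python lists and sets.
products = [
    {"product_key": 11, "options": "Wiz Bang, Hopla Enhancer"},
    {"product_key": 12, "options": " Widget Holder ,Wiz Bang, Hopla Enhancer"},
]
# (product_key, option) pairs standing in for the Options table.
options_table = {
    (11, "Wiz Bang"), (11, "Hopla Enhancer"),
    (12, "Wiz Bang"), (12, "Hopla Enhancer"),
}

def missing_options(products, options_table):
    """Split each product's option string into rows, then keep the
    (product_key, option) pairs with no match in the Options table."""
    missing = []
    for p in products:
        for part in p["options"].split(","):
            pair = (p["product_key"], part.strip())  # trim, like Trim() above
            if pair not in options_table:            # 'where not (row ... in ...)'
                missing.append(pair)
    return missing

print(missing_options(products, options_table))  # [(12, 'Widget Holder')]
```

The `not in` membership test plays the role of the `where not (row ... in ...)` query; the relational-division and left-join variants above reduce to the same check in this tiny setting.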
Friday, November 14, 2008
S(ecure) Q(uery) L(anguage)?
Concerning the thread: microsoft.public.sqlserver.programming Nov 10, 2008 "cascading deletes" http://tinyurl.com/6nwjmd For some reason I couldn't get my reply to post thru OE (it got thru via google though). Perhaps there's an MS filter for metaphors ☺ In any event any mature adult should be able to handle it. So here's my reply with a touch of creative writing ☺ 'Most women believe men think with their tool. And it's just as true in IT. Users model business problems in terms of the abilities of their db. The idea that modeling exists independent of one's db is a myth. It's not a question of seepage but of flooding. Modeling business problems in terms of the available sql server constructs is messy precisely because they're immature and superficial to the task. The result is you turn away from the db and towards the do-it-myself model. You roll around in your own layer because you can't get layered by the db. It's ridiculous to write a join procedurally but when it comes to modeling it's perfectly acceptable to roll your own. Because the model equivalent of the join is so lacking and messy. The genie isn't going back in the sql server bottle. It's simply too far gone. That's why I advocate Dataphor. There the genie is in the join as well as the modeling. Use Dataphor and put your tool back where your head and shoulders are. You can still use sql server. But you aren't going to get tooled by it :)' www.dataphor.org www.beyondsql.blogspot.com Geoff Schaller wrote: > Andre. > I vote with Hugo here. We manage everything from code, not from TSQL in > SSMS or via some other mechanism so we generally have to code everything > (and that is not as difficult or as expansive as it sounds). Whilst > cascading referential integrity is "nice" from a simplicity point of > view, we've found that the act of deleting something (say an invoice) is > almost never a simple thing. 
> There is reversal of stock levels, > rebalancing totals and if others are running reports when you thought > you wanted to do the delete, it gets messy. > The other thing is that we quite often have to delete the child entries > individually or prevent the parent from being deleted because a child or > two cannot be. Writing all that logic into a trigger and enforcing the > rollback is quite complex. I find code an easier way to manage and > maintain this. To add insult to injury my reply to a post on SQLServerCentral was hacked (edited). SQLServerCentral 11/10/2008 'Arrays in Stroed Prcoedure' http://tinyurl.com/5th5n4 My reply, as shown there under the name rog pike, was edited to read: 'An array is a 'type', something the archaic computer science of sql knows nothing about. You have to move to a 'real' relational system to find a list/array type. You'll find such adult computer science in Dataphor.' My original reply was as follows: 'Arrays are in sql server in the same sense as having sex by yourself which may account for the shortsightedness of so many sql mavens. An array is a 'type', something the archaic computer science of sql knows nothing about. You have to move to a 'real' relational system to find a list/array type. You'll find such adult computer science in Dataphor.' Is the site for mature adults or for the whole family? ☺ Just how much protection does sql and its users need? Is this a security or, better yet, an insecurity problem? ☺ Finally, I'll repeat here what I posted in the above thread: 'Apparently someone complained/reported something I wrote as being objectionable. They got their wish as it was magically extracted from the text. What was yanked, besides my chain, was a metaphor, albeit a vivid one, to drive a salient point home. Now I write for adults, I don't do child-speak very well. Nor do I have a predilection to only write drone-on-speak. 
So, if I can, I won't hesitate to use an adult metaphor to amplify a point in an industry that is usually tone deaf. God forbid IT encourage ability in something other than code or pixels. So if you are an adult, with a surname other than anonymous, please explain just what you found R or X rated. Mature adults usually confront conflicts thru the front door not the back one.'
Monday, September 08, 2008
Sorting a delimited string numerically
Sorting a delimited string by its numerical string parts. This is a possible solution to the problem presented in the article: 'T-SQL Sorting Challenge' By: Itzik Ben-Gan http://www.sqlmag.com/Article/ArticleID/100156/100156.html "You are given a table called t1 with a character string column called val. Each string in the val column holds a dot separated list of integers. Your task is to write a T-SQL solution that sorts the strings based on the integer values constituting the string segments. Note that the number of integers in each string may vary, and is only limited by the column type VARCHAR(500). Extra points if your solution will also support negative integers." So the problem is how to construct a sort expression that represents the positive and negative integers of the ids. This solution uses Dataphor with the data being stored in Sql Server 2005. Sample data The data is populated thru dataphor and persisted in the sql server northwind db. The table t1 in the article is table IZ here. The article uses an sql identity for the primary key id, here it is explicitly declared. create table IZ { id:Integer, val:String tags {Storage.Length='75'}, key{id} }; //Positive value strings. insert row{1 id,'100' val} into IZ; insert row{2 id,'7.4.250' val} into IZ; insert row{3 id,'22.40.5.60.4.100.300.478.19710212' val} into IZ; insert row{4 id,'22.40.5.60.4.99.300.478.19710212' val} into IZ; insert row{5 id,'22.40.5.60.4.99.300.478.9999999' val} into IZ; insert row{6 id,'10.30.40.50.20.30.40' val} into IZ; insert row{7 id,'7.4.250' val} into IZ; //Add negative values. 
insert row{8 id,'-1' val} into IZ; insert row{9 id,'-2' val} into IZ; insert row{10 id,'-11' val} into IZ; insert row{11 id,'-22' val} into IZ; insert row{12 id,'-123' val} into IZ; insert row{13 id,'-321' val} into IZ; insert row{14 id,'22.40.5.60.4.-100.300.478.19710212' val} into IZ; insert row{15 id,'22.40.5.60.4.-99.300.478.19710212' val} into IZ; Go directly to dataphor solution Go directly to a solution using dataphor and sql server t-sql Go directly to a solution using the Rac utility on sql server 2000 Go directly to a solution using the Rac utility on sql server 2005 ( Rac is a system of stored procedures and functions for sql server designed to simplify solving various data manipulation problems including dynamic crosstabs, complex running sums and ranking, string manipulations etc. )
Stepping thru the logic of the solution Use the dataphor Split operator to split the val string for each id into individual strings starting at Index 1 and going to the number of parts delimited by the period ('.'). Note that the string is converted to an integer. So we're dealing with numbers and not strings. select (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times //times is equivalent to an sql CROSS JOIN. (Numbers where N<10) //A table with a single column N, an integer from 0 to 800. where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} order by {id,Index} ; id Index StrNum -- ----- -------- 1 1 100 <- id 1 has only a single value. 2 1 7 <- id 2 has 3 values. 2 2 4 2 3 250 3 1 22 <- id 3 has 9 values. 3 2 40 3 3 5 3 4 60 3 5 4 3 6 100 3 7 300 3 8 478 3 9 19710212 4 1 22 . . . Now let's look at the same data ordered by Index and, within each Index, by the string as a number (StrNum). Remember all ordering is ascending. select (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} order by {Index,StrNum}; id Index StrNum -- ----- -------- 13 1 -321 <- id 13 has lowest overall number for 1st string part (Index 1). 12 1 -123 11 1 -22 10 1 -11 9 1 -2 8 1 -1 2 1 7 7 1 7 6 1 10 3 1 22 <- ids 3,4,5,14,15 have the same value (22) for Index 1. 4 1 22 5 1 22 14 1 22 15 1 22 1 1 100 <- id 1 has highest overall number for 1st string part (Index 1). . . . 3 8 478 <- Index 8 has the same string value for the 5 ids that have an 8th string part. 4 8 478 5 8 478 14 8 478 15 8 478 5 9 9999999 3 9 19710212 4 9 19710212 14 9 19710212 15 9 19710212 The strings as integers are nicely sorted within each Index over the ids. How can we represent the same ordering within each Index independent of the positive and negative numbers (and strings that indicate positive and negative numbers)? What about a ranking? 
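(Before moving on to ranks, the split step above is simple to mimic outside Dataphor. Here is a rough Python sketch of the same transformation; the sample rows are hypothetical (id, val) pairs standing in for table IZ, and the function name is my own:)

```python
# Rough Python equivalent of the split step: each id's dotted string
# becomes (id, Index, StrNum) rows, with Index starting at 1.
def split_parts(rows):
    out = []
    for id_, val in rows:
        for index, part in enumerate(val.split("."), start=1):
            out.append((id_, index, int(part)))  # like ToInteger(StrList[N])
    return out

print(split_parts([(1, "100"), (2, "7.4.250")]))
# [(1, 1, 100), (2, 1, 7), (2, 2, 4), (2, 3, 250)]
```

Note that `int("-1")` handles the negative parts for free, just as ToInteger does above.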
So let's rank the numbers within each Index. The combination of the dataphor ToTable, ToList and cursor operators will generate a rank (a column named sequence) that follows the specified cursor ordering. We order the cursor by Index,StrNum to get an ascending rank within each Index based on the StrNum values. The rank is column RowNum. select ToTable( ToList( cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum} order by {Index,StrNum}; Index id StrNum RowNum ----- -- -------- ------ 1 13 -321 1 <- id 13 has lowest value (-321) so id 13 gets lowest rank (1) within Index 1. 1 12 -123 2 1 11 -22 3 1 10 -11 4 1 9 -2 5 1 8 -1 6 1 2 7 7 <- ids 2 and 7 get a different rank for the duplicate values of 7 :( 1 7 7 8 1 6 10 9 1 3 22 10 <- all 5 ids get a different rank for the duplicate values of 22 :( 1 4 22 11 1 5 22 12 1 14 22 13 1 15 22 14 1 1 100 15 <- id 1 has highest value (100) so id 1 gets highest rank (15) within Index 1. . . . . 8 3 478 56 8 4 478 57 8 5 478 58 8 14 478 59 8 15 478 60 9 5 9999999 61 9 3 19710212 62 <- all 4 ids get a different rank for the duplicate values of 19710212 :( 9 4 19710212 63 9 14 19710212 64 9 15 19710212 65 ( Note that this rank is equivalent to the sql ranking function ROW_NUMBER(). The rank could be obtained in sql using: ROW_NUMBER()OVER(ORDER BY [Index],StrNum) AS RowNum But the rank we want is really based on the sql RANK() function which accounts for duplicates/ties (of StrNum) by giving them the same rank. Therefore it's necessary in dataphor to use a join to append the correct ranks to the table. In sql the join isn't necessary, RANK() can be used directly on the table, i.e. 
RANK()OVER(ORDER BY [Index],StrNum) AS Rank (See t-sql solution) For more on dataphor and sql ranking see: 'The Sql ranking OVERture' http://beyondsql.blogspot.com/2008/04/sql-ranking-overture.html ) If you haven't guessed it by now ☺ the idea is to create a string for each id based on the ranks which will be used to order the ids. But we have a problem because for duplicate values of a number we're getting different ranks. We want the 'same' rank for duplicate values since the same integer cannot be used to distinguish among the ids. We can remedy the different ranks for duplicate values by simply choosing the minimum rank (RowNum) for the value and assigning this rank to all ids. Also note that the ranks continue to ascend over the Indexes. This is fine because any numbers will do as ranks so long as they correctly maintain the ordering of the integer string values within the Index. select ToTable( ToList( cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum} group by {Index,StrNum} add{Min(RowNum) Rank} order by {Index,StrNum}; Index StrNum Rank ----- -------- ---- 1 -321 1 1 -123 2 1 -22 3 1 -11 4 1 -2 5 1 -1 6 1 7 7 <- a rank of 7 can be assigned to the two ids (2,7) with a value of 7 for Index 1. 1 10 9 1 22 10 <- a rank of 10 can be assigned to all ids with a value of 22 for Index 1. 1 100 15 . . . 8 478 56 9 9999999 61 9 19710212 62 <- a rank of 62 can be assigned to all ids with a value of 19710212 for Index 9. What we now have is a table of unique Index/StrNum combinations with a unique rank for each combination. It's only necessary to join this table to the table of split strings (IZ) for all ids by Index and StrNum to properly assign the correct(ed) ranks. 
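(The group-by-with-Min(RowNum) correction can also be sketched in Python. This is just an illustration of the idea, with names of my own choosing: the dict's `setdefault` keeps the first, i.e. minimum, row number seen for each (Index, StrNum) pair, which is exactly the RANK()-style tie handling described above:)

```python
# Sketch of the rank correction: number the rows ordered by
# (Index, StrNum), then give duplicate (Index, StrNum) pairs the
# minimum row number -- setdefault keeps the first (lowest) RowNum,
# mimicking Min(RowNum) in the group by above.
def rank_with_ties(parts):
    """parts: list of (id, index, num) rows -> {(index, num): rank}."""
    ordered = sorted(parts, key=lambda r: (r[1], r[2]))
    ranks = {}
    for row_num, (_, index, num) in enumerate(ordered, start=1):
        ranks.setdefault((index, num), row_num)
    return ranks

# ids 2 and 7 share the value 7 for Index 1 and so share rank 1.
print(rank_with_ties([(2, 1, 7), (7, 1, 7), (6, 1, 10)]))
# {(1, 7): 1, (1, 10): 3}
```

The result matches RANK() semantics: ties share a rank, and the next distinct value skips ahead past the tied rows.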
(As mentioned above this is the same rank that would be obtained using the sql RANK() function and ordering by Index,StrNum. And note that using the sql RANK() would eliminate the need to do a join in dataphor. Imagine dataphor with native sql like ranking operations ☺ ) Because the objective is to create a string to sort the ids we can't just use the numeric rank, we have to modify it for string ordering. Given two ranks of 7 and 11 if they are strings, '7' and '11', an ascending sort would have '11' come before '7': '11' '7' This is the very problem the article is addressing! So we have to modify the strings to have '7' come before '11' ☺. We can modify the '7' by left padding it with '0'. So when we sort ascending we'll have the correct representation of the true numeric order of the values: '07' '11' How much to pad a rank, how many '0's to insert, is determined by the string length of the maximum rank generated. Because the ranks in dataphor keep ascending regardless of the cursor ordering, the maximum rank (ignoring duplicates) is the count of rows in the table. You could even make an educated guess based on the amount of data and use that ☺. Left padding the string rank (RankStr) based on a maximum length of 2 we now have all the data to finally construct a sorting column for the ids. var UnpackedStrings:= //This variable holds all the split data and will be used in the select query. ToTable(ToList(cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum}; var LengthofMaxRank:=2; select (UnpackedStrings {id,Index,StrNum}) join //Join the unique ranks to all split data. This is a natural join (on Index/StrNum). //Create a left padded string (RankStr) from the numeric rank. 
(UnpackedStrings group by {Index,StrNum} add{Min(RowNum) Rank}) {id,Index,StrNum,Rank,PadLeft(ToString(Rank),LengthofMaxRank,'0') RankStr} order by {Index,Rank}; id Index StrNum Rank RankStr -- ----- -------- ---- ------- 13 1 -321 1 01 <- ranks 1-9 are left padded in the rank string (RankStr). 12 1 -123 2 02 11 1 -22 3 03 10 1 -11 4 04 9 1 -2 5 05 8 1 -1 6 06 2 1 7 7 07 7 1 7 7 07 6 1 10 9 09 3 1 22 10 10 4 1 22 10 10 5 1 22 10 10 14 1 22 10 10 15 1 22 10 10 1 1 100 15 15 . . . . . 3 8 478 56 56 4 8 478 56 56 5 8 478 56 56 14 8 478 56 56 15 8 478 56 56 5 9 9999999 61 61 3 9 19710212 62 62 4 9 19710212 62 62 14 9 19710212 62 62 15 9 19710212 62 62 The id sort column can now be formed by concatenating, using the Concat operator, the RankStr within each id in the order of the rank (either Rank, Index or RankStr). This is easy to see by ordering the above data (table) by id,Rank. The ascending order of Index, Rank and RankStr all reflect where an id lies in value (RankStr) relative to the other ids. The sort expression will be column SortStr. id Index StrNum Rank RankStr -- ----- -------- ---- ------- 1 1 100 15 15 2 1 7 7 07 2 2 4 16 16 2 3 250 30 30 3 1 22 10 10 3 2 40 19 19 3 3 5 24 24 3 4 60 33 33 3 5 4 38 38 3 6 100 49 49 3 7 300 51 51 3 8 478 56 56 3 9 19710212 62 62 The complete dataphor solution var UnpackedStrings:= //Variable that holds all split data and ranks the split strings //numerically within each Index ordered by the numeric string value. ToTable(ToList(cursor( ( (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum} ) order by {Index,StrNum}))) {Index,id,StrNum,sequence+1 RowNum}; var LengthofMaxRank:= Length(Count(UnpackedStrings).ToString()); //A string length used to left pad the rank strings for //a correct string sort (will be 2 here). select IZ join //Natural join of input table IZ to sort column expression by id. 
( //Join unique ranks to all split strings. ( (UnpackedStrings {id,Index,StrNum}) join // Adjust ranks to be unique for each Index/StrNum, duplicate values should get same rank. (UnpackedStrings group by {Index,StrNum} add{Min(RowNum) Rank}) //Left pad rank string with '0' (RankStr) for sorting string correctly. Add empty //string ('') to be used as a delimiter in concatenating the ranks. {id,Index,StrNum,Rank,PadLeft(ToString(Rank),LengthofMaxRank,'0') RankStr,'' Del} ) adorn {key{id,Rank}} //Sql uses physical hints, in dataphor you use logical ones. //Form the sorting expression SortStr to sort ids by concatenating the string ranks //in the order of any of the ranking columns or Index. group by {id} add{Concat(RankStr,Del order by {id,Rank}) SortStr} ) order by {SortStr}; //The object of the exercise, sort the ids by SortStr to get the correct //numerical order of val. id val SortStr -- ---------------------------------- ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 2 7.4.250 071630 7 7.4.250 071630 6 10.30.40.50.20.30.40 09182932434650 14 22.40.5.60.4.-100.300.478.19710212 101924333844515662 15 22.40.5.60.4.-99.300.478.19710212 101924333845515662 5 22.40.5.60.4.99.300.478.9999999 101924333847515661 4 22.40.5.60.4.99.300.478.19710212 101924333847515662 3 22.40.5.60.4.100.300.478.19710212 101924333849515662 1 100 15 Solution using the sql server RANK() function with a pass thru query The solution can be made more compact using the sql RANK() function since dataphor doesn't have a direct equivalent ranking operation. Since sql server can only access persisted tables (and views) and not dataphor expressions (we can't just stick in any dataphor table expression in a pass thru query) we'll create a persisted table to hold all the split data. create table IZSqlRanks //The table will be created in sql server. 
{ id:Integer, Index:Integer, StrNum:Integer, key{id,Index} }; Do the splitting in dataphor (because it knows the difference between a string and a list ☺ ) and then 'assign' the resulting table to the persisted table IZSqlRanks. Relational databases support this kind of assignment for all variables including tables. And all tables in dataphor are variables. (To my sql friends this makes all the difference in the world ☺ ) IZSqlRanks:= (IZ add{val.Split({'.'}) StrList,val.Split({'.'}).Count() StrListCnt}) times (Numbers where N<10) where N<StrListCnt {id,N+1 Index,ToInteger(StrList[N]) StrNum}; //Use a t-sql passthru query to take advantage of the sql RANK() function. The //resulting table will be treated like any other table (expression) in dataphor. //Left pad the string rank (RankStr) for sorting purposes. We're using a total //string length of 2 here so single digit ranks will be padded with a leading '0'. select IZ join ( SQLQuery('SELECT id,[Index],StrNum,RANK()OVER(ORDER BY [Index],StrNum) AS Rank FROM IZSqlRanks') //Use the sql result as if it was a native dataphor expression. {id,Index,StrNum,Rank,PadLeft(ToString(Rank),2,'0') RankStr,'' Del} adorn {key{id,Rank}} //Metadata (a key) pertaining to the table expression. //This key will be efficiently used by the Concat operation. //Concatenate the rank strings for each id to be used as the sort order for ids. 
group by {id} add{Concat(RankStr,Del order by {id,Rank}) SortStr} ) order by {SortStr}; id val SortStr -- ---------------------------------- ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 2 7.4.250 071630 7 7.4.250 071630 6 10.30.40.50.20.30.40 09182932434650 14 22.40.5.60.4.-100.300.478.19710212 101924333844515662 15 22.40.5.60.4.-99.300.478.19710212 101924333845515662 5 22.40.5.60.4.99.300.478.9999999 101924333847515661 4 22.40.5.60.4.99.300.478.19710212 101924333847515662 3 22.40.5.60.4.100.300.478.19710212 101924333849515662 1 100 15 Solving the problem on sql server 2000 with the Rac utility The Rac solution follows the same logic as the dataphor and sql methods. The 1st Rac execute creates the Index column, from 1 to N, for every string part (delimited integer) an id has. The 2nd Rac execute creates a rank over the ids based on the integers (string parts) within each Index. The rank Rac generates is equivalent to the sql DENSE_RANK() function. This rank, like RANK(), gives duplicate values the same rank but, unlike RANK(), does not take the duplicate ranks into account when generating the next rank. RANK() skips ahead based on ties while DENSE_RANK() consecutively numbers the ranks. Both types of ranks give the same correct sort result for the ids. The 3rd Rac execute concatenates the left padded rank strings for each id and returns the ids sorted by them, correctly :) Note that Rac is called recursively twice. Exec Rac @split='[position]', -- Rac splits each val string from left to right based on a period ('.'). @rows='id & [position]',-- Rac keeps the position each new string part starts in. @pvtcol='val', -- the target column of the split operation. @from='IZ', -- use a counter to generate the Index column (from 1-N) that indicates the individual string part in a val. 
@separator='.',@rowcounters='id{[Index]}',@counterdatatype='int', @defaults1='y',@rowbreak='n',@racheck='y',@shell='n', @select=' SELECT 1*id AS id,[Index],CAST([1] AS INT) AS StrNum INTO #J1 FROM rac Exec Rac @transform=~_dummy_~, @rows=~[Index] & StrNum & id~, @pvtcol=~Report Mode~, @from=~#J1~, @defaults1=~y~,@rowbreak=~n~,@racheck=~y~,@shell=~n~, /* use a Rac counter to rank the integer string parts within each Index (column Rank).*/ @rowindicators=~StrNum{Rank}_overtable_~,@counterdatatype=~int~, @select= ~if object_id(~~##J2~~) is not null drop table dbo.t1 SELECT 1*id AS id,1*StrNum AS StrNum,1*[Index] AS [Index],1*Rank AS Rank, /* left pad single digit string ranks for proper character sorting.*/ REPLICATE(~~0~~,2-DATALENGTH(CAST(Rank AS VARCHAR(2)))) + CAST(Rank AS VARCHAR(2)) AS RankStr INTO ##J2 FROM rac~ /* concatenate the string ranks (RankStr) within each id into column SortStr.*/ Exec Rac @transform=~Max(RankStr) as RankStr~, @rows=~id~, @pvtcol=~Rank~, @from=~##J2~, @defaults1=~y~,@racheck=~y~,@shell=~n~,@cutpvt=~y~, @concatenate=~RankStr~,@separator=~~,@stringname=~SortStr~, /* return the ids sorted by the concatenated string ranks.*/ @select=~SELECT IZ.id,val,SortStr FROM IZ JOIN rac ON IZ.id=rac.id ORDER BY SortStr DROP TABLE ##J2~' id val SortStr ----------- ------------------------------------ ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 7 7.4.250 071116 2 7.4.250 071116 6 10.30.40.50.20.30.40 08121517202326 14 22.40.5.60.4.-100.300.478.19710212 091314181921272830 15 22.40.5.60.4.-99.300.478.19710212 091314181922272830 5 22.40.5.60.4.99.300.478.9999999 091314181924272829 4 22.40.5.60.4.99.300.478.19710212 091314181924272830 3 22.40.5.60.4.100.300.478.19710212 091314181925272830 1 100 10 More on Rac @ www.rac4sql.net Solving the problem on sql server 2005 with the Rac utility The Rac solution in S2k5 is similar to the one on S2K (go there for more info). 
But it's simpler since the split string parts (integers) can be ranked directly using a dense rank function not available in S2k. This eliminates a recursive call to Rac so here it's executed twice instead of three times (in S2k). Exec Rac @split='[position]', @rows='id & [position]', @pvtcol='val', @from='IZ', @separator='.',@rowcounters='id{[Index]}',@counterdatatype='int', @defaults1='y',@rowbreak='n',@racheck='y',@shell='n', -- the DENSE_RANK() function, available in S2k5, is used to rank the integer string parts -- and is left padded. @select=' SELECT id,[Index],StrNum,Rank, REPLICATE(~0~,2-DATALENGTH(CAST(Rank AS VARCHAR(2)))) + CAST(Rank AS VARCHAR(2)) AS RankStr INTO #J1 FROM (SELECT 1*id AS id,[Index],CAST([1] AS INT) AS StrNum, DENSE_RANK()OVER(ORDER BY [Index],CAST([1] AS INT)) AS Rank FROM rac ) AS A Exec Rac @transform=~Max(RankStr) as RankStr~, @rows=~id~, @pvtcol=~Rank~, @from=~#J1~, @defaults1=~y~,@racheck=~y~,@shell=~n~,@cutpvt=~y~, @concatenate=~RankStr~,@separator=~~,@stringname=~SortStr~, @select=~SELECT IZ.id,val,SortStr FROM IZ JOIN rac ON IZ.id=rac.id ORDER BY SortStr~' id val SortStr ----------- ------------------------------------ ------------------ 13 -321 01 12 -123 02 11 -22 03 10 -11 04 9 -2 05 8 -1 06 7 7.4.250 071116 2 7.4.250 071116 6 10.30.40.50.20.30.40 08121517202326 14 22.40.5.60.4.-100.300.478.19710212 091314181921272830 15 22.40.5.60.4.-99.300.478.19710212 091314181922272830 5 22.40.5.60.4.99.300.478.9999999 091314181924272829 4 22.40.5.60.4.99.300.478.19710212 091314181924272830 3 22.40.5.60.4.100.300.478.19710212 091314181925272830 1 100 10 More on Rac @ www.rac4sql.net
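(To summarize the technique independent of Dataphor, Rac or T-SQL, here is a compact Python sketch of the complete pipeline: split each dotted string, rank the parts with ties sharing the minimum row number, zero-pad the ranks, concatenate them per id and sort by the resulting string. The function and variable names are my own, not taken from any of the solutions above:)

```python
# Compact sketch of the whole pipeline: split, rank with ties sharing
# a rank, zero-pad, concatenate per id, sort by the resulting string.
def sort_dotted(vals):
    """vals: {id: dotted integer string} -> ids in numeric part order."""
    parts = [(i, idx, int(p))
             for i, v in vals.items()
             for idx, p in enumerate(v.split("."), start=1)]
    ordered = sorted(parts, key=lambda r: (r[1], r[2]))
    ranks = {}
    for rn, (_, idx, num) in enumerate(ordered, start=1):
        ranks.setdefault((idx, num), rn)       # ties share the minimum row number
    width = len(str(len(parts)))               # pad to the widest possible rank
    def sort_str(i):
        own = sorted(p for p in parts if p[0] == i)   # this id's parts by Index
        return "".join(str(ranks[(idx, num)]).zfill(width) for _, idx, num in own)
    return sorted(vals, key=sort_str)

print(sort_dotted({1: "100", 2: "7.4.250", 8: "-1", 13: "-321"}))
# [13, 8, 2, 1]
```

As with the Rac runs, the exact rank numbers don't matter (ROW_NUMBER-with-min, RANK and DENSE_RANK all work) so long as ties share a rank and the padded strings preserve the numeric ordering within each Index.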
Wednesday, July 16, 2008
Demoralization by trigger
In an interesting blog by Conor Cunningham, 'The Trouble with Triggers', he says: "The problem with this area is that there is a great temptation to think about databases procedurally, as you would with a programming language like C++ or C#. You can write code that looks like a procedural function call and have it get called for each insert into table! Before you know it, non-database programmers are checking in code to your production system. Suddenly, your application grinds to a halt because a trigger plan has no index or is poorly designed. Databases are great tools for working on SETS of rows." In another related blog, 'Triggers...Evil?', there is this comment with an insightful Freudian slip: James Luetkehoelter said: >Some sorts of demoralization lend themselves to triggers... I would say this hits the nail on the head. I could understand a developer getting a case of depression triggered by Conor's article. Triggers were 'implemented' to work efficiently on tables (sets) not on rows. The principle that's operating here is that how something was implemented to be most effective is the basis for what's best in application development. Are you kidding me, has everyone gone nuts? ☺ Because triggers don't consider a row as a primary concept 'functional' programmers, application developers, must 'unlearn' their database contrarian views. This is Celko's 'them' vs. 'us' nonsense. Never mind that the real subject is application development and possibly a theory that would best serve it, the basis for key concepts is what a bunch of programmers did for an MS product manager. Talk about the tail wagging the dog ☺ Not only is the absence of a 'row' type or at least concept antithetical to a relational dbms but the row is central to application development. Perhaps to even the score MS decided developers should learn entities and unlearn sql entirely, LINQ. Or perhaps we'll get a whole new science of application development based on what works fastest ☺
Sunday, July 06, 2008
An opposite constraint
This article discusses an issue raised in the thread:

microsoft.public.sqlserver.programming
Sunday, June 29, 2008
'Storing a collection of lines'
http://tinyurl.com/56dpd4

Given two columns in a table, suppose you want to eliminate opposite data where there's no mathematical or logical relationship between the columns. For example, consider a trip between two cities. It's equally likely a trip could start and end in either direction.

If the table already has:

column A column B
-------- --------
NEW YORK CHICAGO

we want to prevent the opposite from being entered:

column A column B
-------- --------
CHICAGO  NEW YORK

And if the table has:

column A column B
-------- --------
CHICAGO  NEW YORK

we want to prevent the opposite from being entered:

column A column B
-------- --------
NEW YORK CHICAGO

Because there's no relationship between the columns an sql check constraint can't be used. But in Dataphor a simple transition constraint can be used. The Opposite constraint simply checks whether the opposite data for columns A and B already exists in the table. If it does, the current row is rejected. The constraint works the same way for a single insert as it does for inserting multiple rows (a table). For example:

create session table MyTable
{
 A:String, B:String, C:String, key{A,B}
};

alter table MyTable
{
 create transition constraint Opposite
 on insert
  //The current (row) values for columns A and B are accessed
  //by prefixing each with 'new', ie. new.A, new.B .
  not exists (MyTable {A X,B Y} {Y A,X B} where A=new.A and B=new.B)
 tags
 //A custom error message can be written using the current
 //row values (new.A, new.B).
 {
  DAE.Message = "'For A: ' + new.A + ' and B: ' + new.B +
                 ' there is an opposite, A: ' + new.B + ' and B: ' + new.A"
 }
};

These inserts succeed:

insert row{'NEW YORK' A,'CHICAGO' B,'1c' C} into MyTable;
insert row{'CALIFORNIA' B,'TEXAS' A,'1d' C} into MyTable;

Inserting an opposite will fail and the custom error message will be raised:

insert row{'NEW YORK' B,'CHICAGO' A,'1e' C} into MyTable;
"For A: CHICAGO and B: NEW YORK there is an opposite, A: NEW YORK and B: CHICAGO"

insert row{'CALIFORNIA' A,'TEXAS' B,'1d' C} into MyTable;
"For A: CALIFORNIA and B: TEXAS there is an opposite, A: TEXAS and B: CALIFORNIA"

Given that the table contains the data 'NEW YORK' (A) and 'CHICAGO' (B), inserting the following rows as a table will fail:

insert table{
 row{'DENVER' A,'BOSTON' B,'1c' C},
 row{'RENO' B,'MIAMI' A,'1d' C},
 row{'CHICAGO' A,'NEW YORK' B,'1e' C}
} into MyTable;
"For A: CHICAGO and B: NEW YORK there is an opposite, A: NEW YORK and B: CHICAGO"

Note that the primary key constraint will prevent the same A and B cities from being entered twice, but entering the opposite cities does not violate it. That's what the Opposite constraint is for. The Opposite constraint is much simpler than an sql solution using triggers.
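The rule the Opposite constraint enforces is easy to state outside of D4 as well. Here is a minimal Python sketch of the same idea (the `TripTable` class and its method names are hypothetical, purely for illustration): an insert of (A, B) is rejected when (B, A) is already present.

```python
class TripTable:
    """Toy in-memory table enforcing the 'Opposite' transition constraint."""

    def __init__(self):
        self.rows = set()  # set of (A, B) pairs

    def insert(self, a, b):
        # Reject the row if the opposite pair already exists.
        if (b, a) in self.rows:
            raise ValueError(
                f"For A: {a} and B: {b} there is an opposite, "
                f"A: {b} and B: {a}")
        self.rows.add((a, b))

t = TripTable()
t.insert('NEW YORK', 'CHICAGO')      # succeeds
try:
    t.insert('CHICAGO', 'NEW YORK')  # the opposite pair is rejected
except ValueError as e:
    print(e)
```

The D4 version, of course, gets this checked declaratively by the DBMS for single-row and table inserts alike; the sketch just shows how little logic the constraint actually needs.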
A one-one requirement constraint
The issue of a one-one requirement, where a new order inserted must be accompanied by a detail row (insert), was raised in the thread:

microsoft.public.sqlserver.programming
Thursday, June 26, 2008
'update joined tables'
http://tinyurl.com/69hh5c

While this is a non-trivial exercise in sql, it's quite easy in the relational system Dataphor. For example:

create session table SOrders
{
 order_nbr:Integer,
 A:Integer,
 key{order_nbr}
};

create session table SOrderDetails
{
 order_nbr:Integer,
 sku:Integer,
 B:Integer,
 C:String,
 key{order_nbr,sku},
 reference Details_SOrders{ order_nbr }
  references SOrders { order_nbr }
};

This constraint makes a detail row a requirement for a new order. The user-defined message will be displayed any time the constraint is violated.

alter table SOrders
{
 create transition constraint MustHaveOrderDetails
 on insert
  exists(SOrderDetails where order_nbr=new.order_nbr)
 tags
 {
  DAE.Message = "'A detail row with order#: ' + ToString(new.order_nbr) + ' is required'"
 }
};

Inserting just a new order will violate the constraint:

insert row{1 order_nbr, 14 A} into SOrders;
Violation of constraint "MustHaveOrderDetails", "A detail row with order#: 1 is required."

Inserting data can easily be done via a view:

create session view VSOrderDetails
 SOrders join SOrderDetails; //A natural join on order_nbr.

The MustHaveOrderDetails constraint works for the insertion of a single row or a table.

insert row{1 order_nbr, 1 sku, 14 A, 5 B, 'Joe' C} into VSOrderDetails;

delete SOrders;

Insert into the tables through the view:

insert table
{
 row{1 order_nbr, 1 sku, 14 A, 5 B, 'Joe' C},
 row{1,2,9,23,'Steve'},
 row{2,3,34,2,'Larry'}
} into VSOrderDetails;

select SOrders;

order_nbr A
--------- --
1         14
2         34

select SOrderDetails;

order_nbr sku B  C
--------- --- -- -----
1         1   5  Joe
1         2   23 Steve
2         3   2  Larry

This rather simple example shows off the much higher level of abstraction offered to developers with a relational system like Dataphor over sql.
In sql only one table may be updated with a view. In Dataphor there is no concept of updating multiple tables. There is only the idea of updating a view/expression, which is a table. It's the system's responsibility to resolve a view to its base tables given the constraints in the schema. The user need only realize a table is being used for data modification; it's irrelevant how that table was constructed, as is the number of tables in its definition.
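Because the constraint is a transition constraint, it is checked per statement rather than per row, which is exactly what makes the set-based insert through the view work. A small Python sketch of that check (the function and field names are hypothetical, for illustration only): every order number arriving in a batch must be covered by at least one detail row in the same batch.

```python
def check_must_have_details(new_orders, new_details):
    """Set-at-a-time version of the MustHaveOrderDetails check:
    every inserted order number must appear in at least one inserted
    detail row. Returns the violating order numbers (empty list means
    the whole insert is accepted)."""
    detail_orders = {d['order_nbr'] for d in new_details}
    return [o['order_nbr'] for o in new_orders
            if o['order_nbr'] not in detail_orders]

# Inserting just an order violates the constraint:
print(check_must_have_details([{'order_nbr': 1, 'A': 14}], []))  # [1]

# Inserting through the 'view' supplies both halves at once:
print(check_must_have_details(
    [{'order_nbr': 1, 'A': 14}],
    [{'order_nbr': 1, 'sku': 1, 'B': 5, 'C': 'Joe'}]))           # []
```

In Dataphor the developer never writes this check procedurally; the view insert is decomposed into base-table inserts and the constraint is evaluated over the whole transition.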
Friday, July 04, 2008
Justifying an Sql Cop Out
This is my response to Joe Celko in the thread:

microsoft.public.sqlserver.programming
Thursday, June 26, 2008
'update joined tables'
http://tinyurl.com/69hh5c

Joe invokes the 'NP complete' argument as the basis for why sql can't handle complicated modeling issues. The specific modeling issue here concerns a schema where there are multiple paths between tables due to cascading updates. My view is the NP complete argument is essentially a cop out. The same schema with multiple cascading updates that is rejected by sql server is perfectly acceptable in Dataphor (along with associated views involving multiple tables that can resolve data to the base tables). (I suggest reading through the whole thread, it's interesting ☺)

My response in the thread:

What do we call a man who demurs to an obstinate woman on every decision? We call him a wimp, a guy who doesn't think it's worth the effort to present his counterpoint because he's convinced it's impossible to change her mind. The bottom line is, if it's so hard to change her mind why even bother. An easy out for avoiding any responsibility for control of the outcome. Here you're using the idea of an NP complete problem as a crutch for the mess sql has left developers in. Woman may be an NP complete problem for man, but it doesn't follow that given a relationship all men are wimps. Just because there isn't a universal and instantaneous quick-fix technique to change a woman's mind in any situation doesn't bum out all men. Some guys are inventive and creative and come up with techniques that will work at least in some situations. Isn't that what lying, pleading and begging are for? :) NP complete system problems are brick walls not because there's no way to solve them but because there's no good quick fix from a systems point of view. But that doesn't mean a dbms should wimp out on them. You're using NP to suggest an all or nothing game.
Since no efficient computational scheme exists to cover all situations where referential integrity involving cascading updates comes into play, sql is going to take its ball and just go home. This is at most nonsense, at the very least no sense. You're setting poor sql up as the victim here and whining about Petri Nets! The perfect solution doesn't exist, so what. Sql is/was just being damn lazy. All they had to do is talk to a bunch of guys who have been married for twenty years to get a clue :) Like they couldn't use some heuristics or approximations? Nope, they just wimped out.

Same with views. Sql is the victim of a yet-to-be-found super quick universal solution to updatability. This is simply an sql crutch for abandoning the idea totally! They couldn't be creative and cover some percentage of possible expressions that could be resolved to base tables? Apparently not. So sql gets away with only being able to update a single table and sticks a huge inconsistency between a table and a view/expression in the face of developers.

Here's an sql server example that has multiple cascading update paths:

-- Updating order_nbr in TableA cascades to order_nbr in TableB and TableC.
create table TableA
(
 order_nbr Integer,
 A Integer,
 primary key(order_nbr)
);

-- Updating sku in TableB cascades to sku in TableC.
create table TableB
(
 order_nbr Integer,
 sku Integer,
 B Integer,
 primary key(order_nbr,sku),
 CONSTRAINT Details_TableB foreign key (order_nbr)
  REFERENCES TableA (order_nbr) ON UPDATE CASCADE
);

-- TableC has a dependency/path on TableA and TableB.
create table TableC
(
 order_nbr Integer,
 sku Integer,
 ID Integer,
 C char(1),
 primary key (order_nbr,sku,ID),
 CONSTRAINT Details_TableC1 FOREIGN KEY (order_nbr)
  REFERENCES TableA (order_nbr) ON UPDATE CASCADE,
 CONSTRAINT Details_TableC2 FOREIGN KEY (order_nbr,sku)
  REFERENCES TableB (order_nbr,sku) ON UPDATE CASCADE
);

Now is this a non-trivial problem for a dbms? Yes it is. Is it unsolvable in terms of solution and efficiency?
Of course not. But when you try to create TableC you're told by sql server that it can't do it:

"Introducing FOREIGN KEY constraint 'Details_TableC2' on table 'TableC' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints."

Game over. Anything other than the most trivial path is outside the scope of the 'system' to handle. Users are forever told the sql game is 'what' not 'how'. But when it's convenient and expedient you turn the idea around and ask 'how' can this be done when there's no known (NP complete) solution. Nice sophistry! :)

Now this same DDL, with some modification, is perfectly fine in Dataphor using sql server to persist the tables. And using views, such as TableA join TableB, is also fine. The views will resolve data to the tables and the update cascades work as expected. Do you really think this problem is NP complete for the Sql Server team but not for Dataphor? Give me a break :)

Users should understand that just because sql doesn't try to solve complex modeling issues and doesn't attempt to resolve views/expressions with multiple tables, this doesn't mean these things can't be solved. A dbms can be built so these things are handled by the 'system' and the user isn't left twisting in the wind making a mess in the front end. Will something like Dataphor be able to handle every conceivable combination of cascades between tables along with other types of constraints? Of course not, but it will handle a lot of tricky schemes that sql can't. Is Dataphor capable of resolving (updatability) every view/expression to the base tables? Of course not, but it will handle a large chunk of them. And in each case these things are transparent to the user.

It would be nice to have a little honesty about this. Instead of creating straw men and red herrings, just tell it like it is. The framework/foundation of sql is just not able to handle these issues.
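To see why the 'multiple cascade paths' here are harmless, note that both paths into TableC carry the same value: order_nbr arrives via TableA directly and via TableB, and the cascades agree. A toy Python sketch (dict-based tables, hypothetical helper names, not Dataphor's actual resolution machinery) of propagating the update in dependency order, parents before children:

```python
# Toy model of the three-table schema above.
table_a = [{'order_nbr': 1, 'A': 14}]
table_b = [{'order_nbr': 1, 'sku': 1, 'B': 5}]
table_c = [{'order_nbr': 1, 'sku': 1, 'ID': 1, 'C': 'x'}]

def cascade_update_order(old, new):
    """Propagate an order_nbr change along every foreign-key path.
    Tables are visited in dependency order (A, then B, then C), so
    both paths into table_c write the same value and the final state
    is well defined."""
    for table in (table_a, table_b, table_c):
        for row in table:
            if row['order_nbr'] == old:
                row['order_nbr'] = new

cascade_update_order(1, 99)
```

This is obviously not a general solution for arbitrary cascade graphs, which is the author's point: the hard cases being hard doesn't justify rejecting the easy, well-defined ones at DDL time.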
This die was cast a long time ago (and yes, sql was lazy on a lot of fronts). And there's little hope of sql addressing these things in the future. This certainly doesn't make sql irrelevant. It's a 'right tool for the job' world. I think sql folks can handle it. After all, not all sql users are wimps :)
Wednesday, May 14, 2008
Dataphor - Splitting a string into fixed sized word segments
This is a solution using Dataphor to the problem in the post:

comp.databases.ms-sqlserver
May 9, 2008
'SQL Statement Problem : Separate long varchar into word seqments 35 chars long'
http://tinyurl.com/42vty5

The op states the problem as:

'I have a table of names which can be very long. These names get printed on envelopes. The problem is the envelope can only hold 35 characters per line. I have to divide into 4 lines at most. So I need to separate these long varchars into segments, no longer than 35 characters but preserving whole words.'

Data is stored in MS Sql Server 2005. Here is some sample data stored in table SMerge:

-- Tsql DDL statements to create the SMerge table in a sql server database.
create table SMerge (ID Int primary key, InStr varchar(250))
go
insert SMerge values (1,'BIOLOGICAL SURVEY FOUNDATION OF ONTARIO HAPPY IF SOMEONE CAN HELP SOLVE THIS SQL PROBLEM')
insert SMerge values (2,'I SURE HOPE I CAN GET THIS THING TO WORK RIGHT')
insert SMerge values (3,' What will happen with a big string full of all kinds of names and addresses including some goobly goop. Will it hold up ?')

This is a fairly simple iterative solution. It illustrates various dataphor programming capabilities, including set based, cursor and procedural. I've relaxed the requirement of at most 4 lines. Given a maximum line size, the batch iterates over each input string creating lines not exceeding the maximum size until all words in the string are assigned to a line. There is no limit to the number of lines allowed (although there certainly could be).

// Start of batch
if not ObjectExists("WTable") then
 Execute(
  "create session table WTable //Temporary table used in batch.
   {
    StrID:Integer,  //Input string id.
    Str:String,     //An individual string, ie. a word, from splitting.
    Seq:Integer,    //Position/sequence of word from split.
    Interval:Integer nil {default nil}, //The line number the word will eventually belong to.
    StrLen:Integer, //Length of word (Str).
    RunningSumStrLens:Integer nil,    //Running sum of word lengths within a line.
    LenCheck:Boolean {default false}, //Whether or not a word (Str) has been assigned to a line.
    key{StrID,Seq}
   };"
 )
else
 Execute("delete WTable;");

var MaxLineLength:=35; //Max length of line desired.
var LCheck:Boolean;    //Holds boolean (t/f) of adding string to line,
                       //whether line is within max line size.
var MaxSeq:Integer;    //Holds max Seq for ID and line (interval).

// SMerge is the sql server table.
var LCursorSqlTable:=cursor(SMerge order by {ID});

// Loop over all rows in the sql table splitting each input string (InStr).
while LCursorSqlTable.Next() do
begin
 // Error check. If for any ID the length of any word (Str), via Split, exceeds the max line
 // size then eliminate the ID from the result. This also insures no endless cursor loops.
 if exists(
     ToTable(LCursorSqlTable.Select().InStr.Split({' '}),'Str','Seq')
      where Length(Str)>MaxLineLength
    ) then continue; //Skips below code and gets another input string (InStr).

 // Insert the split string into work table WTable. The Split operator splits using blank as
 // the delimiter. A sequence number (Seq) is also generated which is an integer indicating
 // the position (from 0 to N) from left to right of each word (Str) in the input string (InStr).
 // We also compute the length of each word (Str). This is stored in column StrLen.
 insert
 (
  ToTable(LCursorSqlTable.Select().InStr.Split({' '}),'Str','Seq')
   add{Length(Str) StrLen,
       LCursorSqlTable.Select().ID StrID,
       nil as Integer RunningSumStrLens}
   where (StrLen>0) //Eliminate extra spaces. We could do all sorts of other
                    //checks and manipulations here as well.
 ) into WTable;

 var LInterval:=0; //Line (Interval) within an ID (InStr).
 var LRunningSumStrLens:Integer;
 MaxSeq:=-1; //Start current ID with Seq>-1, ie. include all words (Seq starts at 0).

 // Iterate over WTable. Get a running sum of word lengths.
 // Each iteration represents a new line based on comparing the running sum of word lengths
 // to the max line size. Mark words in rows of WTable as either true (part of new line) or
 // false (to be assigned in a subsequent iteration). As long as there are words to be
 // assigned to a line the iteration (loop) continues. The loop stops when all words for an
 // ID have been optimally assigned to a line (Interval). In other words the loop stops when
 // all values of LenCheck are true because each word has been assigned to a line.
 while exists(WTable where (StrID=LCursorSqlTable.Select().ID) and (Seq>MaxSeq)) do
 begin
  LInterval:=LInterval+1;
  LRunningSumStrLens:=0; //Initialize running sum for word lengths to 0 for a new line.

  // Use WTable eliminating prior assigned words. We don't need these rows, we need rows
  // representing words that still need to be assigned to a line. Seq>MaxSeq gets these rows,
  // it gets all rows (words) for an ID that haven't yet been assigned to a line (Interval).
  var LCursor:=
   cursor(
    WTable where ((StrID=LCursorSqlTable.Select().ID) and (Seq>MaxSeq))
     order by {StrID,Seq}
     capabilities {Navigable, Searchable, Updateable});

  while LCursor.Next() do
  // For all remaining words that aren't yet assigned to a line (Interval) for the current ID
  // get a running sum of word lengths and check that the running value is <= max line size.
  begin
   // If it's the start of a new line we only want the running sum to be the length of the word;
   // if it's after the 1st word we add the previous word lengths, which includes a 1 for a space
   // between words, and the length of the current word.
   LRunningSumStrLens:=
    if LRunningSumStrLens=0
     then LCursor.Select().StrLen
     else LRunningSumStrLens + 1 + LCursor.Select().StrLen;

   // Check if the word, as part of the running sum, is within the maximum line size.
   LCheck:=LRunningSumStrLens<=MaxLineLength;

   // If the current word length, when added to the running sum of lengths, is greater than the
   // max line size there's no more point in staying within the LCursor loop. The line (Interval)
   // has been determined using prior words, so break out of this loop and start forming a new
   // line (Interval) with as yet unassigned words (where LenCheck=false) with a new running sum.
   if not LCheck then break;

   // Update LenCheck to true. The current word fits on the current line as tested by comparing
   // the running sum (with the current word length) to the max line size. We can now ignore
   // this word (row) in assigning subsequent (unassigned) words to a line. Remember that the
   // default value of LenCheck is false. So we don't have to update words (set LenCheck=false)
   // as false since this is the default. And we'll get those words (where LenCheck=false) on
   // the next iteration of the loop.
   LCursor.Update(row{LRunningSumStrLens RunningSumStrLens, LCheck LenCheck, LInterval Interval});
  end; //LCursor, looping over WTable for an ID until all words are assigned to a line.

  // Find max Seq for the current ID where the word has been assigned to a line (LenCheck=true).
  // The next definition for the cursor for WTable starts with Seq = MaxSeq+1, ie. the rows with
  // Seq values greater than the max Seq value of assigned words. Interval is also used here
  // to help make the query more efficient (a possibly better alternative would be to keep
  // track of Seq).
  MaxSeq:=Max(Seq from WTable
               where (StrID=LCursorSqlTable.Select().ID)
                 and (Interval=LInterval)
                 and (LenCheck));
 end; //WTable
end; //LCursorSqlTable.

// Here is a simple check that this batch assigned each word to a line for each ID. All
// LenCheck values must be true or else there is a failure somewhere in the batch. Of course we
// could more thoroughly check exactly where (ID and line) failure(s) occurred in the query.
// We could also put extensive error handling throughout the batch.
// Just so you know ☺
if not All(LenCheck from WTable) then exit;

// Use a query to build the output strings (ALine) by concatenating (using the Concat operator)
// the words (Str) within IDs and lines (Interval) and ordering the concatenation by Seq
// (the position determined by Split). We also include the line length for each
// concatenated line (ALine) of words within an ID.
select WTable add{' ' del}
 adorn {key{Seq}}
 group by {StrID,Interval}
  add{Concat(Str,del order by {Seq}) ALine}
 add{Length(ALine) LineLength}
 order by {StrID,Interval};
// End of batch

Result:

StrID Interval ALine                               LineLength
----- -------- ----------------------------------- ----------
1     1        BIOLOGICAL SURVEY FOUNDATION OF     31
1     2        ONTARIO HAPPY IF SOMEONE CAN HELP   33
1     3        SOLVE THIS SQL PROBLEM              22
2     1        I SURE HOPE I CAN GET THIS THING TO 35
2     2        WORK RIGHT                          10
3     1        What will happen with a big string  34
3     2        full of all kinds of names and      30
3     3        addresses including some goobly     31
3     4        goop. Will it hold up?              22

Here is what WTable looks like before concatenation. Note the running sum of string lengths (RunningSumStrLens) within each line (Interval) for each ID (StrID).

select WTable order by {StrID,Interval,Seq};

StrID Str        Seq Interval StrLen RunningSumStrLens LenCheck
----- ---------- --- -------- ------ ----------------- --------
1     BIOLOGICAL 0   1        10     10                True
1     SURVEY     1   1        6      17                True
1     FOUNDATION 2   1        10     28                True
1     OF         3   1        2      31                True
1     ONTARIO    4   2        7      7                 True
1     HAPPY      5   2        5      13                True
1     IF         6   2        2      16                True
1     SOMEONE    7   2        7      24                True
1     CAN        8   2        3      28                True
1     HELP       9   2        4      33                True
1     SOLVE      10  3        5      5                 True
1     THIS       11  3        4      10                True
1     SQL        12  3        3      14                True
1     PROBLEM    13  3        7      22                True
2     I          0   1        1      1                 True
.
.
3     hold       23  4        4      18                True
3     up?        24  4        3      22                True

How will this batch work for larger strings? Let's take the whole question of the post (cut and pasted from the op on google). Here, through dataphor, we assign the sql server table SMerge just one row of data consisting of the post data as the input string.
As part of the insert we remove carriage returns and line feeds using the Replace function.

SMerge:=
 table
 {
  row{1 ID,
      "I'm having a really tough time with a SQL statement and I am wondering if someone is able to help out or point me in the right direction. I have a table of names which can be very long. These names get printed on envelopes. The problem is the envelope can only hold 35 characters per line. I have to divide into 4 lines at most. So I need to separate these long varchars into segments, no longer than 35 characters but preserving whole words. So far my approach has been to take a LEFT segment, REVERSE it, find the first space with CHARINDEX and use it to calculate how many characters to take in a SUBBSTRING. Here's an example of what I have been trying. I can find the first two segments, but then it starts to get confusing. Can anyone suggest a better approach? Am I going to be able to do this in SQL? I appreciate any help." InStr}
 }
 // Replace carriage returns and line feeds with blanks in InStr.
 redefine {InStr:=Replace(Replace(InStr,ASCII(list(Byte){13}),' '),ASCII(list(Byte){10}),' ')};

Executing the batch with a maximum line size of 80 (MaxLineLength:=80) we get:

StrID Interval ALine                                                                            LineLength
----- -------- -------------------------------------------------------------------------------- ----------
1     1        I'm having a really tough time with a SQL statement and I am wondering if        73
1     2        someone is able to help out or point me in the right direction. I have a table   78
1     3        of names which can be very long. These names get printed on envelopes. The       74
1     4        problem is the envelope can only hold 35 characters per line. I have to divide   78
1     5        into 4 lines at most. So I need to separate these long varchars into segments,   78
1     6        no longer than 35 characters but preserving whole words. So far my approach has  79
1     7        been to take a LEFT segment, REVERSE it, find the first space with CHARINDEX and 80
1     8        use it to calculate how many characters to take in a SUBBSTRING. Here's an       74
1     9        example of what I have been trying. I can find the first two segments, but then  79
1     10       it starts to get confusing. Can anyone suggest a better approach? Am I going to  79
1     11       be able to do this in SQL? I appreciate any help.                                49

Executing the batch with a maximum line size of 70 (MaxLineLength:=70) we get:

StrID Interval ALine                                                                  LineLength
----- -------- ---------------------------------------------------------------------- ----------
1     1        I'm having a really tough time with a SQL statement and I am wondering 70
1     2        if someone is able to help out or point me in the right direction. I   68
1     3        have a table of names which can be very long. These names get printed  69
1     4        on envelopes. The problem is the envelope can only hold 35 characters  69
1     5        per line. I have to divide into 4 lines at most. So I need to separate 70
1     6        these long varchars into segments, no longer than 35 characters but    67
1     7        preserving whole words. So far my approach has been to take a LEFT     66
1     8        segment, REVERSE it, find the first space with CHARINDEX and use it to 70
1     9        calculate how many characters to take in a SUBBSTRING. Here's an       64
1     10       example of what I have been trying. I can find the first two segments, 70
1     11       but then it starts to get confusing. Can anyone suggest a better       64
1     12       approach? Am I going to be able to do this in SQL? I appreciate any    67
1     13       help.                                                                  5

Seems to work ok ☺

Note that with this simple framework it would be easy to add all kinds of editing niceties. Left justification, centering etc. could be easily added. It would also be easy to edit/check the individual words, possibly eliminating some given specific rules. Note that we could easily make the batch an operator (stored procedure) returning the work table or a virtual table.
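Stripped of the cursors and the work table, the core of the batch is a greedy fill: walk the words in order and start a new line whenever the running sum of lengths (plus 1 for the separating space) would exceed the maximum. A compact Python sketch of that same rule (it omits the batch's error check for words longer than the maximum line size):

```python
def wrap(text, max_len):
    """Greedy word wrap: same rule as the batch's running sum of
    word lengths, +1 for the space between words, per line."""
    lines, current = [], ''
    for word in text.split():
        if not current:                              # first word on a line
            current = word
        elif len(current) + 1 + len(word) <= max_len:
            current += ' ' + word                    # word fits, keep filling
        else:
            lines.append(current)                    # line full, start a new one
            current = word
    if current:
        lines.append(current)
    return lines

# Reproduces the batch's result for the first sample row:
# ['BIOLOGICAL SURVEY FOUNDATION OF',
#  'ONTARIO HAPPY IF SOMEONE CAN HELP',
#  'SOLVE THIS SQL PROBLEM']
print(wrap('BIOLOGICAL SURVEY FOUNDATION OF ONTARIO HAPPY IF SOMEONE '
           'CAN HELP SOLVE THIS SQL PROBLEM', 35))
```

The Dataphor version does the same work against the persisted table, with the intermediate state (WTable) available for inspection, which is what makes the added niceties mentioned above easy to bolt on.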
If you have any ideas on further features, or what would be kewl for this framework, please let me know ☺ You can check out Dataphor at: www.dataphor.org