In his July 2007 MS Sql Server article, 'T-SQL Black Belt: Identifying Sections', Itzik Ben-Gan raises the question of identifying consecutive runs of rows where values repeat themselves.

The problem is framed as:

'The generic form of the problem involves a table (call it T1) that has two columns of interest: one representing order among rows (call it id) and the other holding some value (call it val). The task at hand is to identify sections of consecutive rows that share the same value. The terms section and consecutive rows are problematic when you're dealing with sets, but as I mentioned, the column ID represents logical order among rows, and once order is defined, these terms become meaningful. For each identified section, you need to return the minimum id, maximum id, val, count of rows in the section, and possibly other aggregates.'

The use of the term 'generic' is most appropriate, as it conveys the idea that a proposed solution accounts for 'all' data. Such a generic approach is an 'inclusive' one.

Given the sample data:

create table TG1
(
 id int not null primary key,
 val varchar(10) not null
);
insert into TG1(id, val) values( 1, 'a');
insert into TG1(id, val) values( 2, 'a');
insert into TG1(id, val) values( 3, 'a');
insert into TG1(id, val) values( 5, 'a');
insert into TG1(id, val) values( 7, 'b');
insert into TG1(id, val) values( 9, 'b');
insert into TG1(id, val) values(11, 'a');
insert into TG1(id, val) values(13, 'a');
insert into TG1(id, val) values(17, 'b');
insert into TG1(id, val) values(19, 'b');
insert into TG1(id, val) values(23, 'b');
insert into TG1(id, val) values(29, 'a');
insert into TG1(id, val) values(31, 'b');
insert into TG1(id, val) values(37, 'b');

The ideal solution is to create a new column which, used together with val, groups the rows into discrete sections, reflecting the fact that a particular val can occur repeatedly:

id   val    Grp(Dense_Rank)
---- ------ ---------------
1    a      1
2    a      1
3    a      1
5    a      1
7    b      2
9    b      2
11   a      3
13   a      3
17   b      4
19   b      4
23   b      4
29   a      5
31   b      6
37   b      6

The new column (Grp) is simply a dense rank on val in the order of id. There are two proposed approaches. One uses a subquery to create the Grp column. Various types of subqueries have been proposed, as I did some time ago:

Jul 5 1999, 12:00 am
microsoft.public.sqlserver.programming
Difficult SQL Question
Author: Trysql

Jun 11 2001, 10:33 am
microsoft.public.sqlserver.programming
SQL challenge: coelesce consecutive and overlapping bookings for a room
Author: steve dassin

None of the subquery solutions are particularly appealing. They are not intuitive, and it is doubtful the average developer would come upon such a beast. And they are certainly not efficient, as described here. As for the analytic ranking functions, as explained here, sql ranking functions cannot rank a column independently of its order. This makes it necessary to simulate a dense rank with a combination of functions. This too is not intuitive, and it is doubtful the average developer understands this approach and can apply it to other similar problems. We are really talking about expert sql programmers, not application developers.

Perhaps a more fundamental issue than the types of solutions is the question posed. A generic solution is the foundation of a report. But how important is the relationship between different vals? Is the intent to see a relationship between different accounts, students, or sports teams, or to examine data/relationships within the same team but different sections?
If the latter, is it really necessary to produce sections for everything? If the question is framed as identifying all the sections for a 'particular' val, as seems reasonable, the question of the types of solutions and their nature changes dramatically. Now the whole solution can be based on the very basic idea of a dense rank - if the thing is the same, use the current rank; if it is another thing, increment the rank. Any developer using any client should have no problem seeing this as nothing more than a foreach loop. Assuming val 'a' is of interest, in D4 this can be done as simply as:

var TempTable:=table of{id:Integer,val:String,Grp:Integer}{};
var I:Integer:=1;
foreach var LItem in TG1 do
begin
 if LItem.val<>'a' then
  I:=I+1;
 if LItem.val='a' then
  insert row{LItem.id id,LItem.val val,I Grp} into TempTable;
end;
select TempTable;

id val Grp
-- --- ---
1  a   1
2  a   1
3  a   1
5  a   1
11 a   3
13 a   3
29 a   6

We have completely eliminated any need for subqueries or ranking functions. In fact we have eliminated any need for any type of query. We can, of course, create a procedure to return sections for any val, and number the sections consecutively in case that is helpful. A D4 procedure would be:

create operator Sections_for_a_val(aTable:typeof(TG1),aVal:String):
                table{id:Integer,val:String,Grp:Integer}
begin
 result:=table of typeof(result){};
 var I:Integer:=1;
 foreach row in aTable do
 begin
  if val<>aVal then
   I:=I+1;
  if val=aVal then
   insert row{id id,val val,I Grp} into result;
 end;
 result:=
  result
   join
  (
   ToTable(ToList(cursor(result over{Grp} order by {Grp})))
    {Grp,sequence+1 GrpSeq}
  )
   {id,val,GrpSeq Grp};
end;

select Sections_for_a_val(TG1,'a');

id val Grp
-- --- ---
1  a   1
2  a   1
3  a   1
5  a   1
11 a   2
13 a   2
29 a   3

select Sections_for_a_val(TG1,'b');

id val Grp
-- --- ---
7  b   1
9  b   1
17 b   2
19 b   2
23 b   2
31 b   3
37 b   3

Summarizing is the same as in the sql solutions.

select Sections_for_a_val(TG1,'a')
 group by {Grp}
  add{Min(id) start_section,Max(id) end_section,Count() num_rows};

Grp start_section end_section num_rows
--- ------------- ----------- --------
1   1             5           4
2   11            13          2
3   29            29          1

Finally, lest I have neglected sql, here is yet another subquery approach (in D4) to dense ranks that takes the sql server 2005 row_number() function as its starting point.

var A:=
 SQLQuery('select id,val,row_number()over(order by id) RowID from TG1');
select
 A
  add
  {
   Count(
         (A rename {RowID RowID1} where RowID1<=RowID)
          add{ (A adorn{key{RowID}})[RowID1-1].val =
               (A adorn{key{RowID}})[RowID1].val Testval}
          where not Testval
        ) + 1 DenseRank
  };

id val RowID DenseRank
-- --- ----- ---------
1  a   1     1
2  a   2     1
3  a   3     1
5  a   4     1
7  b   5     2
9  b   6     2
11 a   7     3
13 a   8     3
17 b   9     4
19 b   10    4
23 b   11    4
29 a   12    5
31 b   13    6
37 b   14    6

There is certainly nothing 'wrong' with a report. But it is perhaps wrong-headed to try to make one when it is not really the object of the exercise.

steve
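For readers who want to see the ranking-function approach alluded to above in sql terms, here is a hedged sketch (one common Sql Server 2005 formulation, not necessarily the article's exact query). The difference between a row_number over id and a row_number partitioned by val is constant within each run of equal val values, so it can stand in for the Grp column:

with G as
(
select id, val,
       row_number() over(order by id)
     - row_number() over(partition by val order by id) as grp
from TG1
)
select val, min(id) as start_section, max(id) as end_section, count(*) as num_rows
from G
group by val, grp
order by start_section;

The grp values themselves are not consecutive section numbers; they only need to be constant within a section and distinct between sections of the same val.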
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
Sunday, June 24, 2007
Sql - Using a dense rank for identifying sections
Friday, June 22, 2007
Dataphor - Sql Visualizing a ranking query
The use of prior information is a fundamental basis for judging the efficiency of a system. Systems that make use of prior information are efficient systems. The most efficient systems make the best use of this information. The most inefficient systems are those that totally ignore it. This holds for small things like functions to large things like databases and application development. Many times efficiency or the lack thereof is hidden. If you could see a visualization of what is happening it should certainly help in understanding it. A classic query in sql is a query that ranks a column. A count is used to represent the rank of each column value over the table. This type of query is discussed at length regarding MS Sql Server in: Itzik Ben-Gan and Sujata Mehta OVER Clause and Ordered Calculations http://www.insidetsql.com/OVER_Clause_and_Ordered_Calculations.doc Create a table R1 in Dataphor using MS Sql Server 2005 as the data repository. create table R1 { ID:Integer, Y:Integer, key{ID} }; insert table { row{10 ID,8 Y}, row{15,26}, row{20,16}, row{25,12}, row{30,25}, row{35,42}, row{40,19}, row{45,36}, row{50,22}, row{55,39} } into R1; select R1: ID Y -- -- 10 8 15 26 20 16 25 12 30 25 35 42 40 19 45 36 50 22 55 39 A query to rank column Y has traditionally been written in two forms. The idea is to use the aggregate Count to count the number of times a comparison between a particular value of Y and all the other row values of Y is true. The comparison takes a specific row of R1 and compares the column Y <= to all the column Y values from all the rows. The count of comparisons where (a value of Y <= all other Y values) is true is the rank of the Y value. The query can be written using a subquery to obtain a count or with a join followed by a group by. An sql query using a subquery: select A.ID,A.Y, (select count(*) from R1 as B where B.Y<=A.Y) as Rank from R1 as A order by A.Y An sql query using a join: select A.ID,A.Y,Count(*) as Rank from R1 as A join R1 as B on B.Y<=A.Y group by A.ID,A.Y order by A.Y Both queries result in: ID Y Rank -- -- ---- 10 8 1 25 12 2 20 16 3 40 19 4 50 22 5 30 25 6 15 26 7 45 36 8 55 39 9 35 42 10 But both queries can hide the nature of the inequality comparison 'B.Y<=A.Y' on which the count is based. This comparison requires every row of R1 for every value of Y. In other words each count for a value of Y requires the table R1. Therefore there are N values of Y multiplied by the rows of the table, N. There N*N comparisons for N counts where N is the number of rows in the table. For table R1 it is therefore necessary for the database engine to make 100 comparisons to obtain the 10 counts and rank each column Y. To more clearly see the queries in terms of the work involved lets look at representing the work of the database engine via a Dataphor (D4) query. A table can be declared in any D4 query using the 'table' construct. Three random rows of table R1 were chosen so as to represent three random values of column Y. The query uses values of Y {39,16,12} to obtain ranks for only these values. Each table expression is the table R1 and additionally includes one of the random Y values in the list, its comparison to each of the Y column values in R1 and the boolean (true/false) result of the comparison. We use a left join since we are not accounting for all ranks (Y values). 
select R1 left join ( ( table { row{1 RowID,10 ID,8 Y,39 Y1,8<=39 Test,true TestY1}, row{1 RowID,15 ID,26 Y,39 Y1,26<=39 Test,true TestY1}, row{1 RowID,20 ID,16 Y,39 Y1,16<=39 Test,true TestY1}, row{1 RowID,25 ID,12 Y,39 Y1,12<=39 Test,true TestY1}, row{1 RowID,30 ID,25 Y,39 Y1,25<=39 Test,true TestY1}, row{1 RowID,35 ID,42 Y,39 Y1,42<=39 Test,false TestY1}, row{1 RowID,40 ID,19 Y,39 Y1,19<=39 Test,true TestY1}, row{1 RowID,45 ID,36 Y,39 Y1,36<=39 Test,true TestY1}, row{1 RowID,50 ID,22 Y,39 Y1,22<=39 Test,true TestY1}, row{1 RowID,55 ID,39 Y,39 Y1,39<=39 Test,true TestY1} } where TestY1 group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) union ( table { row{2 RowID,10 ID,8 Y,16 Y1,8<=16 Test,true TestY1}, row{2 RowID,15 ID,26 Y,16 Y1,26<=16 Test,false TestY1}, row{2 RowID,20 ID,16 Y,16 Y1,16<=16 Test,true TestY1}, row{2 RowID,25 ID,12 Y,16 Y1,12<=16 Test,true TestY1}, row{2 RowID,30 ID,25 Y,16 Y1,25<=16 Test,false TestY1}, row{2 RowID,35 ID,42 Y,16 Y1,42<=16 Test,false TestY1}, row{2 RowID,40 ID,19 Y,16 Y1,19<=16 Test,false TestY1}, row{2 RowID,45 ID,36 Y,16 Y1,36<=16 Test,false TestY1}, row{2 RowID,50 ID,22 Y,16 Y1,22<=16 Test,false TestY1}, row{2 RowID,55 ID,39 Y,16 Y1,39<=16 Test,false TestY1} } where TestY1 group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) union ( table { row{3 RowID,10 ID,8 Y,12 Y1,8<=12 Test,true TestY1}, row{3 RowID,15 ID,26 Y,12 Y1,26<=12 Test,false TestY1}, row{3 RowID,20 ID,16 Y,12 Y1,16<=12 Test,false TestY1}, row{3 RowID,25 ID,12 Y,12 Y1,12<=12 Test,true TestY1}, row{3 RowID,30 ID,25 Y,12 Y1,25<=12 Test,false TestY1}, row{3 RowID,35 ID,42 Y,12 Y1,42<=12 Test,false TestY1}, row{3 RowID,40 ID,19 Y,12 Y1,19<=12 Test,false TestY1}, row{3 RowID,45 ID,36 Y,12 Y1,36<=12 Test,false TestY1}, row{3 RowID,50 ID,22 Y,12 Y1,22<=12 Test,false TestY1}, row{3 RowID,55 ID,39 Y,12 Y1,39<=12 Test,false TestY1} } where TestY1 group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) ) order by {Y}; ID Y Rank -- -- ---------- 10 8 <No Value> 25 12 2 20 16 3 40 19 <No Value> 50 22 <No Value> 30 25 <No Value> 15 26 <No Value> 45 36 <No Value> 55 39 9 35 42 <No Value> Each expression: ( table{ } where TestY1 //The number of rows where TestY1 is true is the Rank. group by {Y1} add{Count() Rank} {Y1 Y,Rank} ) represents the work necessary to get a rank for each Y value. Not only is it necessary to make the comparisons within the table { } but the result of the comparison must be related back to table R1. This idea is first conveyed in the where and group by so as to reduce the count to a single row and then in the union of all rows of ranks which is then joined by Y values back to R1. Visualizing the query this way makes it easier to see that the computation of any one rank is independent of any other rank. There is not any information used in one rank that is used for the computation of another rank. In other words, there is no use of prior information from one rank to another. Conceptually this is why this type of query has be referred to as an inefficient type of query. Note that the implied basic structure of ranking queries, a table repeated for each rank desired, does not change regardless of the type of rank desired. What changes is the complexity of the comparison. Consider the following table: create table A1 { ID:Integer, Y:Integer, key{ID} } ; insert table { row{1 ID,5 Y}, row{2,7}, row{3,7}, row{4,10}, row{5,20}, row{6,20}, row{7,32} } into A1; To obtain unique ranks for Y the comparison 'each value of Y<=a particular value of Y' is inappropriate since it results in non-unique ranks. 
To break tied ranks the following D4 query can be used: select (A1 rename {ID ID1,Y Y1}) add {Count(A1 where ((Y<Y1) or (Y=Y1 and ID<=ID1)) ) Rank} {ID1 ID,Y1 Y,Rank} order by {Rank} ; ID Y Rank -- -- ---- 1 5 1 2 7 2 3 7 3 4 10 4 5 20 5 6 20 6 7 32 7 This query can be visualized with the same table structure as the prior example but the complexity of the comparison is increased due to bringing in the ID column. The is another negative impact to efficiency. In all the queries presented so far the only logical order is how the result will be presented (order by). The queries operate under the assumption that the database engine will pick a particular row to obtain a rank in any way it chooses. As the queries are written we don't care nor does it have any importance how the rows (ranks) are processed. Lets now introduce the idea of ordering the columns of Y by sorting the rows of table R1 in ascending order of column Y and processing the table in that order. The D4 ToList operator creates a list of rows from a table defined in the order of a cursor. The statement: select ToList(cursor(R1 order by {Y}))[0]; uses an indexer ([]) to return the 1st row in a list using table R1 ordered by column Y. ID Y -- - 10 8 The next row in the ordered list can be addressed by using 1 for the indexer: select ToList(cursor(R1 order by {Y}))[1]; ID Y -- -- 25 12 Finally we can get the last row in the list without explicitly stating an index, but using Count to count how many rows there are (lists are addressed from 0 so we subtract 1 from count): select ToList(cursor(R1 order by {Y}))[ToList(cursor(R1 order by {Y})).Count()-1]; ID Y -- -- 35 42 Now we can simply loop over the ordered list to obtain the rank of each Y value: var TempTable:=table of {Y:Integer,Rank:Integer}{}; var LListR1:=ToList(cursor(R1 order by {Y})); var I:Integer:=0; foreach var LListRow in LListR1 do begin I:=I+1;//Reuse previous value of I for rank. insert row{LListRow.Y Y,I Rank} into TempTable; end; select 'TempTable'; select TempTable; select 'R1 joined to TempTable'; select R1 join TempTable with {IgnoreUnsupported = 'true'} order by {Y}; TempTable Y Rank -- ---- 8 1 12 2 16 3 19 4 22 5 25 6 26 7 36 8 39 9 42 10 R1 joined to TempTable ID Y Rank -- -- ---- 10 8 1 25 12 2 20 16 3 40 19 4 50 22 5 30 25 6 15 26 7 45 36 8 55 39 9 35 42 10 We have obtained the ranks by processing the rows of R1 only once as compared to 100 times in a query. The statement: I:= I + 1 which represents the rank of a particular Y value, is a mathematical statement of reusing prior information. And we are able to reuse information as a result of ordering the rows of R1. We can take the concept of the loop one step further by simply transforming the list of rows into a table: select ToTable(ToList(cursor(R1 order by {Y}))) {ID,Y,sequence+1 Cnt} order by {Y}; ID Y Cnt -- -- --- 10 8 1 25 12 2 20 16 3 40 19 4 50 22 5 30 25 6 15 26 7 45 36 8 55 39 9 35 42 10 This represents a most efficient use of prior information, the previous value of Y. Sql is making the same efficient use of prior information with its analytic ranking functions: select ID,Y,row_number()over(order by Y) as Rank from R1 The paper by Ben-Gan can be seen as a plea for Sql Server to more efficiently use prior information (in the form of processing queries in an ordered way) in various contexts. Just as I:=I+1 is a statement of reuse of information, application development is surely based on the reuse of data. And what distinguishes one environment from another is the efficiency of use. 
Conceptually I:=I+1 is no different than using a key to efficiently access a row of data from a view. To make the connection between reusing prior information efficiently and application development is to see what D4 is all about -:) bye for now, steve This operator returns a random sample (N) of column Y values from table R1 in a table. It takes into account replacement. create operator Rand_Y_Values(N:Integer): table{RowID:Integer,Y1:Integer} begin result:=table of typeof(result){}; var Check:Integer; for var J:Integer:=1 to N do begin var Y1:Integer; Math.Seed(); for var I:Integer:= 9 to 1000 do begin Y1:=Random(1,50); Check:=I; if ( IsNotNil ((R1 adorn {key{Y}})[Y1 by {Y}]) ) and ( IsNil( (result adorn {key{Y1}})[Y1]) ) with {IgnoreUnsupported = 'true'} then break; end; if Check=1000 then begin raise Error('No random value could be found for Row #: '+ ToString(N)); exit; end; result:=result union table{row{J RowID,Y1 Y1}}; end; end; This query uses operator Rand_Y_Values to return a character string that represents a row statement using a value of Y. The rows were copied and pasted into the query. select (R1 times Rand_Y_Values(3) with {IgnoreUnsupported = 'true'}) add{Y<=Y1 TestResult} add{'row{'+ToString(RowID)+' RowID,'+ToString(ID)+' ID,'+ToString(Y)+' Y,'+ ToString(Y1)+' Y1,'+ToString(Y)+'<='+ToString(Y1)+ ' Test,'+ToString(TestResult)+' TestY1},' Arow} {Arow};
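As a footnote to the tie-breaking example on table A1 above: the analytic ranking functions are the sql expression of exactly this ordered, prior-information approach, and ties can be broken simply by extending the ordering to the key. A hedged t-sql sketch (Sql Server 2005):

select ID, Y, row_number() over(order by Y, ID) as Rank
from A1
order by Rank;

This returns the same unique ranks as the count-based comparison '(Y<Y1) or (Y=Y1 and ID<=ID1)' but processes the ordered rows once rather than comparing every row with every other row.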
Tuesday, June 19, 2007
Dataphor - Inclusive Sql vs exclusive D4
Sql questions and solutions are proffered from the perspective of inclusive. It is generally the largest scope encapsulated in a single query that is the goal. Every row of every table must be accounted for. And even those rows that don't exist must be included. If sql is inclusive then application development must be exclusive. AD raises a singular question, a particular vendor, a particular customer. A solution for a specific something need not account for every something. Therefore it makes sense that AD have constructs and solutions that match the intent of AD. More on the inclusive vs. exclusive argument at a later date -:) Here is a solution in the application development language of Dataphor based on the thread: Monday, June 18, 2007 6:52 AM microsoft.public.sqlserver.newusers Query How To? http://tinyurl.com/yphn44 MS Sql Server 2005 is used to store the data. We frame the question in an exclusive manner: given a particular player summarize his ratings over the different sports. create table WanaPlayers { CID:Integer, Player:String {default nil}, Ethnicity:String {default nil}, key{CID} }; insert table { row{1 CID,'Charlie' Player ,'black' Ethnicity}, row{2, 'Dave', 'white'}, row{3, 'Josh', 'white'}, row{4, 'Jeff', 'black'} } into WanaPlayers; create table WanaSports { TID:Integer, Sport:String {default nil}, key{TID} }; insert table { row{1 TID, 'Baseball' Sport}, row{2, 'Football'}, row{3, 'Hockey'} } into WanaSports; create table WanaRatings { RowID:Integer, TID:Integer, CID:Integer, Rating:Integer {default nil}, key{RowID}, reference Ratings_Sports {TID} references WanaSports {TID}, reference Ratings_Players {CID} references WanaPlayers {CID} }; insert table { row{1 RowID,1 TID, 1 CID, 50 Rating}, row{2,1, 1, 25}, row{3,2, 1, 69}, row{4,2, 1, 71}, row{5,2, 1, 50}, row{6,3, 1, 50}, row{7,1, 2, 97}, row{8,1, 2, 100}, row{9,1, 2, 100}, row{10,2, 2, 98}, row{11,3, 2, 99}, row{12,1, 3, 0} } into WanaRatings; We create a view using natural joins as opposed to using any outer joins. The view simply addresses the maximum rating for each player for each sport he has. create view WanaView WanaRatings join WanaPlayers join WanaSports group by {Player,Sport} add {Max(Rating) MaxRating}; select WanaView; Player Sport MaxRating ------- -------- --------- Charlie Baseball 50 Charlie Football 71 Charlie Hockey 50 Dave Baseball 100 Dave Football 98 Dave Hockey 99 Josh Baseball 0 The key of the view is obvious, {Player,Sport}. Assuming sports baseball, football and hockey we can define an operator to return a single row with the columns of interest. For any Player we create a table with their MaxRating for each sport. We use the key of the view to directly address the max rating. 
create operator PlayerRating(Name:String): row{Player:String,AvgRating:Decimal,HighNumber:Integer} begin result:=row of typeof(result){}; result:= (table { row{1 ID,WanaView[Name,'BaseBall' by {Player,Sport}].MaxRating MaxRating}, row{2,WanaView[Name,'FootBall' by {Player,Sport}].MaxRating}, row{3,WanaView[Name,'Hockey' by {Player,Sport}].MaxRating} } add{ToInteger(MaxRating>80) HighRating} group add{Avg(MaxRating) AvgRating,Sum(HighRating) HighNumber} {Name Player,AvgRating,HighNumber})[]; end; select PlayerRating('Charlie'); Player AvgRating HighNumber ------- --------- ---------- Charlie 57 0 select PlayerRating('Dave'); Player AvgRating HighNumber ------ --------- ---------- Dave 99 3 select PlayerRating('Jeff'); Player AvgRating HighNumber ------ ---------- ---------- Jeff <No Value> 0 select PlayerRating('Steve'); Player AvgRating HighNumber ------ ---------- ---------- Steve <No Value> 0 Since the operator returns a row we have each column (scalar value) easily available. select PlayerRating('Dave').HighNumber; 3 Feel free to post any questions as comments. bye for now, steve
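For comparison, the exclusive framing can be approximated in t-sql as well. This is only a hedged sketch (the view name WanaView_sql is mine, not from the thread), assuming the tables created above; it is not the inclusive, every-player report the original thread asked for:

create view WanaView_sql as
select P.Player, S.Sport, max(R.Rating) as MaxRating
from WanaRatings as R
join WanaPlayers as P on P.CID = R.CID
join WanaSports  as S on S.TID = R.TID
group by P.Player, S.Sport;

select Player,
       avg(MaxRating) as AvgRating,
       sum(case when MaxRating > 80 then 1 else 0 end) as HighNumber
from WanaView_sql
where Player = 'Dave'
group by Player;

Note that, unlike the D4 PlayerRating operator, this sketch returns no row at all for a player with no ratings (Jeff, Steve).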
Saturday, June 16, 2007
Dataphor - Intelligent views
This is in response to the thread: Saturday, June 16, 2007 10:10 AM microsoft.public.sqlserver.programming How to make a view as input??? http://tinyurl.com/2o5q8h MS Sql Server 2005 Bol on View: 'Creates a virtual table that represents the data in one or more tables in an alternative way.' This is a good and accurate defintion. It is a 'representation' of a select statement. But it does not 'capture' the relationship(s) of the representation. It is unaware of any key(s) and most importantly unaware of any constraints between tables, ie. reference(s). It thus suffers from a deficit disorder as far as application development goes. This retardation causes developers to assume responsibilities the db should handle amongst which is the use of the nightmare t-sql trigger. Here is a very simple example of intelligent view behavior for application development. It uses the D4 language of Dataphor and uses sql server 2005 as the data respository. create table MyOrders { order_nbr:Integer, some_col:Integer, key{order_nbr} }; create table MyOrderDetails { order_nbr:Integer, sku:Integer, item_price:Integer, key{order_nbr,sku}, reference MyDetails_MyOrders{ order_nbr } references MyOrders { order_nbr } }; insert table { row{1 order_nbr, 15 some_col}, row{2,30}, row{3,14} } into MyOrders; insert table { row{1 order_nbr, 1 sku, 500 item_price}, row{1, 2, 205}, row{2, 1, 490}, row{3, 1, 480} } into MyOrderDetails ; This is a natural join. The sql "on MyOrders.order_nbr=MyOrderDetails.order_nbr" is implied by a natural join. create view MyOrdersView MyOrders join MyOrderDetails The keys of each table and the constraint (reference) between the tables are used to resolve each insert into the view. The view inherits the keys of each table and the reference between the tables. insert row{1 order_nbr,3 sku,253 item_price} into MyOrdersView; insert row{2 order_nbr,2 sku,321 item_price} into MyOrdersView; insert row{4 order_nbr,1 sku,42 some_col,321 item_price} into MyOrdersView; select MyOrdersView; order_nbr some_col sku item_price --------- -------- --- ---------- 1 15 1 500 1 15 2 205 1 15 3 253 2 30 1 490 2 30 2 321 3 14 1 480 4 42 1 321 An insert should be resolved to a delete followed by an insert from the view to each base table, ie. an update. (The "adorn" is meta-data attached to the insert allowing the update behavior to take place thru the view). insert row{1 order_nbr,3 sku,25 some_col,632 item_price} into MyOrdersView adorn with { PropagateInsert = "Ensure" }; select MyOrdersView; order_nbr some_col sku item_price --------- -------- --- ---------- 1 25 1 500 1 25 2 205 1 25 3 632 2 30 1 490 2 30 2 321 3 14 1 480 4 42 1 321 select MyOrders; order_nbr some_col --------- -------- 1 25 2 30 3 14 4 42 select MyOrderDetails; order_nbr sku item_price --------- --- ---------- 1 1 500 1 2 205 1 3 632 2 1 490 2 2 321 3 1 480 4 1 321 Before one asks the question of whether or not there is intelligence in the universe, developers should ask if there is intelligence in their application tools -:)
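As a footnote, here is a hedged sketch of what comparable behavior costs in t-sql: the view itself carries no keys or references, so the developer must hand-write an INSTEAD OF trigger (the view and trigger names below are hypothetical) to route an insert to the base tables:

create view MyOrdersView_sql as
select O.order_nbr, O.some_col, D.sku, D.item_price
from MyOrders as O
join MyOrderDetails as D on D.order_nbr = O.order_nbr;
go
create trigger trg_MyOrdersView_ins on MyOrdersView_sql
instead of insert as
begin
  insert into MyOrders (order_nbr, some_col)
  select i.order_nbr, i.some_col
  from inserted as i
  where not exists (select * from MyOrders as o where o.order_nbr = i.order_nbr);

  insert into MyOrderDetails (order_nbr, sku, item_price)
  select order_nbr, sku, item_price
  from inserted;
end;

And this sketch still does not cover the 'Ensure' (delete then re-insert) resolution shown above, or updates and deletes; each needs its own hand-written trigger logic that Dataphor derives from the keys and references.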
bye for now, steve
Dataphor - Sql: what does Update..From mean?
What is really wrong with the "UPDATE.. FROM.." syntax? This example uses MS Sql Server 2005 and the D4 language of Dataphor @ www.alphora.com The discussion is based on the thread: microsoft.public.sqlserver.programming Thursday, June 14, 2007 10:21 AM Update help http://tinyurl.com/27Z8PM Given an update with join representing a 1 to many relationship: UPDATE Orders SET < > FROM Orders INNER JOIN OrderDetails ON Orders.order_nbr = OrderDetails.order_nbr; Joe Celko argues: 'Bottom line: UPDATE FROM *can* be safely used - but only if you are FOREVER certain that no single row in the target table can EVER e joined to more than one row in the source table(s); FOREVER in the ENTIRE LIFETIME of the application; FOREVER across all programmers yet to come.' And from Sql Server 2005 Bol: 'The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic.' What is conspiciously missing from the discussion and Bol is one of the most fundamental concepts in database theory - a 'KEY' as opposed to an index. Obtaining the result of a natural join in the language of D4: select Orders join OrderDetails ; order_nbr some_col sku item_price --------- -------- --- ---------- 1 0 1 500 1 0 2 205 2 0 1 490 3 0 1 480 It should be obvious that the key of this table is {order_nbr,sku}. The example of the update in question is: UPDATE Orders SET Orders.some_col = OrderDetails.item_price FROM Orders INNER JOIN OrderDetails ON Orders.order_nbr = OrderDetails.order_nbr; which totally ignores the compound key and implicitly assumes a 1 to 1 relationalship between the tables. So instead of showing a logical flaw in using a join in Update, it simply shows an illogical update! The most logical way an update can use a 1 to N relationalship (ie a compound key) is to perform an update on a single row! This is what the ridiculous 'undefined' in Bol is hiding. An undefined update is the lack of recognition of the appropriate key on the part of the user. The flip side to the coin is the total lack of the concept of a key in Sql Server. Taken together this combination is lethal. In D4 an update that makes sense takes the form: update (Orders join OrderDetails) set { some_col :=item_price } where row{order_nbr order_nbr,sku sku}=row{2 order_nbr,1 sku}; If there was any question of writing an update assuming the existence of a key of just {order_nbr} we could test it by trying to select a row based on just that key, for example select a row using order_nbr 1: select (Orders join OrderDetails adorn {key{order_nbr}})[1 by {order_nbr}]; D4 would give the message: " Internal Index Error: 'Duplicate key violation.' ". which should tell an application developer that any update using the join only makes sense using the compound key. As in: select (SOrders join SOrderDetails)[1,2 by{order_nbr,sku}] ; order_nbr some_col sku item_price --------- -------- --- ---------- 1 0 2 205 D4 is an application development language, sql is a programming language. There is a big difference. 
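For comparison, a hedged t-sql counterpart of the keyed D4 update above, stating the full compound key {order_nbr,sku} so that exactly one OrderDetails row can supply the value:

update O
set O.some_col = D.item_price
from Orders as O
join OrderDetails as D
  on D.order_nbr = O.order_nbr
where D.order_nbr = 2
  and D.sku = 1;

Nothing in Sql Server forces the where clause to name the whole key; that discipline is left entirely to the programmer, which is the point of the post.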
As for the sql standard form in which Celko expresses the update:

UPDATE Orders
SET some_col = (SELECT item_price
                FROM OrderDetails
                WHERE OrderDetails.order_nbr = Orders.order_nbr)
WHERE EXISTS (SELECT *
              FROM OrderDetails
              WHERE OrderDetails.order_nbr = Orders.order_nbr);

It is the SET subquery which returns the error:

'Server: Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.'

This is nothing more than the Bol idea of an 'undefined' value. The idea of the standard as a substitute for the concept of a key is, of course, ridiculous. The answer to the question of 'which one' can be seen as rather absurd in light of a 'key'. There should never be a question of 'which one' in an RDBMS, as there can only be but one! -:) Interestingly, this issue is an example of searching for a question that is appropriate and makes sense given a response. It is no easy feat sometimes to make the connection -:)

bye for now,
steve
Comment: Anonymous said... This is nearly my first reading of anything to do with Dataphor, so excuse my ignorance. But the assumption that runs strongly through your entire article is that two tables can only be joined to each other in one and only one way. There are many application contexts where this is an overly limiting assumption! Sun Jun 17, 12:04:00 AM Thanks for browsing the post. To understand Dataphor you must realize that the intent of D4 is different than sql. While it is certainly true that just about any sql query can formulated (and many times more succintly) in D4, the goal of each system is different. The goal of sql is performance oriented retrieval of large numbers of rows. The goal of Dataphor is toward application development which is less concerned with retrieval and more concerned with 'addressing' tables. Application development usually implies a small sample of rows from table(s). The real goal is addressing and working with these relatively small samples with a logical clarity not offered by any sql system. The difference between sql and D4 can seen thru the join construct. In sql: select A.OrderID,B.ProductID,A.CustomerID,B.Quantity from Orders A join OrderDetails B on A.OrderID=B.OrderID The intent of the operation is retrieval and the only other information (working with MS Sql Server 2005) the db is concerned with are the index(s) of the associated tables. The same join operation in the context of AD has a different intent. Here we would want to address the space of the join in terms of its 'key'. In other words, to address the projection of the join for a single OrderID and ProductID. For example, create a view of join: create view AllOrdersDetails Orders join OrderDetails; The key of the projection is {OrderID,ProductID} and within the application the row of any combination can be easily obtained, for example Order 11077 and Product 77 (the [ ] in D4 is called an indexer for obvious reasons): var T:=AllOrdersDetails[11077,77 by{OrderID,ProductID}] over{OrderID,ProductID,CustomerID,Quantity} with {IgnoreUnsupported = 'true'} ; So we can easily work with any of the scalar values of the row: select T.CustomerID; select T.Quantity; It is a small sample of rows or a single row that the application is usually concerned with: var Y:=row of typeof((Orders join OrderDetails)[]){ }; Y:=(Orders join OrderDetails)[11077,77 by{OrderID,ProductID}]; select Y over{OrderID,ProductID,CustomerID,Quantity}; The D4 join is in fact restricted to only 'equi-joins'. This is to stress the importance of key inference. With only equi-joins the compiler has an easier job of correctly inferring key(s) as does the developer. For example expressing the join in terms of a 'where' restriction: select ((Orders rename A) times (OrderDetails rename B) where A.OrderID=B.OrderID) ('times' is like an sql cross join) the compiler infers the key: {A.OrderID,B.OrderID,B.ProductID} and an expression to obtain a particular Order and Product becomes: select ((Orders rename A) times (OrderDetails rename B) where A.OrderID=B.OrderID)[11077,11077,77 by{A.OrderID,B.OrderID,B.ProductID}]; Even though this is correct it lacks the 'clarity' of the join. Because sql does not have the logical concept of a key nor the addressing of an expression it can afford to lift any kind of restriction(s) for a join. This is why the sql on clause accepts non equi-joins. Sql, in fact, will take any argument(s) for on ranging from the sensible to the non-sensible. 
It is only concerned with the physical index for retrieval and there are no logical restrictions on the result (for example D4 will not return nor address a table with duplicate key(s) or rows). Since AD translates to expressions of integrity ie. keys, constraints and especially constraints involving tables (references) the equality operator is the primary logical operator of D4 and AD. I should also note that traditional uses of non-equi joins and the introduction of predicates in sql joins can be completely eliminated in AD. For example the use of a self-join to obtain a rank using the Count aggregate: select A.OrderID,A.ProductID,A.Quantity,Count(*) as Rank from OrderDetails A join OrderDetails B on A.OrderID=B.OrderID and A.Quantity>=B.Quantity group by A.OrderID,A.ProductID,A.Quantity order by A.OrderID,Rank desc OrderID ProductID Quantity Rank ----------- ----------- -------- ----------- 10248 11 12 3 10248 42 10 2 10248 72 5 1 10249 51 40 2 10249 14 9 1 If I want information on any quantity for a particular order I can directly address the row(s) without the need to 'retrieve' any information: var Quantity:= ToList ( cursor ( OrderDetails where OrderID=10248 order by {Quantity desc} ) ); var Cnt:=Quantity.Count(); select Cnt; select Quantity[0]; //Max Quantity select Quantity[Cnt-1];//Min Quantity 3 OrderID ProductID UnitPrice Quantity Discount ------- --------- --------- -------- -------- 10248 11 $14.00 12 0 OrderID ProductID UnitPrice Quantity Discount ------- --------- --------- -------- -------- 10248 72 $34.80 5 0 Drop by anytime and I hope you explore Dataphor. best, steve
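As a footnote to the quantity example: the same single-order information can be had in t-sql without the self-join rank, though only as three separate retrievals rather than one addressable list. A hedged sketch, assuming the OrderDetails naming used in the post:

select count(*) as Cnt
from OrderDetails where OrderID = 10248;

select top 1 OrderID, ProductID, UnitPrice, Quantity, Discount
from OrderDetails where OrderID = 10248
order by Quantity desc;   -- Max Quantity

select top 1 OrderID, ProductID, UnitPrice, Quantity, Discount
from OrderDetails where OrderID = 10248
order by Quantity asc;    -- Min Quantity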
Wednesday, June 13, 2007
Dataphor - Passing a table as a parameter
A quick peek at what passing a table as a parameter really means? This example uses the D4 language of Dataphor, MS Sql Server 2005, the Orders and OrderDetails tables from the Northwind db. Consider the following two simple sql queries which use a group by column and return the maximum value of a column for each group. select top 5 EmployeeID,Max(Freight) as MaxFrt from Orders group by EmployeeID order by MaxFrt EmployeeID MaxFrt ----------- --------------------- 6 367.6300 8 398.3600 1 544.0800 4 719.7800 9 754.2600 select top 5 ProductID,Max(UnitPrice) as MaxUP from OrderDetails group by ProductID order by MaxUP ProductID MaxUP ----------- --------------------- 33 2.5000 24 4.5000 13 6.0000 52 7.0000 54 7.4500 We can put each query in a stored procedure and return all the rows. create procedure OrdersMaxFrt as select EmployeeID,Max(Freight) as MaxFrt from Orders group by EmployeeID create procedure OrderDetailsMaxUP as select ProductID,Max(UnitPrice) as MaxUP from OrderDetails group by ProductID Given that the queries concern themselves with 2 mutually exclusive sets of facts (tables) what can be said of any similarities between them? It's clear they perform the same general operation, they both group the data by a single column and return a max of single column. So they return rows consisting of two columns. We can say (loosely speaking) each sp returns a table. What's left? Look at the columns of each query: SELECT TABLE_NAME,COLUMN_NAME,DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Orders' TABLE_NAME COLUMN_NAME DATA_TYPE ---------- ----------- --------- Orders EmployeeID int Orders Freight money Orders OrderID int SELECT TABLE_NAME,COLUMN_NAME,DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'OrderDetails' TABLE_NAME COLUMN_NAME DATA_TYPE ------------ ----------- --------- OrderDetails OrderID int OrderDetails ProductID int OrderDetails UnitPrice money Each query uses a group by column of int(eger) and a max of a money column. Each query returns a table of an int column and a money. Each table involves a column (OrderID) of int in the primary key definition. All these similarities can be seen as 'structural' similarities. And when we talk of such similarities we are talking the language of 'type' just as we would say the number 5 is an int type. But in this context we are speaking of 'table' type. We can now take all these structural ideas and create a D4 procedure that performs the group by operation on a table that is passed as a parameter. create operator GetaMax(aTable:table{aPkCol:Integer,aGroupCol:Integer,aMoneyCol:Money}): table{aGroupCol:Integer,aMax:Money} begin result:= aTable group by{aGroupCol} add {Max(aMoneyCol) aMax}; end; The operand (argument) aTable is a table defined in terms of its column name and corresponding data type: table{aPkCol:Integer,aGroupCol:Integer,aMoneyCol:Money} The procedure returns a result which is also a table: table{aGroupCol:Integer,aMax:Money} And now we can answer the question of what passing a table as a parameter really means. It just means the definition of the table in terms of its structure, its column names and their data type. By contrast, the only reference to a table in sql is by its name. The structural use of table information in sql does not exist. In other words, there is no table 'type' in sql and therefore nothing on which a table parameter could be based on. In sql there is only the literal representation of a table (its name) whereas in D4 it can be represented as a variable. 
It is the same difference representing the literal string 'ABC' but not being able to assign 'ABC' to a variable of type string. (Note how sql tries to overcome the lack of a table type: dynamic sql ☻ .) Ok, now to how the D4 GetaMax procedure works. The sql stored procedure exec OrdersMaxFrt becomes: select GetaMax(Orders {OrderID aPkCol,EmployeeID aGroupCol,Freight aMoneyCol}) {aGroupCol EmployeeID,aMax MaxFrt} order by {EmployeeID}; EmployeeID MaxFrt ---------- --------- 1 $544.08 2 $810.05 3 $1,007.64 4 $719.78 5 $890.78 6 $367.63 7 $830.75 8 $398.36 9 $754.26 In D4 the: 'GetaMax(Orders {OrderID aPkCol,EmployeeID aGroupCol,Freight aMoneyCol})' means execute the GetaMax operator using the Orders table but renaming the Orders columns to the table type (the columns with their data types) declared in the operator. In renaming the columns and because the columns have the same corresponding data types as declared in the operator we succeed in passing the Orders table (as a variable). So the EmployeeID (int) column is renamed to aGroupCol and Freight (money) to aMoneyCol (same for OrderID). The '{aGroupCol EmployeeID,aMax MaxFrt}' simply renames the generic columns of the operator back to the meaningful column names of the Orders table. Likewise, The sql stored procedure exec OrderDetailsMaxUP becomes: select GetaMax(OrderDetails {OrderID aPkCol,ProductID aGroupCol,UnitPrice aMoneyCol}) {aGroupCol ProductID,aMax MaxUP} order by {ProductID}; ProductID MaxUP --------- ------- 1 $18.00 2 $19.00 3 $10.00 4 $22.00 . Just as a variable defined as int can take an infinite number of values the GetaMax procedure represents the same operation on a possibly infinite number of tables. As I hope you can imagine the concept of a table type/argument has far reaching implications for application development. Finally, if you have Dataphor you can run the following two batches to compare the GetaMax operator to the sql stored procedures. (Note that the D4 operator 'SQLQuery' is used for pass-thru queries sending t-sql directly to Sql Server. Also Sql Server converts the max of a money type to decimal. To make the comparision with GetaMax it must be converted back to money). //Check Orders. var Compare_the_Tables:='The 2 tables are different'; if ( GetaMax(Orders {OrderID aPkCol,EmployeeID aGroupCol,Freight aMoneyCol}) {aGroupCol EmployeeID, aMax MaxFrt} ) = (SQLQuery('exec OrdersMaxFrt') {EmployeeID,ToMoney(MaxFrt) MaxFrt}) then Compare_the_Tables:='The 2 tables are the same'; select Compare_the_Tables; //The 2 tables are the same //Check OrderDetails. var Compare_the_Tables:='The 2 tables are different'; if ( GetaMax(OrderDetails {OrderID aPkCol,ProductID aGroupCol,UnitPrice aMoneyCol}) {aGroupCol ProductID,aMax MaxUP} ) = (SQLQuery('exec OrderDetailsMaxUP') {ProductID,ToMoney(MaxUP) MaxUP}) then Compare_the_Tables:='The 2 tables are the same'; select Compare_the_Tables; //The 2 tables are the same bye for now, steve
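To make the dynamic sql remark above concrete, here is a hedged sketch (the procedure name and its use of quotename/sp_executesql are mine, not from the post) of how sql has to fake a table parameter by passing names and building text:

create procedure GetaMax_sql
  @TableName sysname,
  @GroupCol  sysname,
  @MoneyCol  sysname
as
begin
  declare @sql nvarchar(4000);
  set @sql = N'select ' + quotename(@GroupCol)
           + N', max(' + quotename(@MoneyCol) + N') as aMax'
           + N' from ' + quotename(@TableName)
           + N' group by ' + quotename(@GroupCol);
  exec sp_executesql @sql;
end;

-- exec GetaMax_sql 'Orders', 'EmployeeID', 'Freight';
-- exec GetaMax_sql 'OrderDetails', 'ProductID', 'UnitPrice';

The compiler sees only a string; there is no type checking of the table, its columns, or their data types, which is exactly the contrast the post draws.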
Thursday, June 07, 2007
Dataphor - trimming digits and letters from a string
Trimming leading zeros and trailing letters from a string. This question was raised in the thread: microsoft.public.sqlserver.programming Tuesday, June 05, 2007 2:08 PM Couple of update statements http://tinyurl.com/2n73xf This can done quite easily in Dataphors D4 language without the need of combining string functions into complicated expressions. The following D4 constructs can be used to return a string meeting the criteria: Indexer Expression http://www.alphora.com/docs/O-System.iIndexer.html IndexOfAny http://www.alphora.com/docs/O-System.IndexOfAny.html Concat (Concatenate) http://www.alphora.com/docs/O-System.Concat.html For updating using a table variable or view see: Join/Outer Join/Lookup http://www.alphora.com/docs/DDGRepresentingDatawithTablesandViews-DerivedTableVariables-Join.html http://www.alphora.com/docs/D4LGTableExpressions-OuterJoin.html Here is some sample data entered into MS Sql Server 2005 that we will access with Dataphor: create table LTLData (RowID int primary key,ProNum varchar(25) ) insert into LTLData Values (1,'00234') insert into LTLData Values (2,'030234BD') insert into LTLData Values (3,'33030234BD') insert into LTLData Values (4,'0233030234BD') insert into LTLData Values (5,'0000234BD') insert into LTLData Values (6,'233030234') insert into LTLData Values (7,'00000ABD') insert into LTLData Values (8,'BDBDBD') So for example given the string ''030234BD' we want to trim the leading '0' and the trailing 'BD' and return '30234'. An easy way to do this is transpose the string into rows of a single character and work with the rows to omit the leading zeros and trailing letters. Once the offending rows (characters) are omitted then concatenate the rows to form the new string. Here we use a table of numbers (numbers with a single column num) and the indexer to split each string into rows of a single character. A list of digits of type string (NumList1) is used with IndexOfAny to add the first occurrance of any non zero digit in the string to each row (FirstDigit). If a string has no digit from 1-9 then IndexOfAny returns -1. The 'with {IgnoreUnsupported = 'true'}' is used to suppress the message Dataphor returns that says there is no support in Sql Server for the IndexOfAny function. var NumList1:list(String):={'1','2','3','4','5','6','7','8','9'}; select ( numbers //table of digits (num) from 0-100. times //Like an sql cross join //Get the count (Cnt) of the items in the string and location of 1st digit. LTLData add{Length(ProNum.Trim()) Cnt,ProNum.IndexOfAny(NumList1) FirstDigit} with {IgnoreUnsupported = 'true'} where num<=Cnt-1 {RowID,ProNum,num Index,FirstDigit,ProNum[num] Char} ) order by {RowID,Index}; Here is a sample of the result: RowID ProNum Index FirstDigit Char ----- ------------ ----- ---------- ---- 1 00234 0 2 0 1 00234 1 2 0 1 00234 2 2 2 1 00234 3 2 3 1 00234 4 2 4 2 030234BD 0 1 0 2 030234BD 1 1 3 2 030234BD 2 1 0 2 030234BD 3 1 2 2 030234BD 4 1 3 2 030234BD 5 1 4 2 030234BD 6 1 B 2 030234BD 7 1 D . 7 00000ABD 0 -1 0 7 00000ABD 1 -1 0 7 00000ABD 2 -1 0 7 00000ABD 3 -1 0 7 00000ABD 4 -1 0 7 00000ABD 5 -1 A 7 00000ABD 6 -1 B 7 00000ABD 7 -1 D We now add a single where statement to omit leading zeros and trailing letters. var NumList1:list(String):={'1','2','3','4','5','6','7','8','9'}; select ( numbers //table of digits (num) from 0-100. times //Like a cross join //Get the count (Cnt) of the items in the string and location of 1st digit. 
LTLData add{Length(ProNum.Trim()) Cnt,ProNum.IndexOfAny(NumList1) FirstDigit} with {IgnoreUnsupported = 'true'} where num<=Cnt-1 {RowID,ProNum,num Index,FirstDigit,ProNum[num] Char} ) where (Index>=FirstDigit) and (FirstDigit<>-1 ) and (IsDigit(Char)) order by {RowID,Index}; RowID ProNum Index FirstDigit Char ----- ------------ ----- ---------- ---- 1 00234 2 2 2 1 00234 3 2 3 1 00234 4 2 4 2 030234BD 1 1 3 2 030234BD 2 1 0 2 030234BD 3 1 2 2 030234BD 4 1 3 2 030234BD 5 1 4 3 33030234BD 0 0 3 3 33030234BD 1 0 3 3 33030234BD 2 0 0 3 33030234BD 3 0 3 3 33030234BD 4 0 0 3 33030234BD 5 0 2 3 33030234BD 6 0 3 . 6 233030234 0 0 2 6 233030234 1 0 3 6 233030234 2 0 3 6 233030234 3 0 0 6 233030234 4 0 3 6 233030234 5 0 0 6 233030234 6 0 2 6 233030234 7 0 3 6 233030234 8 0 4 The where statement did not affect RowID 6 but completely eliminated RowIDs 7 and 8. The Concat operator can now be used to form a new string based on the rows meeting the criteria. The Index column is used in Concat to order the new string in the same way it was transposed. A delimiter (Del) is defined as an empty string ('') so when the new string is formed there are is no separation between the characters (Char). Instead of operating on the rows of LTLData, here is a select statement that trims a single string. The 'with {IgnoreUnsupported = 'true'}' is used with Concat since we have not mapped Concat to any Sql Server function. Like IndexOfAny the Dataphor server will assume responsibility for this operation. (Interested readers can search Dataphor help for 'query chunking' to see how Dataphor interacts with a device, ie Sql Server.) var NumList1 := {'1','2','3','4','5','6','7','8','9'}; var Str:='030234BD'; select Concat({Char,Del} from ( numbers where num<=Length(Str.Trim())-1 {Str StrNum,num Index,Str[num] Char,Str.IndexOfAny(NumList1) FirstDigit,'' Del} with {IgnoreUnsupported = 'true'} where (Index>=FirstDigit) and (FirstDigit<>-1 ) and (IsDigit(Char)) ) order by {Index} ) with {IgnoreUnsupported = 'true'}; Which returns '30234'. We can now use the above select to create an operator that will return a new string for each row of LTLData. create operator TrimStr(Str:String):String begin var NumList1 := {'1','2','3','4','5','6','7','8','9'}; result:= Concat({Char,Del} from ( numbers where num<=Length(Str.Trim())-1 {Str StrNum,num Index,Str[num] Char,Str.IndexOfAny(NumList1) FirstDigit,'' Del} with {IgnoreUnsupported = 'true'} where (Index>=FirstDigit) and (FirstDigit<>-1 ) and (IsDigit(Char)) ) order by {Index} ) with {IgnoreUnsupported = 'true'}; end; We now use operator TrimStr to return the correctly trimmed new string (NewStr). select LTLData add{TrimStr(ProNum) NewStr} with {IgnoreUnsupported = 'true'}; RowID ProNum NewStr ----- ------------ ---------- 1 00234 234 2 030234BD 30234 3 33030234BD 33030234 4 0233030234BD 233030234 5 0000234BD 234 6 233030234 233030234 7 00000ABD <No Value> 8 BDBDBD <No Value> The ProNum column in LTLData can now be updated using TrimStr. The 'IsNotNil()' is similar to the sql 'not IsNull()'. So only ProNum values will be updated where TrimStr return a non-nil(null) value. (Note that D4 uses the term 'nil' whereas sql uses 'null'.) The "with {ShouldSupport='false'}" tag is used to tell the Dataphor sever to take responsibility for evaluating 'IsNotNil(TrimStr(ProNum))'. There is no way for Sql Server to evaluate this expression. A "{ShouldSupport='true'}" (the default) means Sql Server should try to support the expression. 
Without the "with {ShouldSupport='false'}" the update will execute but Dataphor will issue a warning message. update LTLData set { ProNum:= TrimStr(ProNum) } where IsNotNil(TrimStr(ProNum) with {ShouldSupport='false'} ) ; select LTLData; RowID ProNum ----- --------- 1 234 2 30234 3 33030234 4 233030234 5 234 6 233030234 7 00000ABD 8 BDBDBD Finally, we recreate LTLData and show a query which uses Concat in a group by to form new trimmed strings for each RowID. var NumList1 := {'1','2','3','4','5','6','7','8','9'}; select numbers times LTLData add{Length(ProNum.Trim()) Cnt,ProNum.IndexOfAny(NumList1) FirstDigit} with {IgnoreUnsupported = 'true'} where num<=Cnt-1 {RowID,ProNum,num Index,FirstDigit,ProNum[num] Char,'' Del} where (Index>=FirstDigit) and (FirstDigit<>-1 ) and (IsDigit(Char)) group by {RowID} add {Max(ProNum) ProNum, Concat(Char,Del order by {RowID,Index}) NewPro} order by {RowID}; RowID ProNum NewPro ----- ------------ --------- 1 00234 234 2 030234BD 30234 3 33030234BD 33030234 4 0233030234BD 233030234 5 0000234BD 234 6 233030234 233030234 Update with a view. We can define a view and use it in a join to update ProNum with NewPro (from the view). create view LTLDataView numbers times LTLData add{Length(ProNum.Trim()) Cnt,ProNum.IndexOfAny( {'1','2','3','4','5','6','7','8','9'}) FirstDigit} with {IgnoreUnsupported = 'true'} where num<=Cnt-1 {RowID,ProNum,num Index,FirstDigit,ProNum[num] Char,'' Del} where (Index>=FirstDigit) and (FirstDigit<>-1 ) and (IsDigit(Char)) group by {RowID} add {Max(ProNum) ProNum, Concat(Char,Del order by {RowID,Index}) NewPro} ; The lookup operator is similar to a join but can be used to control just which table(s) should be the target of an update. We only wish to update LTLData and don't want to go thru the view to do it. We only want the view to supply the new string NewPro. The lookup encapsulates this idea. Note the lookup (join) is a natural join using RowID. update LTLData lookup (LTLDataView remove{ProNum} ) set { ProNum:= NewPro } ; select LTLData; RowID ProNum ----- --------- 1 234 2 30234 3 33030234 4 233030234 5 234 6 233030234 7 00000ABD 8 BDBDBD Instead of a view a table variable could be also be used. var NumList1 := {'1','2','3','4','5','6','7','8','9'}; var Y:= numbers times LTLData add{Length(ProNum.Trim()) Cnt,ProNum.IndexOfAny(NumList1) FirstDigit} with {IgnoreUnsupported = 'true'} where num<=Cnt-1 {RowID,ProNum,num Index,FirstDigit,ProNum[num] Char,'' Del} where (Index>=FirstDigit) and (FirstDigit<>-1 ) and (IsDigit(Char)) group by {RowID} add {Max(ProNum) ProNum, Concat(Char,Del order by {RowID,Index}) NewPro} ; update LTLData lookup (Y remove{ProNum} ) set { ProNum:= NewPro } ; select LTLData; RowID ProNum ----- --------- 1 234 2 30234 3 33030234 4 233030234 5 234 6 233030234 7 00000ABD 8 BDBDBD Bye for now, steve
Sunday, June 03, 2007
Dataphor - String differences operator
The following is a string operator based on a suggestion in the thread: microsoft.public.sqlserver.programming Friday, June 01, 2007 7:53 AM Compare String http://tinyurl.com/38e7zg It is written in D4 using Dataphor from: www.alphora.com (Go to operator Cstr). Given a comma delimited string in the form of: ' string1 , string2 ' the the operator returns a row with the following columns: removed - the difference between string1 and string2 (string1 diff-> string2). It is the strings that remain in string1 after removing any string also found in string2. added - is the same as removed but in the opposite direction, ie. the difference between string2 diff-> string1. IsRem - a boolean. True if there is a difference in the direction of removal, false otherwise. IsAdd - boolean, same as IsRem but from the direction of added. IsSame - a boolean, true if there are no differences from either direction and the order of individual string parts are the same in string1 and string2. If the string parts in each string are equal and they occur in the same position they are the same otherwise they are not. This scalar is based on an observation by Gert-Jan Strik in the thread. In the event that there is no difference the operator returns the string for the appropriate direction. If there's no difference for removed string1 is returned cleanly formatted, ie a blank between each string part. The same for added. Whether or not a difference exists can be determined by checking IsRem and IsAdd. The same formatting extends for differences. Removed and added are always formatted with a blank between each string part. Note that it is assumed that a blank is the delimiter within string1 and string2. This can easily be changed as the 'Split' operator can take a list of delimiters. Also note how nicely the D4 'without' construct lends itself to obtaining the difference in either direction between the strings. Lastly, instead of passing in a delimited string a list could have been used. In this case it does not appear to make much difference which one is used. The operator returns a row type. No declarative programming is used. Only dedicated list, string and relational operators are used. While a bit long winded compared to a dedicated D4 declarative approach, it illustrates the relative ease with which the expressive power of the language can be used. *********************************************************** Examples: 1. This is the example from the thread. It shows multiple differences in each direction. select Cstr('The red fox is big and fat. The,The green frog is slimey. The g'); This returns the following row: removed added IsRem IsAdd IsSame -------------------- -------------------- ----- ----- ------ red fox big and fat. green frog slimey. g True True False 2. A single difference in each direction. select Cstr(' B1 C1 A1 , D1 B1 A1'); removed added IsRem IsAdd IsSame ------- ----- ----- ----- ------ C1 D1 True True False 3. select Cstr(' white red blue black , red yellow pink '); removed added IsRem IsAdd IsSame ---------------- ----------- ----- ----- ------ white blue black yellow pink True True False 4. select Cstr(' white red blue black , white red blue black pink '); Since there is no difference from removal (string1) string1 is returned and IsRem is False. There is a single difference from added and IsAdd is True. removed added IsRem IsAdd IsSame -------------------- ----- ----- ----- ------ white red blue black pink False True False 5. 
5.

select Cstr(' white red blue black pink , white red blue black ');

Same as the last example but the difference is reversed.

removed added                IsRem IsAdd IsSame
------- -------------------- ----- ----- ------
pink    white red blue black True  False False

6.

select Cstr(' white red blue black pink , white red blue black pink');

There is no difference in either direction and string1 and string2 are returned. Since the string parts (colors) are the same and occur in the same order, IsSame is True.

removed                   added                     IsRem IsAdd IsSame
------------------------- ------------------------- ----- ----- ------
white red blue black pink white red blue black pink False False True

7.

select Cstr(' white red blue black pink , black pink blue white red');

The colors in each string match but occur in a different order, therefore IsSame is False.

removed                   added                     IsRem IsAdd IsSame
------------------------- ------------------------- ----- ----- ------
white red blue black pink black pink blue white red False False False

8. This returns only the scalar removed from the row.

select Cstr(' B1 C1 A1 , D1 B1 A1').removed ;

C1

9. This returns it as the column Rem.

select TableDee add {Cstr(' B1 C1 A1 , D1 B1 A1').removed Rem} ;

Rem
---
C1

10. Assigning the result to a variable. Note that the argument (operand) to the operator can be constructed within the call.

var Rem:String;
var Str1:=' working with variables in Dataphor ';
var Str2:=' working with variables';
var Str:=Cstr(Str1+','+Str2);
if Str.IsRem then
 Rem:=Str.removed
else
 Rem:=' ';
select Rem;

in Dataphor

11. The differences can be used to add a column to the result showing the string parts in common.

var Str1:='BLACK WHITE PURPLE GREEN ORANGE YELLOW';
var Str2:='WHITE BLACK PINK NAVY PURPLE YELLOW GOLD BEIGE';
select Cstr(Str1+','+Str2)
 add
  {
   Concat({Str,Del}
    from
     (
      (ToTable((Str1+','+Str2).Split({',',' '}),'Str') {Str.Trim() Str} where Str>' ')
       without
      ToTable( ((Cstr(Str1+','+Str2) {removed+' '+added Str}).Str).Split({' '}),'Str')
     ) add {' ' Del}
    order by {Str}) commonstr
  }
   {removed,added,commonstr,IsRem,IsAdd};

removed      added                commonstr                 IsRem IsAdd
------------ -------------------- ------------------------- ----- -----
GREEN ORANGE PINK NAVY GOLD BEIGE BLACK PURPLE WHITE YELLOW True  True

12. Define a table variable to hold multiple rows of the operator.

var TableStr:=
 table
  {
   Cstr(' A1 b1 C1 d1 , A1 B1 E1 F2 ') add{1 RowID},
   Cstr(' A B C , ') add{2 RowID},
   Cstr('AA A AA a , AA a') add{3 RowID},
   Cstr(' , ') add {4 RowID},
   Cstr(' 10 8 7 6 , 10 8 7 6 ') add{5 RowID},
   Cstr(' z 1 44_A , 1 44_A z ') add{6 RowID}
  } {RowID,removed,IsRem,added,IsAdd,IsSame} ;
select TableStr order by {RowID};

RowID removed  IsRem added    IsAdd IsSame
----- -------- ----- -------- ----- ----------
1     b1 C1 d1 True  B1 E1 F2 True  False
2     A B C    False          False False
3     A        True  AA a     False False
4              False          False <No Value>
5     10 8 7 6 False 10 8 7 6 False True
6     z 1 44_A False 1 44_A z False False

13. Accessing data in MS Sql Server 2005. Create some data in the pubs database and access it with Dataphor.

create table Tstrings (RID int primary key,Astr varchar(150))
insert Tstrings values(1,'AA BB CC DD TT, YY NN AA BB R1')
insert Tstrings values(2,'FF1 4564 33AAA T1 55, 33AAA FF1 33AA Y4564 ')
insert Tstrings values(3,'100 81 1 353 30 A64,1 81 82 R34 353 A353')
insert Tstrings values(4,'The red fox is big and fat. The,The green frog is slimey. The g')
insert Tstrings values(5,'This is a fun test, This is a fun test ')
insert Tstrings values(6,'This is a fun test, This test is a fun ')
insert Tstrings values(7,'Does this stuff make sense? -:), ')
insert Tstrings values(8,'I hope you,I hope you explore Dataphor ')

Reconcile(Name("Pubs")); //Make the data available to Dataphor.

select Tstrings
 add
  {
   Cstr(Astr).removed removed, Cstr(Astr).added added,
   Cstr(Astr).IsRem IsRem, Cstr(Astr).IsAdd IsAdd,
   Cstr(Astr).IsSame IsSame
  } with {IgnoreUnsupported='true'}
   remove{Astr}; //So the row doesn't get too long -:)

RID removed                         added                IsRem IsAdd IsSame
--- ------------------------------- -------------------- ----- ----- ------
1   CC DD TT                        YY NN R1             True  True  False
2   4564 T1 55                      33AA Y4564           True  True  False
3   100 30 A64                      82 R34 A353          True  True  False
4   red fox big and fat.            green frog slimey. g True  True  False
5   This is a fun test              This is a fun test   False False True
6   This is a fun test              This test is a fun   False False False
7   Does this stuff make sense? -:)                      False False False
8   I hope you                      explore Dataphor     False True  False

End of examples.
***********************************************************

In sql (MS Sql Server) a procedure can have multiple output parameters. In D4 there is no such concept as multiple output parameters. Multiple output parameters in D4 means either multiple operators each returning a scalar value, returning a row, or returning a table with a single row.
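By way of contrast, here is a minimal T-SQL sketch of the multiple-output-parameter style just mentioned. The procedure name, parameter names, and sizes are invented for illustration, and the body is only a stub; the real comparison logic is the D4 operator that follows, not this procedure.

-- Hypothetical stored procedure shape: several values come back through
-- OUTPUT parameters rather than as columns of a single returned row.
CREATE PROCEDURE dbo.CompareStrings
    @Str     varchar(300),
    @Removed varchar(150) OUTPUT,
    @Added   varchar(150) OUTPUT,
    @IsRem   bit OUTPUT,
    @IsAdd   bit OUTPUT,
    @IsSame  bit OUTPUT
AS
BEGIN
    -- The difference logic is omitted; only the calling shape matters here.
    SELECT @Removed = '', @Added = '', @IsRem = 0, @IsAdd = 0, @IsSame = 0;
END

The caller would EXEC the procedure passing local variables marked OUTPUT and then read each one separately. In D4 the single row returned by Cstr carries all five values at once, which is why the operator below declares a row result type instead.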
create operator Cstr(Str:String):
 row{removed:String,IsRem:Boolean,IsSame:Boolean,added:String,IsAdd:Boolean}
begin
 result:=row of typeof(result){};
 var ListStr:=Str.Split({','});
 //Check that input operand Str has a single comma, raise error and exit if false.
 var StrCnt:=ListStr.Count();
 if StrCnt<>2 then
 begin
  raise Error ("Input Str: "+ Unicode({39})+ Trim(Str) + Unicode({39}) + " has "
               + ToString(StrCnt-1) + " comma's, the operator expects a single comma delimited string ");
  exit;
 end;
 var Idummy:Integer;
 if ListStr.Count()=1 then
  Idummy:=ListStr.Add(' ');
 //Form row (all scalar values) and modify row values when appropriate.
 result:=
  row
  {
   //removed
   (
    IfNil(Concat
     ({Str,Del}
      from
       (
        (
         (ToTable(ListStr[0].Split({' '}),'Str') {Str.Trim() Str} where Str>' ')
          without
         (ToTable(ListStr[1].Split({' '}),'Str') {Str.Trim() Str} where Str>' ')
        )
         join
        ((ToTable(ListStr[0].Split({' '}),'Str') {Str.Trim() Str,sequence seq} where Str>' ') add{' ' Del})
       )
      order by {seq}
     ),ListStr[0])
   ) removed,
   //added
   (
    IfNil(Concat
     ({Str,Del}
      from
       (
        (
         (ToTable(ListStr[1].Split({' '}),'Str') {Str.Trim() Str} where Str>' ')
          without
         (ToTable(ListStr[0].Split({' '}),'Str') {Str.Trim() Str} where Str>' ')
        )
         join
        ((ToTable(ListStr[1].Split({' '}),'Str') {Str.Trim() Str,sequence seq} where Str>' ') add{' ' Del})
       )
      order by {seq}
     ),ListStr[1])
   ) added,
   true IsRem,
   true IsAdd,
   false IsSame
  };
 //Check removed.
 if (result.removed=ListStr[0])
  or
  (
   ( ToTable((result.removed).Split({' '}),'Str') {Str.Trim() Str} where Str>' ' )
    =
   ( ToTable(ListStr[0].Split({' '}),'Str') {Str.Trim() Str} where Str>' ' )
  ) then
  update result
   set
   {
    IsRem:=false,
    removed:=IfNil(Concat( {Str,Del}
                    from ( ToTable(ListStr[0].Split({' '}),'Str') where Str>' ' add{' ' Del} )
                    order by {sequence} ),' ')
   };
 //Check added.
 if (result.added=ListStr[1])
  or
  (
   ( ToTable((result.added).Split({' '}),'Str') {Str.Trim() Str} where Str>' ' )
    =
   ( ToTable(ListStr[1].Split({' '}),'Str') {Str.Trim() Str} where Str>' ' )
  ) then
  update result
   set
   {
    IsAdd:=false,
    added:=IfNil(Concat( {Str,Del}
                  from ( ToTable(ListStr[1].Split({' '}),'Str') where Str>' ' add{' ' Del} )
                  order by {sequence} ),' ')
   };
 //If IsRem is false, are the 2 strings the same.
 if not(result.IsRem) then
  if (
      (ToTable(ToList(cursor(ToTable(ListStr[0].Split({' '}),'Str') {Str.Trim() Str,sequence seq} where Str>'' order by {seq}))) remove {seq})
       =
      (ToTable(ToList(cursor(ToTable(ListStr[1].Split({' '}),'Str') {Str.Trim() Str,sequence seq} where Str>'' order by {seq}))) remove {seq})
     ) then
   update result set {IsSame:=true};
 //We cannot logically say 2 blank strings are the same, can we?
 if Length(Replace(Replace(Str,',',''),' ',''))=0 then
  update result set {IsSame:=nil};
end;
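Finally, to relate the D4 'without' used throughout the operator to a T-SQL idiom: the directional difference between the two sets of string parts is essentially what EXCEPT (available from SQL Server 2005 onward) computes. The sketch below assumes a helper table-valued function dbo.SplitParts that returns the non-blank, space-delimited pieces of a string as rows in a column named Part; SQL Server 2005 ships no such splitter, so the function name and shape are hypothetical.

-- "removed" direction: parts present in the first string but not the second.
-- dbo.SplitParts(@s) is a hypothetical table-valued splitter returning
-- one row per non-blank, space-delimited piece in a column named Part.
DECLARE @Str1 varchar(150), @Str2 varchar(150);
SET @Str1 = ' B1 C1 A1 ';
SET @Str2 = ' D1 B1 A1 ';

SELECT Part FROM dbo.SplitParts(@Str1)
EXCEPT
SELECT Part FROM dbo.SplitParts(@Str2);   -- yields C1, the "removed" difference

Swapping @Str1 and @Str2 gives the 'added' direction. What EXCEPT does not supply is the original ordering of the surviving parts, which the D4 operator recovers by joining back on the sequence column before the Concat.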