If your not familiar with relational division a popular article for the translation of the concept into sql is: Relational Division by Joe Celko Basically relational division is the question of whether a set of predicates exist in a given table. Generally the predicates represent a set of 'rows' and therefore the question becomes does a predicate in the form of a 'table' exist in another table. The predicate rows are referred to as the divisor table, the given table as the dividend table and the result as the quotient table. Hence the idea of division by tables. Since sql does not support a table 'type' various solutions have been offered to target the rows of the given (dividend) table. Most often the means of targeting rows are the use of aggregate functions like count. Because sql cannot directly compare one set of rows (ie. a table) to another set of rows (a table) the forced simulation is not unlike the attempt to simulate a list with the substring function as explained here. The resulting sql solutions can at the very least be unintuitive and convoluted and at the very most not even possible to specify (at least for most human beings ☺. Given that Dataphor supports direct table comparisons the logic of a relational division problem can be greatly simplified. This example uses MS Sql Server 2005 and is based on the post: Thursday, July 12, 2007 11:07 PM microsoft.public.sqlserver.programming Expert Challenge - Complex Join, Group By, Having Here is some sample data. This is the dividend table. create table CTable { ID:Integer, Answer:String, Wcnt:Integer, Word:String, key{ID} }; insert table { row{1 ID,'first' Answer, 2 Wcnt,'george' Word}, row{2,'first', 2, 'burns'}, row{3,'second',2, 'burns'}, row{4,'second',2, 'burns'}, row{5,'third', 3, 'go'}, row{6,'third', 3,'george'}, row{7,'third', 3, 'burns'}, row{8,'fourth',2, 'fred'}, row{9,'fourth',2,'george'} } into CTable; Given a match (divisor) table: (test #1) declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'george' what the op refers to as a 'match' table, the relational division question becomes find the rows in the dividend table that correspond to the divisor rows. And it follows that we compare the Word and ID columns of the tables. We now make two important observations: 1. The Answer column of the dividend table defines rows that belong together. These rows form a table. 2. Since we are interesting in comparing columns of two tables we note that it is unfeasible to try to compare the ID columns of the dividend and divisor tables since the ID column of the divisor table can take any value based on the Identity function. With a little insight we see that we can compare them if we first 'standardize' the ID column in both tables. We can do this by computing the 'rank' of Words in each table and compare the ranks instead of IDs. We also note that the way the original problem was posed the Word column is of primary significance. In other words, it is the presence of the divisor Words that determines a match. There is no significance attached to ID, we are simply standarizing it to include it in a comparison. Because it has no meaning as to determining a match which ever way we standarize as long as we are consistent within both tables it will be sufficient. Sql solution Here is one of many sql solutions. The row_number() function is used to standarize the ID columns in both tables and Rank is used in the comparison. -- Match/divisor table (test#1). declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'george' select C.ID,C.Answer,C.Wcnt,C.Word from CTable as C where C.Answer in ( select B.Answer from -- Note that if we order by 'Word desc' for the ranks in both selects it -- will not make a difference, ie. (consistency for ranks between tables). (select ID,Word,row_number()over(order by Word) Rank from @match as M) as A right join (select ID,Answer,Wcnt,Word, row_number()over(partition by Answer order by Word) Rank from CTable) as B on A.Rank=B.Rank and A.Word=B.Word group by B.Answer having (count(A.rank)=count(*)) and (count(*)=(select count(*) from @match)) ) ID Answer Wcnt Word ----------- -------------------- ----------- -------------------- 1 first 2 george 2 first 2 burns Note that the join is necessary but not sufficient. The rows of Answer must be further restricted by aggregate comparisons using the count aggregate. This methodology rests on being able to express a quotient table using aggregate functions. But as the criteria for such a table becomes increasingly more complex so does the query. It is interesting that proponents of this methodology think it is quite helpful in understanding relation divison. For example the authors of: A Simpler (and Better) SQL Approach to Relational Division state 'We believe the syntactical construction of Q0 allows the student to grasp the concepts of implementing SQL division in a more intuitive way'. Perhaps most telling is their framing of the question of relational division where they state: 'A common type of database query requires one to find all tuples of some table that are related to each and every one of the tuples of a second group.' This seems to a reflection of the fact that there is no direct way to compare tables in sql. What would appear much more to the point is simply: 'A common type of database query is to compare two tables'. And this is precisely what we will do in the D4 language of Dataphor. Developing a D4 solution Keep in mind the very precise and simple definition of table equality: Two tables are equal if they have the same named columns of the same data types and they have the same set of rows. In other words, they are the same 'type' and have the same rows. The following simple D4 query illustrates how easy relational division can be. The query ignores ID and only compares tables based on Word. It uses match test#1: select //Each row in CTable that has the particular Answer value given by the where //predicate being true will be returned by the having operator. CTable having ( //Every Answer value forms a (dividend) table. We get the Answer value //by comparing the match (divisor) table to each table formed by a //particular Answer. CTable {Answer AnswerI} //We compare two tables, each row in the match table must exist //in the table formed by an Answer value. where (table{row{'burns' Word},row{'george'}}) //match table. = ((CTable where Answer=AnswerI with { ShouldSupport="false"}) {Word} ) {AnswerI Answer } ); ID Answer Wcnt Word -- ------ ---- ------ 1 first 2 george 2 first 2 burns The above query will work for matches #1 and #3 but will fail for #2. select CTable having ( CTable {Answer AnswerI} where //Using test #2. (table{row{'burns' Word},row{'burns'}}) = ((CTable where Answer=AnswerI with { ShouldSupport="false"}) {Word} ) {AnswerI Answer } ); -- Internal Index Error: "Duplicate key violation." Which makes the point that ID is necessary given duplicate Words. Now lets get serious with how we're going to do this example in D4 ☺. First we're going to create a view in D4 using an sql pass-thru query with row_number(). This will give us a rank (from 1 to Wcnt) in the direction of Word for each Answer (the partition column). create view CTableViewWord SQLQuery('select ID,Answer,Wcnt,Word, row_number()over(partition by Answer order by Word) Rank from CTable','key{Answer,Rank}') {ID,ToInteger(Rank) Rank,Answer,Wcnt,Word}; select CTableViewWord join (CTable group by {Answer} add{Min(ID) Asort}) with {IgnoreUnsupported = 'true'} return 50 by {Asort,Rank} remove{Asort}; ID Rank Answer Wcnt Word -- ---- ------ ---- ------ 2 1 first 2 burns 1 2 first 2 george 3 1 second 2 burns 4 2 second 2 burns 7 1 third 3 burns 6 2 third 3 george 5 3 third 3 go 8 1 fourth 2 fred 9 2 fourth 2 george D4 operator Words is the sql query expressed in a relational way. It too standarizes ID by using a rank for comparison. Like the sql query it only finds tables where there is the same occurrance of the Words. The values of ID are not pertinent to the intent of the problem. The intent is only to find the same Words ignoring the IDs. The operator takes a match table as an argument and returns matching tables from CTable based on Answers. //Matches with Word occurrence only. create operator Words(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; //Create a table variable with a rank in the direction of Word. This is similar //to sql row_number() function. var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {Word}))) {sequence+1 Rank,Word}) ; var RTable:= CTable having ( (CTable {Answer AnswerI}) where MatchTableRank = ( (CTableViewWord where Answer=AnswerI) {Rank,Word} ) with {IgnoreUnsupported = 'true'} {AnswerI Answer} ); //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; To clarify just what operator Words is doing lets insert some more data into table CTable. insert table { row{10 ID,'fifth' Answer,3 Wcnt,'burns' Word}, row{11,'fifth', 3, 'burns'}, row{12,'fifth', 3, 'burns'}, row{13,'sixth',4, 'arthur'}, row{14,'sixth',4, 'arthur'}, row{15,'sixth',4, 'burns'}, row{16,'sixth',4, 'burns'}, row{17,'seventh',4, 'burns'}, row{18,'seventh',4, 'burns'}, row{19,'seventh',4, 'arthur'}, row{20,'seventh',4, 'arthur'}, row{21,'eight',4, 'arthur'}, row{22,'eight',4, 'burns'}, row{23,'eight',4, 'arthur'}, row{24,'eight',4, 'burns'}, row{25,'nineth',4, 'burns'}, row{26,'nineth',4, 'burns'}, row{27,'nineth',4, 'burns'}, row{28,'nineth',4, 'arthur'}, row{29,'tenth',4, 'burns'}, row{32,'tenth',4, 'burns'}, row{35,'tenth',4, 'burns'}, row{38,'tenth',4, 'arthur'}, row{39,'eleventh',4, 'burns'}, row{41,'eleventh',4, 'burns'}, row{43,'eleventh',4, 'burns'}, row{45,'eleventh',4,'arthur'} } into CTable; Examples: select Words(table{row{11 ID,'burns' Word},row{22,'burns'}}); ID Answer Wcnt Word -- ------ ---- ----- 3 second 2 burns 4 second 2 burns select Words(table{row{11 ID,'burns' Word},row{12,'burns'},row{17,'burns'}}); ID Answer Wcnt Word -- ------ ---- ----- 10 fifth 3 burns 11 fifth 3 burns 12 fifth 3 burns The following three selects all return the same table since table equality is based on the occurrence of the three Words. The value of ID and the sequence of Words in the match table is immaterial since the comparison of Rank between the divisor (match) and dividend tables is based on Word. select Words(table{row{11 ID,'burns' Word},row{12,'george'},row{17,'go'}}); select Words(table{row{11 ID,'go' Word},row{12,'george'},row{17,'burns'}}); select Words(table{row{12 ID,'george' Word},row{13,'go'},row{14,'burns'}}); ID Answer Wcnt Word -- ------ ---- ------ 5 third 3 go 6 third 3 george 7 third 3 burns These two selects return 3 Answers and conceptually 3 tables because the two Words, 'arthur' and 'burns', occur twice within each Answer. select Words(table{row{11 ID,'arthur' Word},row{17,'burns'},row{20,'arthur'},row{21,'burns'}}); select Words(table{row{11 ID,'burns' Word},row{17,'arthur'},row{20,'arthur'},row{21,'burns'}}); ID Answer Wcnt Word -- ------- ---- ------ 13 sixth 4 arthur 14 sixth 4 arthur 15 sixth 4 burns 16 sixth 4 burns 17 seventh 4 burns 18 seventh 4 burns 19 seventh 4 arthur 20 seventh 4 arthur 21 eight 4 arthur 22 eight 4 burns 23 eight 4 arthur 24 eight 4 burns The idea of using the aggregate count in sql as a solution to relational division problems is well suited to finding occurrences of things, ie. Words. With D4 it is obviously not necessary to 'invent' an idea like count. With table comparisons it is not necessary to go beyond the 'data' in the rows. Up till now the ID value has not been meaningful. Lets change that. Here we want to match the occurrence of Word and the ID column. Said another way, we want to match the sequence of Words within an Answer. We create another view where the rank is now based on ID. create view CTableViewID SQLQuery('select ID,Answer,Wcnt,Word, row_number()over(partition by Answer order by ID) Rank from CTable','key{Answer,Rank}') {ID,ToInteger(Rank) Rank,Answer,Wcnt,Word}; select CTableViewID join (CTable group by {Answer} add{Min(ID) Asort}) with {IgnoreUnsupported = 'true'} where Answer in ({'first','second','third','fourth','sixth','eight'}) return 50 by {Asort,Rank} remove{Asort}; ID Rank Answer Wcnt Word -- ---- ------ ---- ------ 1 1 first 2 george 2 2 first 2 burns 3 1 second 2 burns 4 2 second 2 burns 5 1 third 3 go 6 2 third 3 george 7 3 third 3 burns 8 1 fourth 2 fred 9 2 fourth 2 george 13 1 sixth 4 arthur 14 2 sixth 4 arthur 15 3 sixth 4 burns 16 4 sixth 4 burns 21 1 eight 4 arthur 22 2 eight 4 burns 23 3 eight 4 arthur 24 4 eight 4 burns Operator IDandWord is similar to operator Words but here Rank reflects a meaningful sequence. Now the order of the match table is relevant to the equality of the table comparison. create operator IDandWord(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; //Create a table variable with a rank in the direction of ID. This is similar //to sql row_number() function. var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {ID}))) {sequence+1 Rank,Word}) ; var RTable:= CTable having ( (CTable {Answer AnswerI}) where MatchTableRank = ( (CTableViewID where Answer=AnswerI) {Rank,Word} ) with {IgnoreUnsupported = 'true'} {AnswerI Answer} ); //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; Here the sequence of the ID/Word combination in the match table, when standarized, is the same as the sequence in CTable so we have a match. select IDandWord(table{row{11 ID,'go' Word},row{12,'george'},row{17,'burns'}}); ID Answer Wcnt Word -- ------ ---- ------ 5 third 3 go 6 third 3 george 7 third 3 burns But these sequences of ID/Word do not match any sequence in CTable. select IDandWord(table{row{11 ID,'burns' Word},row{12,'george'},row{17,'go'}}); select IDandWord(table{row{12 ID,'george' Word},row{13,'go'},row{14,'burns'}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value> select IDandWord(table{row{11 ID,'arthur' Word},row{17,'burns'},row{20,'arthur'},row{21,'burns'}}); ID Answer Wcnt Word -- ------ ---- ------ 21 eight 4 arthur 22 eight 4 burns 23 eight 4 arthur 24 eight 4 burns No match: select IDandWord(table{row{11 ID,'burns' Word},row{17,'arthur'},row{20,'arthur'},row{21,'burns'}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value> Note that operators Words and IDandWord are the same except for the view being used, the type of standarizing on the ID column. The sql solution implies that aggregates would have to test for the sequence of IDs given the Word values. This seems to mean that sql has to use a separate set of constructs for each meaningful column involved in the match. And this certainly raises the complexity level of such a query. Now lets raise the bar one more notch in matching. Now not only the sequence of Words is meaningful (Words and IDs) but so is the relationship between the IDs. Given the three Answers that match the sequence of Words: select IDandWord(table{row{11 ID,'burns' Word},row{17,'burns'},row{20,'burns'},row{21,'arthur'}}); ID Answer Wcnt Word -- -------- ---- ------ 25 nineth 4 burns 26 nineth 4 burns 27 nineth 4 burns 28 nineth 4 arthur 29 tenth 4 burns 32 tenth 4 burns 35 tenth 4 burns 38 tenth 4 arthur 39 eleventh 4 burns 41 eleventh 4 burns 43 eleventh 4 burns 45 eleventh 4 arthur We want to be able to distinguish (match) each of these Answer tables. To do this we have to examine the relationship between ID values. The only way to target each Answer is to test the difference between the IDs. So we have to test for differences of 1,2 or 3. Operator BetweenIDs is similar to operator IDandWord but adds a column of the difference between consecutive IDs. The table comparison now involves looking at three columns, Word, Rank and TestID. create operator BetweenIDs(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {ID}))) {ID,sequence+1 Rank,Word}) ; //Compare the IDs (next to current) in sequence (ascending order) of IDs. //An indexer ([]) expression is used to directly access a particular Rank. var MatchTableID:= MatchTableRank add{IfNil(((MatchTableRank adorn{key{Rank}})[Rank+1].ID- (MatchTableRank adorn{key{Rank}})[Rank].ID),0) TestID} {Rank,Word,TestID} ; var RTable:= CTable having ( (CTable {Answer AnswerI}) where MatchTableID = ( (CTableViewID where Answer=AnswerI) add{IfNil((CTableViewID[AnswerI,Rank+1].ID- CTableViewID[Answer,Rank].ID),0) TestID} {Rank,Word,TestID} ) with {IgnoreUnsupported = 'true'} {AnswerI Answer} ); var Resultable:=table of typeof(CTable){}; //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; Match consecutivec IDs. select BetweenIDs(table{row{10 ID,'burns' Word},row{11,'burns'},row{12,'burns'},row{13,'arthur'}}); ID Answer Wcnt Word -- ------ ---- ------ 25 nineth 4 burns 26 nineth 4 burns 27 nineth 4 burns 28 nineth 4 arthur Match IDs separated by 2. select BetweenIDs(table{row{1 ID,'burns' Word},row{3,'burns'},row{5,'burns'},row{7,'arthur'}}); ID Answer Wcnt Word -- -------- ---- ------ 39 eleventh 4 burns 41 eleventh 4 burns 43 eleventh 4 burns 45 eleventh 4 arthur Match IDs separated by 3. select BetweenIDs(table{row{4 ID,'burns' Word},row{7,'burns'},row{10,'burns'},row{13,'arthur'}}); ID Answer Wcnt Word -- ------ ---- ------ 29 tenth 4 burns 32 tenth 4 burns 35 tenth 4 burns 38 tenth 4 arthur There is no match for these divisor tables: select BetweenIDs(table{row{10 ID,'burns' Word},row{11,'burns'},row{12,'burns'},row{14,'arthur'}}); select BetweenIDs(table{row{2 ID,'burns' Word},row{3,'burns'},row{5,'burns'},row{7,'arthur'}}); select BetweenIDs(table{row{14 ID,'burns' Word},row{11,'burns'},row{8,'burns'},row{5,'arthur'}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value> Hopefully you can see how D4 can take the mystery out of relational division. bye for now, steve For those who like to explore D4 here is a bonus operator ☺. Operator WordsLimit is similar to operator Words but limits the possible Answer values used for comparing the divisor and dividend tables. //This operator limits possible Answers for the table comparison. create operator WordsLimit(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; //Get the Word/ID combination corresponding to the last Rank (when ording by Word) //for each Answer. var LastRow:= CTableViewWord having ( CTableViewWord group by {Answer} add{Max(Rank) Rank} ) {ID,Answer,Wcnt,Word,Rank}; var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {Word}))) {sequence+1 Rank,Word}) ; //Get Last row of Match table var LRow:= ((MatchTableRank adorn{key{Rank}}) return 1 by {Rank desc})[]; //PTable are Answer(s) from CTable where the last row of an Answer is also //the last row of the MatchTable, ie the Rank is the same and the Word is //the same as the last row in the MatchTable. It returns only those Answer(s) //that 'could' be true in a table comparison using all the rows of MatchTable //and CTable. It limits the possible Answer(s) to only those that could be //true. var PTable:= CTable {Answer} where IsNotNil ( (LastRow adorn{key{Answer,Rank,Word}})[Answer,LRow.Rank,LRow.Word by{Answer,Rank,Word}] ) with {IgnoreUnsupported = 'true'} ; //Now using PTable which reduces number of table comparisons. //Note that the entire 'having' relation is derived independently of CTable. //It must be derived first so the rows of CTable can be compared to the 'having' //relation. var RTable:= CTable having ( (PTable {Answer AnswerI}) where MatchTableRank = ( (CTableViewWord where Answer=AnswerI) {Rank,Word} ) {AnswerI Answer} ) with {IgnoreUnsupported = 'true'} ; var Resultable:=table of typeof(CTable){}; //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; select WordsLimit(table{row{10 ID,'burns' Word},row{13,'burns'},row{16,'burns'},row{19,'arthur'}}); ID Answer Wcnt Word -- -------- ---- ------ 25 nineth 4 burns 26 nineth 4 burns 27 nineth 4 burns 28 nineth 4 arthur 29 tenth 4 burns 32 tenth 4 burns 35 tenth 4 burns 38 tenth 4 arthur 39 eleventh 4 burns 41 eleventh 4 burns 43 eleventh 4 burns 45 eleventh 4 arthur select WordsLimit(table{row{10 ID,'arthur' Word}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value>
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Tuesday, July 17, 2007
Dataphor - Simplifying Relational Division
Friday, July 13, 2007
Dataphor - An example of rapid application development
The SqlWindow example involves populating tables thru a form that is derived by the Dataphor frontend server. The fact that the form is derived by the system services is a key concept in Dataphors ability to be a vehicle of rapid application development. The example covers a wide range of Dataphor capabilities that are used in application development. Concepts covered include the presentation layer, metadata, forms, event handlers, references, constraints, cursors and views as well as many other programming aspects of D4. What does it do Based on user input several tables are populated and can be viewed thru a form. Complete editing is available. The example uses Sql Server 2005 as the data respository (device). Requirements The application was developed using Dataphor v2677 and MS Sql Server 2005 sp1. Except for an sql stored procedure (which is not required) the example should run on any version of Sql Server or any other database for that matter. Download the demo Everything needed to run the demo including screenshots and setup scripts for tables, views, constraints, references, operators etc. can be download @ http://www.geocities.com/d4tosql/D4/sqlwindowexample.zip All scripts for Dataphor objects are documented with comments that should help explain the what and why of the code. Screenshots You can view screenshots of the demo here and here. View a D4 operator used here. What is the context of the example Two different methods are used to compute cumulative aggregates that are based on the concept of an sql window as outlined in the Sql-99 standard. The sql window is intended to simplify and make more efficient the computing of cumulative sums. While an sql window is available in DB2 and Oracle, it is not in Sql Server. The two methods used in the example simulate two different conceptual approaches to deriving cumulative/running sums. Who is this intended for For anyone interested in exploring in greater detail key programming aspects of Dataphor used in application development. There are many operators and other Dataphor objects to go thru. The code is relatively straightforward and easy to follow. The example can also be used as a demo for those not familiar with Dataphor. The setup script will do everything required to run the example. It is only necessary to have a library using Sql Server as the device. The form in this example is 'completely' derived by the frontend server using metadata supplied in various places. Even without a customized form entering appropriate data is very easy and straightforward. If an entry is invalid and violates a constraint the custom error message(s) fully describe how to correct the entry. A modified form with menus will be available in the future. Sql background for computing cumulative/running sums Traditionally computing a running sum involved a subquery with an inequality comparision. Given the sample table Stocks, the following sql query will compute running sums for each Stock in ascending order of QTime. The query can be run from Dataphor using the SQLQuery operator to pass it directly to Sql Server. The use of the row_number() function allows the ranks to be easily compared. select SQLQuery (' select A.Stock,A.PRank,A.QTime,A.Quote, (select Sum(B.Quote) from (select Stock,QTime,Quote, row_number()over(Partition by Stock Order by QTime,Quote) as PRank from Stocks) as B where B.Stock=A.Stock and B.PRank<=A.PRank) as YSum from (select Stock,QTime,Quote, row_number()over(Partition by Stock Order by QTime,Quote) as PRank from Stocks) as A '); Stock PRank QTime Quote YSum -------- ----- ---------------------- ----- ---- IBM 1 4/3/2006 2:47:00 PM 107 107 IBM 2 7/16/2006 7:39:00 AM 157 264 IBM 3 8/26/2006 6:24:00 PM 125 389 IBM 4 9/8/2006 1:11:00 PM 171 560 IBM 5 11/22/2006 12:01:00 PM 155 715 IBM 6 2/2/2007 12:03:00 AM 171 886 IBM 7 4/13/2007 3:30:00 AM 219 1105 IBM 8 10/18/2007 12:12:00 AM 133 1238 IBM 9 12/25/2007 10:48:00 AM 139 1377 MS 1 3/17/2006 7:12:00 PM 257 257 . This method is similar to computing ranks as discussed here and is also just as inefficient. The impact of the subquery is more compelling when it is expressed as a table. The following query gives the counts for the stocks: select Stocks group by {Stock} add{Count() Cnt}; Stock Cnt -------- --- IBM 9 MS 11 MySQL_AB 12 Oracle 11 Sybase 9 For a table to represent the subquery it must be able to accommodate the maximum number of rows in a Stock. Based on MySQL_AB, we need at least 12 rows. Consider the following D4 batch which populates a table assuming the current row is the 6th row (PRank=6) in the 'IBM' Stock partition. The table represents all rows up to and including the 6th row. Non-existent rows, thru the use of IfNil, are given a PRank and nil (null) value for Quote. A sum of the Quote values from the table would represent the running sum up thru the 6th rank (PRank) for the 'IBM' Stock. //Shows table based on table var S for a particular Stock and PRank. var S:=SQLQuery('select Stock,QTime,Quote, row_number()over(Partition by Stock Order by QTime,Quote) as PRank from Stocks') {Stock,PRank,Quote} ; var Stock:='IBM'; var PRank:=6; select table { IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}), (S adorn{key{Stock,PRank}})[Stock,PRank] } order by {PRank desc}; Stock PRank Quote ---------- ----- ---------- IBM 6 171 IBM 5 155 IBM 4 171 IBM 3 125 IBM 2 157 IBM 1 107 <No Value> -6 <No Value> <No Value> -7 <No Value> <No Value> -8 <No Value> <No Value> -9 <No Value> <No Value> -10 <No Value> <No Value> -11 <No Value> The fact that the table has to be repopulated for each rank underscores the inefficiency of the subquery. There is no reuse of information from one rank to the next. When we compute the running sum for 'IBM': var S:=SQLQuery('select Stock,QTime,Quote, row_number()over(Partition by Stock Order by QTime,Quote) as PRank from Stocks') {Stock,PRank,Quote} ; select S add { Sum ( Quote from ( table { IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}), (S adorn{key{Stock,PRank}})[Stock,PRank] } ) ) SumQte } ; Stock PRank Quote SumQte -------- ----- ----- ------ IBM 1 107 107 IBM 2 157 264 IBM 3 125 389 IBM 4 171 560 IBM 5 155 715 IBM 6 171 886 IBM 7 219 1105 IBM 8 133 1238 IBM 9 139 1377 MS 1 257 257 . We are making 45 inserts into the table since we are starting over from the 1st rank for each row of 'IBM': 1+2+3+4+5+6+7+8+9=45 inserts. In the example operators D4ApplyforForm (used on the before insert event) and D4ApplyforFormUpdate (used on the before update event) represent the the idea of having to build a solution independently for each row. It therefore represents an inefficient solution just as a subquery does. Obviously it would make a lot more sense to simply add each new row to the table which had all the previous rows and then get the sum. We could represent the addition of a single row to the table as follows: //Shows reusing information from one PRank to another. var S:=SQLQuery('select Stock,QTime,Quote, row_number()over(Partition by Stock Order by QTime,Quote) as PRank from Stocks') {Stock,PRank,Quote} ; var Stock:='IBM'; var PRank:=6; var T1:= table { IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}), (S adorn{key{Stock,PRank}})[Stock,PRank] } ; select T1 union table { IfNil((S adorn{key{Stock,PRank}})[Stock,PRank+1],row{-99 PRank,nil Stock,nil Quote}) } where IsNotNil(Quote) order by {PRank desc}; Stock PRank Quote ----- ----- ----- IBM 7 219 IBM 6 171 IBM 5 155 IBM 4 171 IBM 3 125 IBM 2 157 IBM 1 107 In this way there would be as many inserts into the table as rows for the Stock. In other words we would be scanning the data just once for the nine rows vs. the multiple scans for the subquery (table). We can show the idea of accumulating sums incrementally. For example the running sum for row 6, adding the next row to the previously stored table and then getting the sum for row 7: var S:=SQLQuery('select Stock,QTime,Quote, row_number()over(Partition by Stock Order by QTime,Quote) as PRank from Stocks') {Stock,PRank,Quote} ; var Stock:='IBM'; var PRank:=6; var T1:= table { IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}), IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}), (S adorn{key{Stock,PRank}})[Stock,PRank] } ; select table { row{'Accumulation of Sum for row 6' Desc,Sum(Quote from T1) SumQte}, row{'Accumulation of Sum for row 7', Sum(Quote from ( T1 union table { IfNil((S adorn{key{Stock,PRank}})[Stock,PRank+1],row{-99 PRank,nil Stock,nil Quote}) } where IsNotNil(Quote) )) } }; Desc SumQte ----------------------------- ------ Accumulation of Sum for row 6 886 Accumulation of Sum for row 7 1105 In the example operators D4WindowforForm (used on the before insert event) and D4WindowforFormUpdate (used on the before update event) represent the the idea of reusing prior information when appropriate by incrementally adding (and when appropriate deleting) rows from a work table that is used to obtain cumulative aggregates. View operator D4WindowforForm here. The Sql window The idea of accumulating running sums based on a new construct was introduced in the sql-99 standard: ISO/ANSI: Introduction to OLAP functions http://tinyurl.com/2taahc 'An <OLAP function> is defined using a window. A window may specify a partitioning, an ordering of rows within partitions, and an aggregation group. The aggregation group specifies which rows of a partition, relative to the current row, should participate in the calculation of an aggregate. Through aggregation groups, windows support such important OLAP capabilities as cumulative sums and moving averages. Windows may be specified either in the new WINDOW clause, or in-line in the SELECT list.' The running sums for Stock is expressed by: SELECT Stock,QTime,Quote, Sum(Quote) OVER (PARTITION BY Stock ORDER BY QTime,Quote 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW') AS SumQte FROM Stocks; where the window is: Sum(Quote) OVER (PARTITION BY Stock ORDER BY QTime,Quote 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW') AS SumQte The 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW' is called the aggregate group clause/statement. The definition of the terms is as follows: window-aggregation-group-clause The aggregation group of a row R is a set of rows, defined relative to R in the ordering of the rows of R's partition. This clause specifies the aggregation group. ROWS Indicates the aggregation group is defined by counting rows. group-BETWEEN Specifies the aggregation group start and end based on ROWS. UNBOUNDED PRECEDING Includes the entire partition preceding the current row. UNBOUNDED FOLLOWING Includes the entire partition following the current row. CURRENT ROW Specifies the start or end of the aggregation group as the current row. PRECEDING Specifies the number of rows preceding the current row.as a positive integer indicating a number of rows. FOLLOWING Specifies the number of rows following the current row. as a positive integer indicating a number of rows. The definition 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW' or its abbreviated form 'ROWS UNBOUNDED PRECEDING' means include all the rows in the partition prior to and including the current row in the computation of the sum. The sql-99 window does not introduce anything that couldn't be done previously. What it does is simplify the way it is done and most importantly make accumulation much more efficient. We can express the idea of the sql window in Sql Server with the following procedure: create procedure SqlWindow @From INT, @To INT as SELECT A.PRank,A.Stock,A.QTime,A.Quote, WCnt,SumQte,MinQte,MaxQte,AvgQte FROM (SELECT Stock,QTime,Quote, ROW_NUMBER() OVER(PARTITION BY Stock ORDER BY QTime) AS PRank FROM Stocks) AS A CROSS APPLY ( SELECT COUNT(*) AS WCnt,SUM(Quote) AS SumQte,MIN(Quote) AS MinQte, MAX(Quote) AS MaxQte,CAST(AVG(1.*Quote) AS DECIMAL(6,1)) AS AvgQte FROM (SELECT Stock,Quote,ROW_NUMBER() OVER(ORDER BY QTime) AS PRank FROM Stocks AS B WHERE B.Stock=A.Stock) AS C WHERE C.PRank BETWEEN A.PRank+@From AND A.PRank+@To ) AS D The @From and @To parameters are the bound of rows relative to the current row that the running/cumulative sum is obtained for. To obtain the running sums (and other aggregates) we can execute the procedure using an aggregate group definition that includes all prior rows in the sum: select SQLQuery ( 'Exec SqlWindow @From,@To', WList('ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW') {Wfrom From,Wto To} ); PRank Stock QTime Quote WCnt SumQte MinQte MaxQte AvgQte ----- -------- ---------------------- ----- ---- ------ ------ ------ ------ 1 IBM 4/3/2006 2:47:00 PM 107 1 107 107 107 107.0 2 IBM 7/16/2006 7:39:00 AM 157 2 264 107 157 132.0 3 IBM 8/26/2006 6:24:00 PM 125 3 389 107 157 129.7 4 IBM 9/8/2006 1:11:00 PM 171 4 560 107 171 140.0 5 IBM 11/22/2006 12:01:00 PM 155 5 715 107 171 143.0 6 IBM 2/2/2007 12:03:00 AM 171 6 886 107 171 147.7 7 IBM 4/13/2007 3:30:00 AM 219 7 1105 107 219 157.9 8 IBM 10/18/2007 12:12:00 AM 133 8 1238 107 219 154.8 9 IBM 12/25/2007 10:48:00 AM 139 9 1377 107 219 153.0 1 MS 3/17/2006 7:12:00 PM 257 1 257 257 257 257.0 . . The D4 operator WList parses the aggregate definition to obtain the @From and @To values. Executing the WList operator with the definition we obtain the bounds of the cumulative sum. WList returns a row with the From and To values: select WList('ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW'); Wfrom Wto ----------- --- -1000000000 0 The Wfrom value of -1000000000 simply represents a large negative number guaranteed to capture all prior rows. The value of 0 for Wto represents the current row. The specify construct '{Wfrom From,Wto To}' simply renames the columns of the row returned by operator WList to the names of the parameters expected by the sql procedure SqlWindow. If we wanted cumulative sums from the current row thru all rows available for a Stock we can use either of the following: select WList('ROWS UNBOUNDED FOLLOWING'); select WList('ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING'); Wfrom Wto ----- ---------- 0 1000000000 The Wfrom value of 0 is the current row and the Wto value of 1000000000 guarantees we will include all available 'following' rows in the partition for the cumulative sum. Executing the SqlWindow procedure: select SQLQuery ( 'Exec SqlWindow @From,@To', WList('ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING') {Wfrom From,Wto To} ); PRank Stock QTime Quote WCnt SumQte MinQte MaxQte AvgQte ----- -------- ---------------------- ----- ---- ------ ------ ------ ------ 1 IBM 4/3/2006 2:47:00 PM 107 9 1377 107 219 153.0 2 IBM 7/16/2006 7:39:00 AM 157 8 1270 125 219 158.8 3 IBM 8/26/2006 6:24:00 PM 125 7 1113 125 219 159.0 4 IBM 9/8/2006 1:11:00 PM 171 6 988 133 219 164.7 5 IBM 11/22/2006 12:01:00 PM 155 5 817 133 219 163.4 6 IBM 2/2/2007 12:03:00 AM 171 4 662 133 219 165.5 7 IBM 4/13/2007 3:30:00 AM 219 3 491 133 219 163.7 8 IBM 10/18/2007 12:12:00 AM 133 2 272 133 139 136.0 9 IBM 12/25/2007 10:48:00 AM 139 1 139 139 139 139.0 <- Last row 1 MS 3/17/2006 7:12:00 PM 257 11 2254 100 290 204.9 . . You can see that it is the opposite of 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW'. For the last row in the partition we are left with only the current row being used for the cumulative sum. Suppose we want sums for the current row, 2 prior rows and 1 following row. The aggregate group definition would be: select WList('ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING'); Wfrom Wto ----- --- -2 1 This definition therefore can use 4 rows, the 2 prior rows from the current row, the current row and the next row. select SQLQuery ( 'Exec SqlWindow @From,@To', WList('ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING') {Wfrom From,Wto To} ); PRank Stock QTime Quote WCnt SumQte MinQte MaxQte AvgQte ----- -------- ---------------------- ----- ---- ------ ------ ------ ------ 1 IBM 4/3/2006 2:47:00 PM 107 2 264 107 157 132.0 <- First row 2 IBM 7/16/2006 7:39:00 AM 157 3 389 107 157 129.7 3 IBM 8/26/2006 6:24:00 PM 125 4 560 107 171 140.0 4 IBM 9/8/2006 1:11:00 PM 171 4 608 125 171 152.0 5 IBM 11/22/2006 12:01:00 PM 155 4 622 125 171 155.5 6 IBM 2/2/2007 12:03:00 AM 171 4 716 155 219 179.0 7 IBM 4/13/2007 3:30:00 AM 219 4 678 133 219 169.5 8 IBM 10/18/2007 12:12:00 AM 133 4 662 133 219 165.5 9 IBM 12/25/2007 10:48:00 AM 139 3 491 133 219 163.7 1 MS 3/17/2006 7:12:00 PM 257 2 424 167 257 212.0 . . Note that not all rows have 4 values. For the 1st row (PRank=1) there are no prior rows, there is only the current row and the one following. Therefore the aggregates for the 1st row are based on cumulative values from only 2 rows. Note that a cumulative sum need not include the current row. We can base the accumulation on prior or following rows. For example we can use the prior rows between 4 and 2: select WList('ROWS BETWEEN 4 PRECEDING AND 2 PRECEDING') Wfrom Wto ----- --- -4 -2 select SQLQuery ( 'Exec SqlWindow @From,@To', WList('ROWS BETWEEN 4 PRECEDING AND 2 PRECEDING') {Wfrom From,Wto To} ); PRank Stock QTime Quote WCnt SumQte MinQte MaxQte AvgQte ----- -------- ---------------------- ----- ---- ---------- ---------- ---------- ---------- 1 IBM 4/3/2006 2:47:00 PM 107 0 <No Value> <No Value> <No Value> <No Value> 2 IBM 7/16/2006 7:39:00 AM 157 0 <No Value> <No Value> <No Value> <No Value> 3 IBM 8/26/2006 6:24:00 PM 125 1 107 107 107 107.0 <-First value 4 IBM 9/8/2006 1:11:00 PM 171 2 264 107 157 132.0 5 IBM 11/22/2006 12:01:00 PM 155 3 389 107 157 129.7 6 IBM 2/2/2007 12:03:00 AM 171 3 453 125 171 151.0 7 IBM 4/13/2007 3:30:00 AM 219 3 451 125 171 150.3 8 IBM 10/18/2007 12:12:00 AM 133 3 497 155 171 165.7 9 IBM 12/25/2007 10:48:00 AM 139 3 545 155 219 181.7 1 MS 3/17/2006 7:12:00 PM 257 0 <No Value> <No Value> <No Value> <No Value> 2 MS 5/9/2006 3:12:00 PM 167 0 <No Value> <No Value> <No Value> <No Value> 3 MS 6/24/2006 1:51:00 AM 277 1 257 257 257 257.0 . . It's not until the 3rd row of a partition that a row in included. We can lead or lag by using the same value from both From and To. For example to lead by 1 row we can use: select SQLQuery ( 'Exec SqlWindow @From,@To', WList('ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING') {Wfrom From,Wto To} ); PRank Stock QTime Quote WCnt SumQte MinQte MaxQte AvgQte ----- -------- ---------------------- ----- ---- ---------- ---------- ---------- ---------- 1 IBM 4/3/2006 2:47:00 PM 107 1 157 157 157 157.0 2 IBM 7/16/2006 7:39:00 AM 157 1 125 125 125 125.0 3 IBM 8/26/2006 6:24:00 PM 125 1 171 171 171 171.0 4 IBM 9/8/2006 1:11:00 PM 171 1 155 155 155 155.0 5 IBM 11/22/2006 12:01:00 PM 155 1 171 171 171 171.0 6 IBM 2/2/2007 12:03:00 AM 171 1 219 219 219 219.0 7 IBM 4/13/2007 3:30:00 AM 219 1 133 133 133 133.0 8 IBM 10/18/2007 12:12:00 AM 133 1 139 139 139 139.0 9 IBM 12/25/2007 10:48:00 AM 139 0 <No Value> <No Value> <No Value> <No Value> <-Last row 1 MS 3/17/2006 7:12:00 PM 257 1 167 167 167 167.0 2 MS 5/9/2006 3:12:00 PM 167 1 277 277 277 277.0 3 MS 6/24/2006 1:51:00 AM 277 1 250 250 250 250.0 4 MS 8/18/2006 6:46:00 PM 250 1 290 290 290 290.0 5 MS 1/16/2007 5:56:00 PM 290 1 177 177 177 177.0 6 MS 1/22/2007 1:28:00 AM 177 1 150 150 150 150.0 7 MS 5/30/2007 4:30:00 PM 150 1 263 263 263 263.0 8 MS 6/21/2007 1:54:00 PM 263 1 153 153 153 153.0 9 MS 7/17/2007 12:20:00 PM 153 1 170 170 170 170.0 10 MS 11/26/2007 1:45:00 AM 170 1 100 100 100 100.0 11 MS 12/25/2007 3:54:00 PM 100 0 <No Value> <No Value> <No Value> <No Value> <-Last row 1 MySQL_AB 2/8/2006 3:47:00 PM 21 1 25 25 25 25.0 . . The last row for each partition (Stock) has no data in the accumulation. Note that the lead precludes the use of previous information. Modeling the Sql Window The D4 operators that compute cumulative aggregates, (D4ApplyforForm, D4ApplyforFormUpdate, D4WindowforForm and D4WindowforFormUpdate) take the sql procedure SqlWindow as their logical model. The operators implement the logic of the procedure using cursors as a way to illustrate their use. As stated previously the D4Apply operators follow the sql procedure more faithfully and reflect the same inefficiency as the procedure. The D4Window operators follow the same basic logic but make use of previous rows. This interdependency reflects a much more efficient methodology and hence is more faithful to the idea of the efficient sql window.
Thursday, July 05, 2007
Dataphor - Create string intervals
This problem appeared on the sqlservercentral.com forums: 'Compare two varchar arrays one character at time until different' http://tinyurl.com/yochaa The original poster explains: Some background info first. We have an ASP.NET Web Application that serves up a list of names in a dynamic tree control. The nodes of the tree are generated on demand and loaded into the tree dynamically. When this list of names become very big (and indeed it will be), the delay takes too long. So to minimize the delay we will group the names into dynamically created folders. Each folder will contain n number of names based on some passed in limit (In my code below this is the variable @NthRow). This way we can mimimize the number of nodes to fetch yet still support a large number of names. The final output of our function must be a table of strings in the form of X - Y where X is the starting name and Y is the ending name of the group. For example, given the following names and @NthRow = 5 Baker, Ms. Jamie Espinosa, Ms. Jean Gonzales, Mr. Robert Holland, Ms. Julia Jimenez, Ms. Soni Macdonald, Mr. Mickey Miller, Ms. Jana Noriega, Ms. Michelle Owen, Ms. Alma Ramirez, Ms. Stephanie Salas, Ms. Jeanine Sipes, Mr. Tyler Tamayo, Ms. Laura Timmer, Ms. Julie Trevino, Ms. Yvonne Young, Ms. Annia Produce an output table of: Ba-Ji Ma-Ra Sa-Tr Y-Y Where: Ba-Ji represents the names Baker, Ms. Jamie thru Jimenez, Ms. Soni Ma-Ra represents the names Macdonald, Mr. Mickey thru Ramirez, Ms. Stephanie Sa-Tr represents the names Salas, Ms. Jeanine thru Trevino, Ms. Yvonne Y-Y represents the name Young, Ms. Annia ------------------------------------------------------ This solution uses the D4 language of Dataphor and MS Sql Server 2005 as the data repository, Basically we want to take a table of names ordered by last name and divide it into intervals of NthRow names. We want at least the 1st 2 characters of of the starting and ending last names for each interval. We will allow that if the last interval contains only 1 name we want to show a lesser number of characters than other intervals. Here is some sample data entered in Dataphor and stored in sql server: create table MyNames { ID:Integer, Name:String tags{Storage.Length = "50"}, key{ID} }; insert table { row{1 ID,' Owen , Ms. Alma' Name}, row{2,'Ramirez, Ms. Stephanie'}, row{3,' Salas, Ms. Jeanine'}, row{4,' Sipes, Mr. Tyler'}, row{5,'Tamayo, Ms. Laura'}, row{6,'Timmer , Ms. Julie'}, row{7,' Trevino, Ms. Yvonne'}, row{8,' Young, Ms. Annia'}, row{9,'Baker, Ms. Jamie'}, row{10,' Espinosa, Ms. Jean'}, row{11,' Gonzales, Mr. Robert'}, row{12,' Holland, Ms. Julia'}, row{13,'Jimenez, Ms. Soni'}, row{14,' Macdonald, Mr. Mickey'}, row{15,' Miller, Ms. Jana'}, row{16,'Noriega, Ms. Michelle'} } into MyNames; Operator ArrayVarNames takes parameters of a 'table' with columns ID and Name, NthRow and NChars which is the number of characters +1 to identity a name. The operators returns a table with the interval information, the count of names in the interval and the identifying characters. create operator ArrayVarNames(MyNames:table{ID:Integer,Name:String},NthRow:Integer,NChars:Integer): table{Index:Integer,NFrom:Integer,NTo:Integer,NCnt:Integer, CharFrom:String,CharTo:String} begin result:=table of typeof(result){}; //Return a table variable with a sequence number (RowID) reflecting Lastname //sorted in ascending order. var T:= ToTable(ToList(cursor( (MyNames add{Name.Split()[0].Trim() LastName} with {IgnoreUnsupported = 'true'}) order by {LastName}))) {sequence+1 RowID,Name,LastName}; var MaxID:=Max(RowID from T); //Max RowID in table T. //NGroups is a table variable that uses NthRow and the numbers table to form //intervals. For each interval (Index) we get the count (Cnt), the max RowID //(MaxRowID) for the interval (we really are only interested in MaxRowID //for the last interval). var NGroups:= ( numbers where num between 1 and MaxID {num Index,((num-1)*NthRow)+1 From,num*NthRow To} ) times T with {IgnoreUnsupported = 'true'} where RowID between From and To group by {Index} add{Count() Cnt,Max(RowID) MaxRowID} {Index,Cnt,MaxRowID}; //LRow is a row variable with the max Index (MaxIndex), which is the max interval //given data MyNames and NthRow, the count (Cnt) of the rows for the max Index //and the max RowID for MyNames. Note that LRow uses the table (variable) NGroups. var LRow:= ( NGroups adorn{key{Index}} return 1 by {Index desc} {Index MaxIndex,MaxRowID,Cnt})[]; //We get NChars+1 characters of the Lastname for the starting (NFrom) RowID //and ending RowID (NTo) for each interval by using RowID as a key to look //up LastName in table T. We use some boolean logic to check if there is //only 1 LastName in the last interval (MaxIndex). If true we only want //the NChars character of LastName. We put the boolean logic directly in SubString. //We also add the count (NCnt) for each interval. result:= numbers rename{num Index} where Index between 1 and LRow.MaxIndex add{(NGroups adorn{key{Index}})[Index].Cnt NCnt} with {IgnoreUnsupported = 'true'} add{((Index-1)*NthRow)+1 NFrom,Index*NthRow NTo} add { SubString((T adorn{key{RowID}})[NFrom].LastName,0, (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))) CharFrom, IfNil( SubString((T adorn{key{RowID}})[NTo].LastName,0, (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))), SubString((T adorn{key{RowID}})[LRow.MaxRowID].LastName,0, (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))) ) CharTo } {Index,NFrom,NTo,NCnt,CharFrom,CharTo} with {IgnoreUnsupported = 'true'} ; end; For example given the original data: select ArrayVarNames(MyNames,5,1); Index NFrom NTo NCnt CharFrom CharTo ----- ----- --- ---- -------- ------ 1 1 5 5 Ba Ji 2 6 10 5 Ma Ra 3 11 15 5 Sa Tr 4 16 20 1 Y Y select ArrayVarNames(MyNames,6,2); Index NFrom NTo NCnt CharFrom CharTo ----- ----- --- ---- -------- ------ 1 1 6 6 Bak Mac 2 7 12 6 Mil Sip 3 13 18 4 Tam You We can use the operator with any table by simply renaming the columns of the table. Here we use the Orders table of the Sql Server Northwind database to create intervals for customers by simply renaming the columns of the Orders table to agree with those expected by ArrayVarNames: select ArrayVarNames (Orders group by {CustomerID} add{Max(OrderID) ID} {ID,CustomerID Name},15,4); Index NFrom NTo NCnt CharFrom CharTo ----- ----- --- ---- -------- ------ 1 1 15 15 ALFKI COMMI 2 16 30 15 CONSH GOURL 3 31 45 15 GREAL LILAS 4 46 60 15 LINOD QUEEN 5 61 75 15 QUICK THEBI 6 76 90 14 THECR WOLZA We can create a similar table using the Products table of the Northwind db using ProductID and ProductName. select ArrayVarNames (Products {ProductID ID,ProductName Name},10,30); Index NFrom NTo NCnt CharFrom CharTo ----- ----- --- ---- ----------------------------- ------------------------------- 1 1 10 10 Alice Mutton Chef Anton's Gumbo Mix 2 11 20 10 Chocolade Grandma's Boysenberry Spread 3 21 30 10 Gravad lax Jack's New England Clam Chowder 4 31 40 10 Konbu Mishi Kobe Niku 5 41 50 10 Mozzarella di Giovanni Queso Cabrales 6 51 60 10 Queso Manchego La Pastora Scottish Longbreads 7 61 70 10 Singaporean Hokkien Fried Mee Tofu 8 71 80 7 Tourtière Zaanse koeken Hopefully examples like these will encourage you to explore Dataphor -:)
Monday, July 02, 2007
Dataphor - Data scrubbing using lists
This contrived problem illustrates the ease with which lists can be used in D4. The example makes use of the following Dataphor constructs: TableDee ( table { row { } } ) http://www.alphora.com/docs/D4LGColumns.html Indexer Expression http://www.alphora.com/docs/O-System.iIndexer.html Using Rows http://www.alphora.com/docs/DDGModelingProcessLogic-ValuesandVariables-UsingRows.html List Values http://www.alphora.com/docs/D4LGLanguageElements-Values-ListValues.html Using Lists http://www.alphora.com/docs/DDGModelingProcessLogic-ValuesandVariables-UsingLists.html Split operator http://www.alphora.com/docs/O-System.Split.html Concat (Concatenate) http://www.alphora.com/docs/O-System.Concat.html Aggregate http://www.alphora.com/docs/D4LGTableExpressions-Aggregate.html Max http://www.alphora.com/docs/O-System.Max.html Problem - create a comma delimited string from a string that initially can contain any number of potential delimiters. Each string will be delimited by a blank (' ') and another delimiter of a single character. The 2nd delimiter will be a character that occurs the most times in the string. A delimiter is any character other than a number or letter. If there are ties between delimiter characters (other than blank) return a blank string since we can't determine the 2nd delimiter. For example: Initial String New String -------------- ---------- "AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1" "AAA,BB,BY3J,CCCH1,PFYSD,GE1" which has 6 items. The initial string has 4 potential delimiters: a comma, a colon, a slash and a semi-colon {',',':','/',';'} besides a blank which is always a delimiter. Since comma occurs 3 times, more than any of the other potential delimiters, it is used along with a blank to split the string. The other potential delimiters are removed from the new string. Note that the characters of the new string keep their relative position (from left to right). For: Initial String New String -------------- ---------- " LSF QR,U1/T5,P13/KE3" the new string is blank because '/' and ',' both occur 2 times (tied). For: Initial String New String -------------- ---------- " LSF QR U1" the new string is blank since there must be a delimiter besides a blank (' '). Operator ANewStr returns a row with the following columns: NewStr : the new comma delimited string. Char : the delimiter used along with blank (see LRow). Items : the number of items in the string, ie. the number of items that would be obtained if the new string was split into a list using comma as the delimiter. Cnt : see LRow. Dels : see LRow. create operator ANewStr(Str:String): row{NewStr:String,Items:Integer,Cnt:Integer,Dels:String,Char:String} begin result:=row of typeof(result){ }; /* LRow is a row variable with columns Char : the single non number and non letter character that occurs the most times in the string that will serve as a delimiter along with a blank. LList : the delimiters to use to split the string, stored as a list. Cnt : the count of non blanks delimiters. A 1 indicates a delimiter was found, >1 means ties and therefore no non blank delimiter was found. Dels: a concatenated comma delimited string of blank (' ') and the new found delimiter (Char). */ var LRow:= (table { row { } } add{Str Str} times numbers where num<Length(Str) add{Str[num] Char} where (Char<>' ') and (not IsLetterOrDigit(Char)) group by {Char} add{Count() Cnt} return 1 by {Cnt desc} with {IgnoreUnsupported = 'true'} add{'' Del} group add{Max(Char) Char,Count() Cnt, Concat(Char,Del order by {Char}) Dels} add{(' '+ Unicode({177})+Char).Split( Unicode({177})) LList} {LList,Cnt,(Unicode({39})+' '+Dels+Unicode({39})) Dels})[]; result:= /* The string is split by the Split operator using 2 delimiters, blank and LRow.Char which are stored as a list in LRow.LList. Table numbers is a table of numbers with a single column num. The num column is used to label the items of the list (StrPart). */ ( ( table { row { } } add{Str Str} times numbers with {IgnoreUnsupported = 'true'} where num<Str.Split(LRow.LList).Count() with {IgnoreUnsupported = 'true'} add{Str.Split(LRow.LList)[num] StrPart} where StrPart>' ' {num+1 Index,StrPart} where LRow.Cnt=1 ) /* Split the Strpart strings into individual characters to eliminate non digits/numbers. Then concatenate the strings to form the new string (NewStr). Note that this 2nd splitting is a bit redundant. We could have formed the new string with a single split but this illustrates the logical, simple and straightforwards ways that lists can be created and used. */ times numbers where num<Length(StrPart) add{StrPart[num] Char} where IsLetterOrDigit(Char) add{'' Del} adorn{key{Index,num}} group by {Index} add{Concat(Char,Del order by {Index,num}) NewStr} add{',' Del} group add{Concat(NewStr,Del order by {Index}) NewStr} add{LRow.LList[1] Char,NewStr.Split().Count() Items,LRow.Cnt Cnt} {IfNil(NewStr,' ') NewStr,IfNil(Items,0) Items,Char,Cnt,LRow.Dels Dels} redefine { Char:=case when Items>0 then Char else 'Ties' end })[]; end; Examples: For a single string: select ANewStr("AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1"); NewStr Items Cnt Dels Char --------------------------- ----- --- ---- ---- AAA,BB,BY3J,CCCH1,PFYSD,GE1 6 1 ' ,' , select ANewStr(" LSF//, /AS QR,U1/ /T5,P13//KE3") ; NewStr Items Cnt Dels Char --------------------- ----- --- ---- ---- LSF,AS,QRU1,T5P13,KE3 5 1 ' /' / Return the orginal string with the new string information (row): var AString:="R:B U:AS,1G'E:S SG:E5.16'Z9" ; select ANewStr(AString) add{AString Str} {Str,NewStr,Char,Items,Cnt,Dels}; Str NewStr Char Items Cnt Dels --------------------------- ----------------------- ---- ----- --- ---- R:B U:AS,1G'E:S SG:E5.16'Z9 R,B,U,AS1GE,S,SG,E516Z9 : 7 1 ' :' The case of equal occurrences of a delimiter: select ANewStr("A1,FS KE*VS1 L1,FW*A") ; NewStr Items Cnt Dels Char ------ ----- --- ----- ---- 0 2 ' *,' Ties Working with data in a table: var StrTable:= table { row{1 ID,"AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1" Str}, row{2," LSF//, /AS QR,U1/ /T5,P13//KE3"}, row{3,"R:B U:AS,1G'E:S SG:E5.16'Z9"}, row{4,"A1,FS KE*VS1 L1,FW*A"} }; select StrTable add{ ANewStr(Str).NewStr NewStr, ANewStr(Str).Char Char, ANewStr(Str).Cnt Cnt }; ID Str NewStr Char Cnt -- -------------------------------- --------------------------- ---- --- 1 AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1 AAA,BB,BY3J,CCCH1,PFYSD,GE1 , 1 2 LSF//, /AS QR,U1/ /T5,P13//KE3 LSF,AS,QRU1,T5P13,KE3 / 1 3 R:B U:AS,1G'E:S SG:E5.16'Z9 R,B,U,AS1GE,S,SG,E516Z9 : 1 4 A1,FS KE*VS1 L1,FW*A Ties 2 Interested readers may want to check out my comment on sql attempts to simulate lists.
Sql - History repeats itself simulating lists
As explained here, before sql-99 analytic functions a rank was expressed in the form of a subquery: (select aggregate from <foo> where predicate) as rank where Count was generally used as the aggregate and the predicate was an inequality comparison (table2.Y<=table1.Y). As the definition of the rank was refined the predicate became progressively more complex. If duplicate values existed and unique ranks desired then another column had to be used to resolve ties. An expanded predicate could look like: where (table2.Y<table1.Y) OR (table1.Y=table2.Y AND table2.RowID<=table1.RowID) Many experts were so uncomfortable with the nature of computing ranks that they shifted the intent from a rank to a 'line number'. With this slight of hand they were able to argue that it was a 'format' feature and therefore on the conceptual level of 'reporting'. And such things are always done on the client. With the introduction of sql-99 ranking functions and the benefit of clear hindsight the whole issue of how ranks were previously computed revealed itself. Compared to the simplicity and efficiency of ranking functions the sql queries were seen for the odious things they were. What was thought to be 'cleaver' sql was really a query worthy of Dr. Frankenstein. And any calls for removal to the client disappeared. So one would think sql users would have learned an important lesson - the simulation of a construct by a bunch of unrelated constructs is a poor substitute for the real thing. But it seems sqls span of attention is short and history is repeating itself. We can add the idea of simulating a 'list' to the misbegotten ranks. Just as an aggregate and predicate were used for ranks the 'substring' function and a table of numbers are being used to formulate a 'list'. Just like the rank predicate, the substring expression becomes more complex as the nature of the list becomes more refined. There are many examples of list simulation such as here. But all sql list simulations are fundamentally flawed. The principle way for sql to represent something is by a 'table'. So a table is used to represent a list. But a 'table' is a different animal than a 'list'. Sql can logically represent a rank because a rank is a number and sql has a type number. But sql has no type for a list. The same holds for representing a list by any type of string. A string, no matter delimited or non-delimited is not a list. One would think that the function 'substring' would lead to questioning the sql ways to create a list. It is, after all, substring, ie. a function designed explicitly for a 'string'! With a list sql is doomed before it even starts to simulate it. The logical inconsistencies of lists in sql are unfortunately hidden much like ranking functions were hidden. What has filled this vacuum is so much non-sense. It is truly unfortunate that the only thing that can reveal the extent of the logical confusion, not to mention the code mess, is to work with a list type such as in the D4 language of Dataphor.
Subscribe to:
Posts (Atom)