If your not familiar with relational division a popular article for the translation of the concept into sql is: Relational Division by Joe Celko Basically relational division is the question of whether a set of predicates exist in a given table. Generally the predicates represent a set of 'rows' and therefore the question becomes does a predicate in the form of a 'table' exist in another table. The predicate rows are referred to as the divisor table, the given table as the dividend table and the result as the quotient table. Hence the idea of division by tables. Since sql does not support a table 'type' various solutions have been offered to target the rows of the given (dividend) table. Most often the means of targeting rows are the use of aggregate functions like count. Because sql cannot directly compare one set of rows (ie. a table) to another set of rows (a table) the forced simulation is not unlike the attempt to simulate a list with the substring function as explained here. The resulting sql solutions can at the very least be unintuitive and convoluted and at the very most not even possible to specify (at least for most human beings ☺. Given that Dataphor supports direct table comparisons the logic of a relational division problem can be greatly simplified. This example uses MS Sql Server 2005 and is based on the post: Thursday, July 12, 2007 11:07 PM microsoft.public.sqlserver.programming Expert Challenge - Complex Join, Group By, Having Here is some sample data. This is the dividend table. create table CTable { ID:Integer, Answer:String, Wcnt:Integer, Word:String, key{ID} }; insert table { row{1 ID,'first' Answer, 2 Wcnt,'george' Word}, row{2,'first', 2, 'burns'}, row{3,'second',2, 'burns'}, row{4,'second',2, 'burns'}, row{5,'third', 3, 'go'}, row{6,'third', 3,'george'}, row{7,'third', 3, 'burns'}, row{8,'fourth',2, 'fred'}, row{9,'fourth',2,'george'} } into CTable; Given a match (divisor) table: (test #1) declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'george' what the op refers to as a 'match' table, the relational division question becomes find the rows in the dividend table that correspond to the divisor rows. And it follows that we compare the Word and ID columns of the tables. We now make two important observations: 1. The Answer column of the dividend table defines rows that belong together. These rows form a table. 2. Since we are interesting in comparing columns of two tables we note that it is unfeasible to try to compare the ID columns of the dividend and divisor tables since the ID column of the divisor table can take any value based on the Identity function. With a little insight we see that we can compare them if we first 'standardize' the ID column in both tables. We can do this by computing the 'rank' of Words in each table and compare the ranks instead of IDs. We also note that the way the original problem was posed the Word column is of primary significance. In other words, it is the presence of the divisor Words that determines a match. There is no significance attached to ID, we are simply standarizing it to include it in a comparison. Because it has no meaning as to determining a match which ever way we standarize as long as we are consistent within both tables it will be sufficient. Sql solution Here is one of many sql solutions. The row_number() function is used to standarize the ID columns in both tables and Rank is used in the comparison. -- Match/divisor table (test#1). declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'george' select C.ID,C.Answer,C.Wcnt,C.Word from CTable as C where C.Answer in ( select B.Answer from -- Note that if we order by 'Word desc' for the ranks in both selects it -- will not make a difference, ie. (consistency for ranks between tables). (select ID,Word,row_number()over(order by Word) Rank from @match as M) as A right join (select ID,Answer,Wcnt,Word, row_number()over(partition by Answer order by Word) Rank from CTable) as B on A.Rank=B.Rank and A.Word=B.Word group by B.Answer having (count(A.rank)=count(*)) and (count(*)=(select count(*) from @match)) ) ID Answer Wcnt Word ----------- -------------------- ----------- -------------------- 1 first 2 george 2 first 2 burns Note that the join is necessary but not sufficient. The rows of Answer must be further restricted by aggregate comparisons using the count aggregate. This methodology rests on being able to express a quotient table using aggregate functions. But as the criteria for such a table becomes increasingly more complex so does the query. It is interesting that proponents of this methodology think it is quite helpful in understanding relation divison. For example the authors of: A Simpler (and Better) SQL Approach to Relational Division state 'We believe the syntactical construction of Q0 allows the student to grasp the concepts of implementing SQL division in a more intuitive way'. Perhaps most telling is their framing of the question of relational division where they state: 'A common type of database query requires one to find all tuples of some table that are related to each and every one of the tuples of a second group.' This seems to a reflection of the fact that there is no direct way to compare tables in sql. What would appear much more to the point is simply: 'A common type of database query is to compare two tables'. And this is precisely what we will do in the D4 language of Dataphor. Developing a D4 solution Keep in mind the very precise and simple definition of table equality: Two tables are equal if they have the same named columns of the same data types and they have the same set of rows. In other words, they are the same 'type' and have the same rows. The following simple D4 query illustrates how easy relational division can be. The query ignores ID and only compares tables based on Word. It uses match test#1: select //Each row in CTable that has the particular Answer value given by the where //predicate being true will be returned by the having operator. CTable having ( //Every Answer value forms a (dividend) table. We get the Answer value //by comparing the match (divisor) table to each table formed by a //particular Answer. CTable {Answer AnswerI} //We compare two tables, each row in the match table must exist //in the table formed by an Answer value. where (table{row{'burns' Word},row{'george'}}) //match table. = ((CTable where Answer=AnswerI with { ShouldSupport="false"}) {Word} ) {AnswerI Answer } ); ID Answer Wcnt Word -- ------ ---- ------ 1 first 2 george 2 first 2 burns The above query will work for matches #1 and #3 but will fail for #2. select CTable having ( CTable {Answer AnswerI} where //Using test #2. (table{row{'burns' Word},row{'burns'}}) = ((CTable where Answer=AnswerI with { ShouldSupport="false"}) {Word} ) {AnswerI Answer } ); -- Internal Index Error: "Duplicate key violation." Which makes the point that ID is necessary given duplicate Words. Now lets get serious with how we're going to do this example in D4 ☺. First we're going to create a view in D4 using an sql pass-thru query with row_number(). This will give us a rank (from 1 to Wcnt) in the direction of Word for each Answer (the partition column). create view CTableViewWord SQLQuery('select ID,Answer,Wcnt,Word, row_number()over(partition by Answer order by Word) Rank from CTable','key{Answer,Rank}') {ID,ToInteger(Rank) Rank,Answer,Wcnt,Word}; select CTableViewWord join (CTable group by {Answer} add{Min(ID) Asort}) with {IgnoreUnsupported = 'true'} return 50 by {Asort,Rank} remove{Asort}; ID Rank Answer Wcnt Word -- ---- ------ ---- ------ 2 1 first 2 burns 1 2 first 2 george 3 1 second 2 burns 4 2 second 2 burns 7 1 third 3 burns 6 2 third 3 george 5 3 third 3 go 8 1 fourth 2 fred 9 2 fourth 2 george D4 operator Words is the sql query expressed in a relational way. It too standarizes ID by using a rank for comparison. Like the sql query it only finds tables where there is the same occurrance of the Words. The values of ID are not pertinent to the intent of the problem. The intent is only to find the same Words ignoring the IDs. The operator takes a match table as an argument and returns matching tables from CTable based on Answers. //Matches with Word occurrence only. create operator Words(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; //Create a table variable with a rank in the direction of Word. This is similar //to sql row_number() function. var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {Word}))) {sequence+1 Rank,Word}) ; var RTable:= CTable having ( (CTable {Answer AnswerI}) where MatchTableRank = ( (CTableViewWord where Answer=AnswerI) {Rank,Word} ) with {IgnoreUnsupported = 'true'} {AnswerI Answer} ); //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; To clarify just what operator Words is doing lets insert some more data into table CTable. insert table { row{10 ID,'fifth' Answer,3 Wcnt,'burns' Word}, row{11,'fifth', 3, 'burns'}, row{12,'fifth', 3, 'burns'}, row{13,'sixth',4, 'arthur'}, row{14,'sixth',4, 'arthur'}, row{15,'sixth',4, 'burns'}, row{16,'sixth',4, 'burns'}, row{17,'seventh',4, 'burns'}, row{18,'seventh',4, 'burns'}, row{19,'seventh',4, 'arthur'}, row{20,'seventh',4, 'arthur'}, row{21,'eight',4, 'arthur'}, row{22,'eight',4, 'burns'}, row{23,'eight',4, 'arthur'}, row{24,'eight',4, 'burns'}, row{25,'nineth',4, 'burns'}, row{26,'nineth',4, 'burns'}, row{27,'nineth',4, 'burns'}, row{28,'nineth',4, 'arthur'}, row{29,'tenth',4, 'burns'}, row{32,'tenth',4, 'burns'}, row{35,'tenth',4, 'burns'}, row{38,'tenth',4, 'arthur'}, row{39,'eleventh',4, 'burns'}, row{41,'eleventh',4, 'burns'}, row{43,'eleventh',4, 'burns'}, row{45,'eleventh',4,'arthur'} } into CTable; Examples: select Words(table{row{11 ID,'burns' Word},row{22,'burns'}}); ID Answer Wcnt Word -- ------ ---- ----- 3 second 2 burns 4 second 2 burns select Words(table{row{11 ID,'burns' Word},row{12,'burns'},row{17,'burns'}}); ID Answer Wcnt Word -- ------ ---- ----- 10 fifth 3 burns 11 fifth 3 burns 12 fifth 3 burns The following three selects all return the same table since table equality is based on the occurrence of the three Words. The value of ID and the sequence of Words in the match table is immaterial since the comparison of Rank between the divisor (match) and dividend tables is based on Word. select Words(table{row{11 ID,'burns' Word},row{12,'george'},row{17,'go'}}); select Words(table{row{11 ID,'go' Word},row{12,'george'},row{17,'burns'}}); select Words(table{row{12 ID,'george' Word},row{13,'go'},row{14,'burns'}}); ID Answer Wcnt Word -- ------ ---- ------ 5 third 3 go 6 third 3 george 7 third 3 burns These two selects return 3 Answers and conceptually 3 tables because the two Words, 'arthur' and 'burns', occur twice within each Answer. select Words(table{row{11 ID,'arthur' Word},row{17,'burns'},row{20,'arthur'},row{21,'burns'}}); select Words(table{row{11 ID,'burns' Word},row{17,'arthur'},row{20,'arthur'},row{21,'burns'}}); ID Answer Wcnt Word -- ------- ---- ------ 13 sixth 4 arthur 14 sixth 4 arthur 15 sixth 4 burns 16 sixth 4 burns 17 seventh 4 burns 18 seventh 4 burns 19 seventh 4 arthur 20 seventh 4 arthur 21 eight 4 arthur 22 eight 4 burns 23 eight 4 arthur 24 eight 4 burns The idea of using the aggregate count in sql as a solution to relational division problems is well suited to finding occurrences of things, ie. Words. With D4 it is obviously not necessary to 'invent' an idea like count. With table comparisons it is not necessary to go beyond the 'data' in the rows. Up till now the ID value has not been meaningful. Lets change that. Here we want to match the occurrence of Word and the ID column. Said another way, we want to match the sequence of Words within an Answer. We create another view where the rank is now based on ID. create view CTableViewID SQLQuery('select ID,Answer,Wcnt,Word, row_number()over(partition by Answer order by ID) Rank from CTable','key{Answer,Rank}') {ID,ToInteger(Rank) Rank,Answer,Wcnt,Word}; select CTableViewID join (CTable group by {Answer} add{Min(ID) Asort}) with {IgnoreUnsupported = 'true'} where Answer in ({'first','second','third','fourth','sixth','eight'}) return 50 by {Asort,Rank} remove{Asort}; ID Rank Answer Wcnt Word -- ---- ------ ---- ------ 1 1 first 2 george 2 2 first 2 burns 3 1 second 2 burns 4 2 second 2 burns 5 1 third 3 go 6 2 third 3 george 7 3 third 3 burns 8 1 fourth 2 fred 9 2 fourth 2 george 13 1 sixth 4 arthur 14 2 sixth 4 arthur 15 3 sixth 4 burns 16 4 sixth 4 burns 21 1 eight 4 arthur 22 2 eight 4 burns 23 3 eight 4 arthur 24 4 eight 4 burns Operator IDandWord is similar to operator Words but here Rank reflects a meaningful sequence. Now the order of the match table is relevant to the equality of the table comparison. create operator IDandWord(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; //Create a table variable with a rank in the direction of ID. This is similar //to sql row_number() function. var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {ID}))) {sequence+1 Rank,Word}) ; var RTable:= CTable having ( (CTable {Answer AnswerI}) where MatchTableRank = ( (CTableViewID where Answer=AnswerI) {Rank,Word} ) with {IgnoreUnsupported = 'true'} {AnswerI Answer} ); //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; Here the sequence of the ID/Word combination in the match table, when standarized, is the same as the sequence in CTable so we have a match. select IDandWord(table{row{11 ID,'go' Word},row{12,'george'},row{17,'burns'}}); ID Answer Wcnt Word -- ------ ---- ------ 5 third 3 go 6 third 3 george 7 third 3 burns But these sequences of ID/Word do not match any sequence in CTable. select IDandWord(table{row{11 ID,'burns' Word},row{12,'george'},row{17,'go'}}); select IDandWord(table{row{12 ID,'george' Word},row{13,'go'},row{14,'burns'}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value> select IDandWord(table{row{11 ID,'arthur' Word},row{17,'burns'},row{20,'arthur'},row{21,'burns'}}); ID Answer Wcnt Word -- ------ ---- ------ 21 eight 4 arthur 22 eight 4 burns 23 eight 4 arthur 24 eight 4 burns No match: select IDandWord(table{row{11 ID,'burns' Word},row{17,'arthur'},row{20,'arthur'},row{21,'burns'}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value> Note that operators Words and IDandWord are the same except for the view being used, the type of standarizing on the ID column. The sql solution implies that aggregates would have to test for the sequence of IDs given the Word values. This seems to mean that sql has to use a separate set of constructs for each meaningful column involved in the match. And this certainly raises the complexity level of such a query. Now lets raise the bar one more notch in matching. Now not only the sequence of Words is meaningful (Words and IDs) but so is the relationship between the IDs. Given the three Answers that match the sequence of Words: select IDandWord(table{row{11 ID,'burns' Word},row{17,'burns'},row{20,'burns'},row{21,'arthur'}}); ID Answer Wcnt Word -- -------- ---- ------ 25 nineth 4 burns 26 nineth 4 burns 27 nineth 4 burns 28 nineth 4 arthur 29 tenth 4 burns 32 tenth 4 burns 35 tenth 4 burns 38 tenth 4 arthur 39 eleventh 4 burns 41 eleventh 4 burns 43 eleventh 4 burns 45 eleventh 4 arthur We want to be able to distinguish (match) each of these Answer tables. To do this we have to examine the relationship between ID values. The only way to target each Answer is to test the difference between the IDs. So we have to test for differences of 1,2 or 3. Operator BetweenIDs is similar to operator IDandWord but adds a column of the difference between consecutive IDs. The table comparison now involves looking at three columns, Word, Rank and TestID. create operator BetweenIDs(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {ID}))) {ID,sequence+1 Rank,Word}) ; //Compare the IDs (next to current) in sequence (ascending order) of IDs. //An indexer ([]) expression is used to directly access a particular Rank. var MatchTableID:= MatchTableRank add{IfNil(((MatchTableRank adorn{key{Rank}})[Rank+1].ID- (MatchTableRank adorn{key{Rank}})[Rank].ID),0) TestID} {Rank,Word,TestID} ; var RTable:= CTable having ( (CTable {Answer AnswerI}) where MatchTableID = ( (CTableViewID where Answer=AnswerI) add{IfNil((CTableViewID[AnswerI,Rank+1].ID- CTableViewID[Answer,Rank].ID),0) TestID} {Rank,Word,TestID} ) with {IgnoreUnsupported = 'true'} {AnswerI Answer} ); var Resultable:=table of typeof(CTable){}; //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; Match consecutivec IDs. select BetweenIDs(table{row{10 ID,'burns' Word},row{11,'burns'},row{12,'burns'},row{13,'arthur'}}); ID Answer Wcnt Word -- ------ ---- ------ 25 nineth 4 burns 26 nineth 4 burns 27 nineth 4 burns 28 nineth 4 arthur Match IDs separated by 2. select BetweenIDs(table{row{1 ID,'burns' Word},row{3,'burns'},row{5,'burns'},row{7,'arthur'}}); ID Answer Wcnt Word -- -------- ---- ------ 39 eleventh 4 burns 41 eleventh 4 burns 43 eleventh 4 burns 45 eleventh 4 arthur Match IDs separated by 3. select BetweenIDs(table{row{4 ID,'burns' Word},row{7,'burns'},row{10,'burns'},row{13,'arthur'}}); ID Answer Wcnt Word -- ------ ---- ------ 29 tenth 4 burns 32 tenth 4 burns 35 tenth 4 burns 38 tenth 4 arthur There is no match for these divisor tables: select BetweenIDs(table{row{10 ID,'burns' Word},row{11,'burns'},row{12,'burns'},row{14,'arthur'}}); select BetweenIDs(table{row{2 ID,'burns' Word},row{3,'burns'},row{5,'burns'},row{7,'arthur'}}); select BetweenIDs(table{row{14 ID,'burns' Word},row{11,'burns'},row{8,'burns'},row{5,'arthur'}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value> Hopefully you can see how D4 can take the mystery out of relational division. bye for now, steve For those who like to explore D4 here is a bonus operator ☺. Operator WordsLimit is similar to operator Words but limits the possible Answer values used for comparing the divisor and dividend tables. //This operator limits possible Answers for the table comparison. create operator WordsLimit(MatchTable:table{ID:Integer,Word:String}):typeof(CTable) begin result:=table of typeof(result){}; //Get the Word/ID combination corresponding to the last Rank (when ording by Word) //for each Answer. var LastRow:= CTableViewWord having ( CTableViewWord group by {Answer} add{Max(Rank) Rank} ) {ID,Answer,Wcnt,Word,Rank}; var MatchTableRank:= (ToTable(ToList(cursor(MatchTable order by {Word}))) {sequence+1 Rank,Word}) ; //Get Last row of Match table var LRow:= ((MatchTableRank adorn{key{Rank}}) return 1 by {Rank desc})[]; //PTable are Answer(s) from CTable where the last row of an Answer is also //the last row of the MatchTable, ie the Rank is the same and the Word is //the same as the last row in the MatchTable. It returns only those Answer(s) //that 'could' be true in a table comparison using all the rows of MatchTable //and CTable. It limits the possible Answer(s) to only those that could be //true. var PTable:= CTable {Answer} where IsNotNil ( (LastRow adorn{key{Answer,Rank,Word}})[Answer,LRow.Rank,LRow.Word by{Answer,Rank,Word}] ) with {IgnoreUnsupported = 'true'} ; //Now using PTable which reduces number of table comparisons. //Note that the entire 'having' relation is derived independently of CTable. //It must be derived first so the rows of CTable can be compared to the 'having' //relation. var RTable:= CTable having ( (PTable {Answer AnswerI}) where MatchTableRank = ( (CTableViewWord where Answer=AnswerI) {Rank,Word} ) {AnswerI Answer} ) with {IgnoreUnsupported = 'true'} ; var Resultable:=table of typeof(CTable){}; //Result table will show nil values if RTable is empty. if exists(RTable) then result:=RTable else insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result; end; select WordsLimit(table{row{10 ID,'burns' Word},row{13,'burns'},row{16,'burns'},row{19,'arthur'}}); ID Answer Wcnt Word -- -------- ---- ------ 25 nineth 4 burns 26 nineth 4 burns 27 nineth 4 burns 28 nineth 4 arthur 29 tenth 4 burns 32 tenth 4 burns 35 tenth 4 burns 38 tenth 4 arthur 39 eleventh 4 burns 41 eleventh 4 burns 43 eleventh 4 burns 45 eleventh 4 arthur select WordsLimit(table{row{10 ID,'arthur' Word}}); ID Answer Wcnt Word ---------- ---------- ---------- ---------- <No Value> <No Value> <No Value> <No Value>
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Tuesday, July 17, 2007
Dataphor - Simplifying Relational Division
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment