Dataphor SQL RAC (Relational Application Companion)


A site of hope for those looking for a true relational database system

Tuesday, July 17, 2007

Dataphor - Simplifying Relational Division

If your not familiar with relational division a popular article
for the translation of the concept into sql is:

Relational Division
by Joe Celko

Basically relational division is the question of whether a set
of predicates exist in a given table. Generally the predicates
represent a set of 'rows' and therefore the question becomes
does a predicate in the form of a 'table' exist in another
table. The predicate rows are referred to as the divisor table,
the given table as the dividend table and the result as the 
quotient table. Hence the idea of division by tables. 

Since sql does not support a table 'type' various solutions
have been offered to target the rows of the given (dividend)
table. Most often the means of targeting rows are the use
of aggregate functions like count. Because sql cannot directly
compare one set of rows (ie. a table) to another set of rows (a table)
the forced simulation is not unlike the attempt to simulate
a list with the substring function as explained here. The resulting
sql solutions can at the very least be unintuitive and convoluted 
and at the very most not even possible to specify (at least for
most human beings . 

Given that Dataphor supports direct table comparisons the logic
of a relational division problem can be greatly simplified.

This example uses MS Sql Server 2005 and is based on the post:

Thursday, July 12, 2007 11:07 PM
microsoft.public.sqlserver.programming
Expert Challenge - Complex Join, Group By, Having

Here is some sample data. This is the dividend table.

create table CTable
{
 ID:Integer,
 Answer:String,
 Wcnt:Integer,
 Word:String,
 key{ID}
}; 
insert
table
{
row{1 ID,'first' Answer, 2 Wcnt,'george' Word},
row{2,'first', 2, 'burns'},
row{3,'second',2, 'burns'},
row{4,'second',2, 'burns'},
row{5,'third', 3,    'go'},
row{6,'third', 3,'george'},
row{7,'third', 3, 'burns'},
row{8,'fourth',2,  'fred'},   
row{9,'fourth',2,'george'}  
} into CTable;

Given a match (divisor) table:

(test #1)
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'george'

what the op refers to as a 'match' table, the relational division
question becomes find the rows in the dividend table that correspond
to the divisor rows. And it follows that we compare the Word and
ID columns of the tables. 

We now make two important observations:
1. The Answer column of the dividend table defines rows that belong
   together. These rows form a table.
2. Since we are interesting in comparing columns of two tables we note
   that it is unfeasible to try to compare the ID columns of the dividend
   and divisor tables since the ID column of the divisor table can take
   any value based on the Identity function. With a little insight we
   see that we can compare them if we first 'standardize' the ID column
   in both tables. We can do this by computing the 'rank' of Words in
   each table and compare the ranks instead of IDs. We also note that
   the way the original problem was posed the Word column is of primary
   significance. In other words, it is the presence of the divisor Words
   that determines a match. There is no significance attached to ID, we
   are simply standarizing it to include it in a comparison. Because it 
   has no meaning as to determining a match which ever way we standarize 
   as long as we are consistent within both tables it will be sufficient.

Sql solution   

Here is one of many sql solutions. The row_number() function is used
to standarize the ID columns in both tables and Rank is used in the
comparison. 

-- Match/divisor table (test#1).
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'george'

select C.ID,C.Answer,C.Wcnt,C.Word
from CTable as C
where C.Answer in
(
select B.Answer 
from
-- Note that if we order by 'Word desc' for the ranks in both selects it
-- will not make a difference, ie. (consistency for ranks between tables).
(select ID,Word,row_number()over(order by Word) Rank
     from @match as M) as A
right join
(select ID,Answer,Wcnt,Word,
    row_number()over(partition by Answer order by Word) Rank
     from CTable) as B
on A.Rank=B.Rank and A.Word=B.Word
group by B.Answer
having (count(A.rank)=count(*)) and (count(*)=(select count(*) from @match))
)

ID          Answer               Wcnt        Word                 
----------- -------------------- ----------- -------------------- 
1           first                2           george
2           first                2           burns

Note that the join is necessary but not sufficient. The rows of Answer
must be further restricted by aggregate comparisons using the count
aggregate. This methodology rests on being able to express a quotient
table using aggregate functions. But as the criteria for such a table
becomes increasingly more complex so does the query. It is interesting
that proponents of this methodology think it is quite helpful in 
understanding relation divison. For example the authors of:

A Simpler (and Better) SQL Approach to Relational Division

state 'We believe the syntactical construction of Q0 allows the student to
grasp the concepts of implementing SQL division in a more intuitive way'.
Perhaps most telling is their framing of the question of relational division
where they state:

'A common type of database query requires one to find all tuples of some table
 that are related to each and every one of the tuples of a second group.'
 
This seems to a reflection of the fact that there is no direct way to
compare tables in sql. What would appear much more to the point is simply:

'A common type of database query is to compare two tables'.

And this is precisely what we will do in the D4 language of Dataphor.

Developing a D4 solution

Keep in mind the very precise and simple definition of table equality:
Two tables are equal if they have the same named columns of the same 
data types and they have the same set of rows. In other words, they
are the same 'type' and have the same rows.

The following simple D4 query illustrates how easy relational division
can be. The query ignores ID and only compares tables based on Word.
It uses match test#1:

select
//Each row in CTable that has the particular Answer value given by the where
//predicate being true will be returned by the having operator.
CTable 
having
 ( 
//Every Answer value forms a (dividend) table. We get the Answer value 
//by comparing the match (divisor) table to each table formed by a 
//particular Answer.
 CTable {Answer AnswerI} 
//We compare two tables, each row in the match table must exist
//in the table formed by an Answer value.
  where 
   (table{row{'burns' Word},row{'george'}}) //match table.
   = 
  ((CTable where Answer=AnswerI with { ShouldSupport="false"}) {Word} )
   {AnswerI Answer }
 );   

ID Answer Wcnt Word   
-- ------ ---- ------ 
1  first  2    george 
2  first  2    burns  

The above query will work for matches #1 and #3 but will fail for #2.

select
 CTable 
  having
  ( 
   CTable {Answer AnswerI} 
    where 
  //Using test #2.
    (table{row{'burns' Word},row{'burns'}}) 
     = 
   ((CTable where Answer=AnswerI with { ShouldSupport="false"}) {Word} )
   {AnswerI Answer }
 );   
-- Internal Index Error: "Duplicate key violation."

Which makes the point that ID is necessary given duplicate Words.

Now lets get serious with how we're going to do this example in D4 .

First we're going to create a view in D4 using an sql pass-thru query with
row_number(). This will give us a rank (from 1 to Wcnt) in the direction of
Word for each Answer (the partition column).

create view CTableViewWord 
SQLQuery('select ID,Answer,Wcnt,Word,
    row_number()over(partition by Answer order by Word) Rank
     from CTable','key{Answer,Rank}') {ID,ToInteger(Rank) Rank,Answer,Wcnt,Word};

select 
 CTableViewWord 
        join
           (CTable group by {Answer} add{Min(ID) Asort})
             with {IgnoreUnsupported = 'true'}
             return 50 by {Asort,Rank}
                remove{Asort};

ID Rank Answer Wcnt Word   
-- ---- ------ ---- ------ 
2  1    first  2    burns  
1  2    first  2    george 
3  1    second 2    burns  
4  2    second 2    burns  
7  1    third  3    burns  
6  2    third  3    george 
5  3    third  3    go     
8  1    fourth 2    fred   
9  2    fourth 2    george 

D4 operator Words is the sql query expressed in a relational way. It too
standarizes ID by using a rank for comparison. Like the sql query it
only finds tables where there is the same occurrance of the Words. The
values of ID are not pertinent to the intent of the problem. The intent
is only to find the same Words ignoring the IDs. The operator takes a match
table as an argument and returns matching tables from CTable based on Answers.

//Matches with Word occurrence only.
create operator Words(MatchTable:table{ID:Integer,Word:String}):typeof(CTable)
begin      
result:=table of typeof(result){};  
//Create a table variable with a rank in the direction of Word. This is similar
//to sql row_number() function.               
var MatchTableRank:= 
              (ToTable(ToList(cursor(MatchTable order by {Word})))
                    {sequence+1 Rank,Word}) ;
var RTable:=
CTable
  having
        ( 
         (CTable {Answer AnswerI})
          where 
           MatchTableRank
            =
             (  
              (CTableViewWord where Answer=AnswerI)
               {Rank,Word} 
             ) with {IgnoreUnsupported = 'true'} 
              {AnswerI Answer} 
        );
//Result table will show nil values if RTable is empty.
if exists(RTable) then
result:=RTable
 else
  insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result;
end;

To clarify just what operator Words is doing lets insert some more data
into table CTable.

insert
table
{
row{10 ID,'fifth' Answer,3 Wcnt,'burns' Word}, 
row{11,'fifth', 3,   'burns'}, 
row{12,'fifth', 3,   'burns'}, 
row{13,'sixth',4,   'arthur'},  
row{14,'sixth',4,   'arthur'}, 
row{15,'sixth',4,    'burns'}, 
row{16,'sixth',4,    'burns'}, 
row{17,'seventh',4,  'burns'},  
row{18,'seventh',4,  'burns'}, 
row{19,'seventh',4, 'arthur'}, 
row{20,'seventh',4, 'arthur'}, 
row{21,'eight',4,   'arthur'},  
row{22,'eight',4,    'burns'}, 
row{23,'eight',4,   'arthur'}, 
row{24,'eight',4,    'burns'}, 
row{25,'nineth',4,   'burns'},  
row{26,'nineth',4,   'burns'}, 
row{27,'nineth',4,   'burns'}, 
row{28,'nineth',4,  'arthur'}, 
row{29,'tenth',4,    'burns'},  
row{32,'tenth',4,    'burns'}, 
row{35,'tenth',4,    'burns'}, 
row{38,'tenth',4,   'arthur'}, 
row{39,'eleventh',4, 'burns'},  
row{41,'eleventh',4, 'burns'}, 
row{43,'eleventh',4, 'burns'}, 
row{45,'eleventh',4,'arthur'} 
} into CTable;

Examples:

select Words(table{row{11 ID,'burns' Word},row{22,'burns'}});

ID Answer Wcnt Word  
-- ------ ---- ----- 
3  second 2    burns 
4  second 2    burns 

select Words(table{row{11 ID,'burns' Word},row{12,'burns'},row{17,'burns'}});

ID Answer Wcnt Word  
-- ------ ---- ----- 
10 fifth  3    burns 
11 fifth  3    burns 
12 fifth  3    burns 

The following three selects all return the same table since table
equality is based on the occurrence of the three Words. The value of 
ID and the sequence of Words in the match table is immaterial since the
comparison of Rank between the divisor (match) and dividend tables is
based on Word.

select Words(table{row{11 ID,'burns' Word},row{12,'george'},row{17,'go'}});
select Words(table{row{11 ID,'go' Word},row{12,'george'},row{17,'burns'}});
select Words(table{row{12 ID,'george' Word},row{13,'go'},row{14,'burns'}});

ID Answer Wcnt Word   
-- ------ ---- ------ 
5  third  3    go     
6  third  3    george 
7  third  3    burns  

These two selects return 3 Answers and conceptually 3 tables because the two
Words, 'arthur' and 'burns', occur twice within each Answer.

select Words(table{row{11 ID,'arthur' Word},row{17,'burns'},row{20,'arthur'},row{21,'burns'}});
select Words(table{row{11 ID,'burns' Word},row{17,'arthur'},row{20,'arthur'},row{21,'burns'}});

ID Answer  Wcnt Word   
-- ------- ---- ------ 
13 sixth   4    arthur 
14 sixth   4    arthur 
15 sixth   4    burns  
16 sixth   4    burns  
17 seventh 4    burns  
18 seventh 4    burns  
19 seventh 4    arthur 
20 seventh 4    arthur 
21 eight   4    arthur 
22 eight   4    burns  
23 eight   4    arthur 
24 eight   4    burns  

The idea of using the aggregate count in sql as a solution to relational
division problems is well suited to finding occurrences of things, ie. Words.
With D4 it is obviously not necessary to 'invent' an idea like count.
With table comparisons it is not necessary to go beyond the 'data' in the rows.

Up till now the ID value has not been meaningful. Lets change that. Here
we want to match the occurrence of Word and the ID column. Said another way,
we want to match the sequence of Words within an Answer.

We create another view where the rank is now based on ID.

create view CTableViewID 
SQLQuery('select ID,Answer,Wcnt,Word,
    row_number()over(partition by Answer order by ID) Rank
     from CTable','key{Answer,Rank}') {ID,ToInteger(Rank) Rank,Answer,Wcnt,Word};
     
select 
 CTableViewID 
        join
           (CTable group by {Answer} add{Min(ID) Asort})
             with {IgnoreUnsupported = 'true'}
              where Answer in ({'first','second','third','fourth','sixth','eight'})
               return 50 by {Asort,Rank}
                remove{Asort};
     
ID Rank Answer Wcnt Word   
-- ---- ------ ---- ------ 
1  1    first  2    george 
2  2    first  2    burns  
3  1    second 2    burns  
4  2    second 2    burns  
5  1    third  3    go     
6  2    third  3    george 
7  3    third  3    burns  
8  1    fourth 2    fred   
9  2    fourth 2    george 
13 1    sixth  4    arthur 
14 2    sixth  4    arthur 
15 3    sixth  4    burns  
16 4    sixth  4    burns  
21 1    eight  4    arthur 
22 2    eight  4    burns  
23 3    eight  4    arthur 
24 4    eight  4    burns  

Operator IDandWord is similar to operator Words but here Rank reflects
a meaningful sequence. Now the order of the match table is relevant to
the equality of the table comparison.

create operator IDandWord(MatchTable:table{ID:Integer,Word:String}):typeof(CTable)
begin      
result:=table of typeof(result){};  
//Create a table variable with a rank in the direction of ID. This is similar
//to sql row_number() function.               
var MatchTableRank:= 
              (ToTable(ToList(cursor(MatchTable order by {ID})))
                    {sequence+1 Rank,Word}) ;
var RTable:=
CTable
  having
        ( 
         (CTable {Answer AnswerI})
          where 
           MatchTableRank
            =
             (  
              (CTableViewID where Answer=AnswerI)
               {Rank,Word} 
             ) with {IgnoreUnsupported = 'true'} 
              {AnswerI Answer} 
        );
//Result table will show nil values if RTable is empty.
if exists(RTable) then
result:=RTable
 else
  insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result;
end;

Here the sequence of the ID/Word combination in the match table, when
standarized, is the same as the sequence in CTable so we have a match.

select IDandWord(table{row{11 ID,'go' Word},row{12,'george'},row{17,'burns'}});

ID Answer Wcnt Word   
-- ------ ---- ------ 
5  third  3    go     
6  third  3    george 
7  third  3    burns  

But these sequences of ID/Word do not match any sequence in CTable.

select IDandWord(table{row{11 ID,'burns' Word},row{12,'george'},row{17,'go'}});
select IDandWord(table{row{12 ID,'george' Word},row{13,'go'},row{14,'burns'}});

ID         Answer     Wcnt       Word       
---------- ---------- ---------- ---------- 
<No Value> <No Value> <No Value> <No Value> 

select IDandWord(table{row{11 ID,'arthur' Word},row{17,'burns'},row{20,'arthur'},row{21,'burns'}});

ID Answer Wcnt Word   
-- ------ ---- ------ 
21 eight  4    arthur 
22 eight  4    burns  
23 eight  4    arthur 
24 eight  4    burns  

No match:
select IDandWord(table{row{11 ID,'burns' Word},row{17,'arthur'},row{20,'arthur'},row{21,'burns'}});

ID         Answer     Wcnt       Word       
---------- ---------- ---------- ---------- 
<No Value> <No Value> <No Value> <No Value> 

Note that operators Words and IDandWord are the same except for the view
being used, the type of standarizing on the ID column. The sql solution
implies that aggregates would have to test for the sequence of IDs given
the Word values. This seems to mean that sql has to use a separate set of
constructs for each meaningful column involved in the match. And this
certainly raises the complexity level of such a query.

Now lets raise the bar one more notch in matching. Now not only the
sequence of Words is meaningful (Words and IDs) but so is the relationship
between the IDs. Given the three Answers that match the sequence of Words:

select 
 IDandWord(table{row{11 ID,'burns' Word},row{17,'burns'},row{20,'burns'},row{21,'arthur'}});

ID Answer   Wcnt Word   
-- -------- ---- ------ 
25 nineth   4    burns  
26 nineth   4    burns  
27 nineth   4    burns  
28 nineth   4    arthur 
29 tenth    4    burns  
32 tenth    4    burns  
35 tenth    4    burns  
38 tenth    4    arthur 
39 eleventh 4    burns  
41 eleventh 4    burns  
43 eleventh 4    burns  
45 eleventh 4    arthur 

We want to be able to distinguish (match) each of these Answer tables. To do this we
have to examine the relationship between ID values. The only way to target each
Answer is to test the difference between the IDs. So we have to test for differences
of 1,2 or 3.

Operator BetweenIDs is similar to operator IDandWord but adds a column of the
difference between consecutive IDs. The table comparison now involves looking
at three columns, Word, Rank and TestID.

create operator BetweenIDs(MatchTable:table{ID:Integer,Word:String}):typeof(CTable)
begin     
result:=table of typeof(result){};  
var MatchTableRank:=
        (ToTable(ToList(cursor(MatchTable order by {ID})))
          {ID,sequence+1 Rank,Word}) ;
//Compare the IDs (next to current) in sequence (ascending order) of IDs.
//An indexer ([]) expression is used to directly access a particular Rank.
var MatchTableID:=
           MatchTableRank          
            add{IfNil(((MatchTableRank adorn{key{Rank}})[Rank+1].ID-
              (MatchTableRank adorn{key{Rank}})[Rank].ID),0) TestID} 
                  {Rank,Word,TestID} ;       
var RTable:=
CTable
  having
        ( 
         (CTable {Answer AnswerI})
          where
           MatchTableID
            =
             (  
               (CTableViewID where Answer=AnswerI)
                   add{IfNil((CTableViewID[AnswerI,Rank+1].ID-
                                     CTableViewID[Answer,Rank].ID),0) TestID} 
                   {Rank,Word,TestID} 
             )  with {IgnoreUnsupported = 'true'} 
              {AnswerI Answer} 
        );
var Resultable:=table of typeof(CTable){}; 
//Result table will show nil values if RTable is empty.
if exists(RTable) then
result:=RTable
 else
  insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result;
end;

Match consecutivec IDs.
select BetweenIDs(table{row{10 ID,'burns' Word},row{11,'burns'},row{12,'burns'},row{13,'arthur'}});

ID Answer Wcnt Word   
-- ------ ---- ------ 
25 nineth 4    burns  
26 nineth 4    burns  
27 nineth 4    burns  
28 nineth 4    arthur 

Match IDs separated by 2.
select BetweenIDs(table{row{1 ID,'burns' Word},row{3,'burns'},row{5,'burns'},row{7,'arthur'}});

ID Answer   Wcnt Word   
-- -------- ---- ------ 
39 eleventh 4    burns  
41 eleventh 4    burns  
43 eleventh 4    burns  
45 eleventh 4    arthur 

Match IDs separated by 3.
select BetweenIDs(table{row{4 ID,'burns' Word},row{7,'burns'},row{10,'burns'},row{13,'arthur'}});

ID Answer Wcnt Word   
-- ------ ---- ------ 
29 tenth  4    burns  
32 tenth  4    burns  
35 tenth  4    burns  
38 tenth  4    arthur 

There is no match for these divisor tables:

select BetweenIDs(table{row{10 ID,'burns' Word},row{11,'burns'},row{12,'burns'},row{14,'arthur'}});
select BetweenIDs(table{row{2 ID,'burns' Word},row{3,'burns'},row{5,'burns'},row{7,'arthur'}});
select BetweenIDs(table{row{14 ID,'burns' Word},row{11,'burns'},row{8,'burns'},row{5,'arthur'}});

ID         Answer     Wcnt       Word       
---------- ---------- ---------- ---------- 
<No Value> <No Value> <No Value> <No Value> 


Hopefully you can see how D4 can take the mystery out of relational division.

bye for now,
steve

For those who like to explore D4 here is a bonus operator . Operator
WordsLimit is similar to operator Words but limits the possible Answer values
used for comparing the divisor and dividend tables.

//This operator limits possible Answers for the table comparison.
create operator WordsLimit(MatchTable:table{ID:Integer,Word:String}):typeof(CTable)
begin     
result:=table of typeof(result){}; 
//Get the Word/ID  combination corresponding to the last Rank (when ording by Word)
//for each Answer.
var LastRow:=
    CTableViewWord 
     having 
       (
        CTableViewWord 
        group by {Answer} add{Max(Rank) Rank}
       )   
        {ID,Answer,Wcnt,Word,Rank};
var MatchTableRank:= 
              (ToTable(ToList(cursor(MatchTable order by {Word})))
                    {sequence+1 Rank,Word}) ;
//Get Last row of Match table                    
var LRow:=
        ((MatchTableRank adorn{key{Rank}}) return 1 by {Rank desc})[];
//PTable are Answer(s) from CTable where the last row of an Answer is also
//the last row of the MatchTable, ie the Rank is the same and the Word is
//the same as the last row in the MatchTable. It returns only those Answer(s)
//that 'could' be true in a table comparison using all the rows of MatchTable
//and CTable. It limits the possible Answer(s) to only those that could be
//true.
var PTable:=
CTable {Answer}
where
IsNotNil
(
(LastRow adorn{key{Answer,Rank,Word}})[Answer,LRow.Rank,LRow.Word by{Answer,Rank,Word}]
) with {IgnoreUnsupported = 'true'} ;
//Now using PTable which reduces number of table comparisons.
//Note that the entire 'having' relation is derived independently of CTable.
//It must be derived first so the rows of CTable can be compared to the 'having'
//relation.
var RTable:=
     CTable
      having
       ( 
        (PTable {Answer AnswerI})
          where 
           MatchTableRank
             =
            ( 
             (CTableViewWord where Answer=AnswerI)
              {Rank,Word} 
            ) 
              {AnswerI Answer} 
        ) with {IgnoreUnsupported = 'true'} ;
var Resultable:=table of typeof(CTable){}; 
//Result table will show nil values if RTable is empty.
if exists(RTable) then
result:=RTable
 else
  insert row{nil ID,nil Answer,nil Wcnt,nil Word} into result;
end;

select WordsLimit(table{row{10 ID,'burns' Word},row{13,'burns'},row{16,'burns'},row{19,'arthur'}});

ID Answer   Wcnt Word   
-- -------- ---- ------ 
25 nineth   4    burns  
26 nineth   4    burns  
27 nineth   4    burns  
28 nineth   4    arthur 
29 tenth    4    burns  
32 tenth    4    burns  
35 tenth    4    burns  
38 tenth    4    arthur 
39 eleventh 4    burns  
41 eleventh 4    burns  
43 eleventh 4    burns  
45 eleventh 4    arthur 

select WordsLimit(table{row{10 ID,'arthur' Word}});

ID         Answer     Wcnt       Word       
---------- ---------- ---------- ---------- 
<No Value> <No Value> <No Value> <No Value> 

Friday, July 13, 2007

Dataphor - An example of rapid application development

The SqlWindow example involves populating tables thru a form that is derived
by the Dataphor frontend server. The fact that the form is derived by the
system services is a key concept in Dataphors ability to be a vehicle of
rapid application development. The example covers a wide range of Dataphor
capabilities that are used in application development. Concepts covered include
the presentation layer, metadata, forms, event handlers, references, constraints,
cursors and views as well as many other programming aspects of D4. 

What does it do
Based on user input several tables are populated and can be viewed thru
a form. Complete editing is available. The example uses Sql Server 2005
as the data respository (device). 

Requirements 
The application was developed using Dataphor v2677 and MS Sql Server 2005 sp1.
Except for an sql stored procedure (which is not required) the example should
run on any version of Sql Server or any other database for that matter.

Download the demo
Everything needed to run the demo including screenshots and setup scripts for
tables, views, constraints, references, operators etc. can be download @
http://www.geocities.com/d4tosql/D4/sqlwindowexample.zip

All scripts for Dataphor objects are documented with comments that should
help explain the what and why of the code. 

Screenshots
You can view screenshots of the demo here and here.
View a D4 operator used here.

What is the context of the example
Two different methods are used to compute cumulative aggregates that are based 
on the concept of an sql window as outlined in the Sql-99 standard. The sql window
is intended to simplify and make more efficient the computing of cumulative sums. 
While an sql window is available in DB2 and Oracle, it is not in Sql Server. 
The two methods used in the example simulate two different conceptual approaches
to deriving cumulative/running sums.

Who is this intended for
For anyone interested in exploring in greater detail key programming aspects
of Dataphor used in application development. There are many operators and
other Dataphor objects to go thru. The code is relatively straightforward
and easy to follow. The example can also be used as a demo for those not
familiar with Dataphor. The setup script will do everything required to run
the example. It is only necessary to have a library using Sql Server as the
device. The form in this example is 'completely' derived by the frontend server
using metadata supplied in various places. Even without a customized form
entering appropriate data is very easy and straightforward. If an entry is
invalid and violates a constraint the custom error message(s) fully describe
how to correct the entry. A modified form with menus will be available in the 
future.

Sql background for computing cumulative/running sums
Traditionally computing a running sum involved a subquery with an inequality
comparision. Given the sample table Stocks, the following sql query will
compute running sums for each Stock in ascending order of QTime. The query
can be run from Dataphor using the SQLQuery operator to pass it directly
to Sql Server. The use of the row_number() function allows the ranks to
be easily compared.

select SQLQuery
 ('
   select A.Stock,A.PRank,A.QTime,A.Quote,
    (select Sum(B.Quote) 
      from 
       (select Stock,QTime,Quote,
         row_number()over(Partition by Stock Order by QTime,Quote) as PRank
          from Stocks) as B
           where B.Stock=A.Stock and B.PRank<=A.PRank) as YSum 
   from
   (select Stock,QTime,Quote,
      row_number()over(Partition by Stock Order by QTime,Quote) as PRank
        from Stocks) as A    
 ');
Stock    PRank QTime                  Quote YSum 
-------- ----- ---------------------- ----- ---- 
IBM      1     4/3/2006 2:47:00 PM    107   107  
IBM      2     7/16/2006 7:39:00 AM   157   264  
IBM      3     8/26/2006 6:24:00 PM   125   389  
IBM      4     9/8/2006 1:11:00 PM    171   560  
IBM      5     11/22/2006 12:01:00 PM 155   715  
IBM      6     2/2/2007 12:03:00 AM   171   886  
IBM      7     4/13/2007 3:30:00 AM   219   1105 
IBM      8     10/18/2007 12:12:00 AM 133   1238 
IBM      9     12/25/2007 10:48:00 AM 139   1377 
MS       1     3/17/2006 7:12:00 PM   257   257  
.

This method is similar to computing ranks as discussed here and is also
just as inefficient. The impact of the subquery is more compelling when
it is expressed as a table. 
The following query gives the counts for the stocks:

select Stocks group by {Stock} add{Count() Cnt};

Stock    Cnt 
-------- --- 
IBM      9   
MS       11  
MySQL_AB 12  
Oracle   11  
Sybase   9   

For a table to represent the subquery it must be able to accommodate the
maximum number of rows in a Stock. Based on MySQL_AB, we need at least
12 rows. Consider the following D4 batch which populates a table assuming
the current row is the 6th row (PRank=6) in the 'IBM' Stock partition.
The table represents all rows up to and including the 6th row. Non-existent
rows, thru the use of IfNil, are given a PRank and nil (null) value for
Quote. A sum of the Quote values from the table would represent the
running sum up thru the 6th rank (PRank) for the 'IBM' Stock.

//Shows table based on table var S for a particular Stock and PRank.
var S:=SQLQuery('select Stock,QTime,Quote,
    row_number()over(Partition by Stock Order by QTime,Quote) as PRank
          from Stocks') {Stock,PRank,Quote} ;     
var Stock:='IBM';
var PRank:=6;          
select
      table
           {
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}),
           (S adorn{key{Stock,PRank}})[Stock,PRank]
           } order by {PRank desc};

Stock      PRank Quote      
---------- ----- ---------- 
IBM        6     171        
IBM        5     155        
IBM        4     171        
IBM        3     125        
IBM        2     157        
IBM        1     107        
<No Value> -6    <No Value> 
<No Value> -7    <No Value> 
<No Value> -8    <No Value> 
<No Value> -9    <No Value> 
<No Value> -10   <No Value> 
<No Value> -11   <No Value> 

The fact that the table has to be repopulated for each rank underscores the
inefficiency of the subquery. There is no reuse of information from one
rank to the next. When we compute the running sum for 'IBM':

var S:=SQLQuery('select Stock,QTime,Quote,
    row_number()over(Partition by Stock Order by QTime,Quote) as PRank
          from Stocks') {Stock,PRank,Quote} ;     
select
     S 
       add
          {
Sum
    (
     Quote
      from
     (
      table
           {
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}),
           (S adorn{key{Stock,PRank}})[Stock,PRank]
           }
     )
    )
    SumQte
          } ;       

Stock    PRank Quote SumQte 
-------- ----- ----- ------ 
IBM      1     107   107    
IBM      2     157   264    
IBM      3     125   389    
IBM      4     171   560    
IBM      5     155   715    
IBM      6     171   886    
IBM      7     219   1105   
IBM      8     133   1238   
IBM      9     139   1377   
MS       1     257   257    
.

We are making 45 inserts into the table since we are starting over from the
1st rank for each row of 'IBM':
1+2+3+4+5+6+7+8+9=45 inserts.

In the example operators D4ApplyforForm (used on the before insert event)
and D4ApplyforFormUpdate (used on the before update event) represent the
the idea of having to build a solution independently for each row. It therefore
represents an inefficient solution just as a subquery does.

Obviously it would make a lot more sense to simply add each new row to the
table which had all the previous rows and then get the sum. We could represent
the addition of a single row to the table as follows:

//Shows reusing information from one PRank to another.
var S:=SQLQuery('select Stock,QTime,Quote,
    row_number()over(Partition by Stock Order by QTime,Quote) as PRank
          from Stocks') {Stock,PRank,Quote} ;     
var Stock:='IBM';
var PRank:=6;          
var T1:=
      table
           {
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}),
           (S adorn{key{Stock,PRank}})[Stock,PRank]
           } ;
select
   T1
   union
   table
       {
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank+1],row{-99 PRank,nil Stock,nil Quote})
     } 
     where IsNotNil(Quote)
         order by {PRank desc};

Stock PRank Quote 
----- ----- ----- 
IBM   7     219   
IBM   6     171   
IBM   5     155   
IBM   4     171   
IBM   3     125   
IBM   2     157   
IBM   1     107   
 
In this way there would be as many inserts into the table as rows for the
Stock. In other words we would be scanning the data just once for the nine
rows vs. the multiple scans for the subquery (table).

We can show the idea of accumulating sums incrementally. For example the
running sum for row 6, adding the next row to the previously stored table
and then getting the sum for row 7:

var S:=SQLQuery('select Stock,QTime,Quote,
    row_number()over(Partition by Stock Order by QTime,Quote) as PRank
          from Stocks') {Stock,PRank,Quote} ;     
var Stock:='IBM';
var PRank:=6;          
var T1:=
      table
           {
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-11],row{-11 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-10],row{-10 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-9], row{ -9 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-8], row{ -8 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-7], row{ -7 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-6], row{ -6 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-5], row{ -5 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-4], row{ -4 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-3], row{ -3 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-2], row{ -2 PRank,nil Stock,nil Quote}),
     IfNil((S adorn{key{Stock,PRank}})[Stock,PRank-1], row{ -1 PRank,nil Stock,nil Quote}),
           (S adorn{key{Stock,PRank}})[Stock,PRank]
           } ;
select 
 table
     {
      row{'Accumulation of Sum for row 6' Desc,Sum(Quote from T1) SumQte},
      row{'Accumulation of Sum for row 7',
Sum(Quote
      from
         (
          T1
          union
          table
          {
         IfNil((S adorn{key{Stock,PRank}})[Stock,PRank+1],row{-99 PRank,nil Stock,nil Quote})
        } 
         where IsNotNil(Quote)
        )) }
     };     
        
Desc                          SumQte 
----------------------------- ------ 
Accumulation of Sum for row 6 886    
Accumulation of Sum for row 7 1105             
          
In the example operators D4WindowforForm (used on the before insert event)
and D4WindowforFormUpdate (used on the before update event) represent the
the idea of reusing prior information when appropriate by incrementally
adding (and when appropriate deleting) rows from a work table that is used
to obtain cumulative aggregates. View operator D4WindowforForm here.

The Sql window
The idea of accumulating running sums based on a new construct was introduced
in the sql-99 standard:

ISO/ANSI: Introduction to OLAP functions
http://tinyurl.com/2taahc

'An <OLAP function> is defined using a window. A window may specify a partitioning,
 an ordering of rows within partitions, and an aggregation group. The aggregation
 group specifies which rows of a partition, relative to the current row, should
 participate in the calculation of an aggregate. Through aggregation groups, windows
 support such important OLAP capabilities as cumulative sums and moving averages. 
 Windows may be specified either in the new WINDOW clause, or in-line in the SELECT list.'

The running sums for Stock is expressed by:

SELECT Stock,QTime,Quote,
   Sum(Quote) OVER (PARTITION BY Stock ORDER BY QTime,Quote 
                    'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW') AS SumQte
   FROM Stocks;       

where the window is:

 Sum(Quote) OVER (PARTITION BY Stock ORDER BY QTime,Quote 
                 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW') AS SumQte

The 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW' is called the aggregate
group clause/statement. The definition of the terms is as follows:

window-aggregation-group-clause 
The aggregation group of a row R is a set of rows, defined relative to R in
the ordering of the rows of R's partition. This clause specifies the aggregation group. 

ROWS 
Indicates the aggregation group is defined by counting rows. 

group-BETWEEN 
Specifies the aggregation group start and end based on ROWS. 

UNBOUNDED PRECEDING 
Includes the entire partition preceding the current row. 

UNBOUNDED FOLLOWING 
Includes the entire partition following the current row. 

CURRENT ROW 
Specifies the start or end of the aggregation group as the current row. 

PRECEDING 
Specifies the number of rows preceding the current row.as a positive integer
indicating a number of rows. 

FOLLOWING 
Specifies the number of rows following the current row. as a positive integer
indicating a number of rows. 

The definition 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW' or its
abbreviated form 'ROWS UNBOUNDED PRECEDING' means include all the rows in
the partition prior to and including the current row in the computation of
the sum.

The sql-99 window does not introduce anything that couldn't be done previously.
What it does is simplify the way it is done and most importantly make accumulation
much more efficient.

We can express the idea of the sql window in Sql Server with the following
procedure:

create procedure SqlWindow
@From INT,
@To INT
as
SELECT A.PRank,A.Stock,A.QTime,A.Quote,
       WCnt,SumQte,MinQte,MaxQte,AvgQte
 FROM 
  (SELECT Stock,QTime,Quote,
   ROW_NUMBER() OVER(PARTITION BY Stock ORDER BY QTime) AS PRank
   FROM Stocks) AS A
    CROSS APPLY
     (
      SELECT COUNT(*) AS WCnt,SUM(Quote) AS SumQte,MIN(Quote) AS MinQte,
      MAX(Quote) AS MaxQte,CAST(AVG(1.*Quote) AS DECIMAL(6,1)) AS AvgQte
      FROM
       (SELECT Stock,Quote,ROW_NUMBER() OVER(ORDER BY QTime) AS PRank
        FROM Stocks AS B
        WHERE B.Stock=A.Stock) AS C
         WHERE C.PRank BETWEEN A.PRank+@From AND A.PRank+@To
       ) AS D

The @From and @To parameters are the bound of rows relative to the current
row that the running/cumulative sum is obtained for.

To obtain the running sums (and other aggregates) we can execute the procedure
using an aggregate group definition that includes all prior rows in the sum:

select 
SQLQuery
(
'Exec SqlWindow @From,@To',
 WList('ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW') {Wfrom From,Wto To}
);

PRank Stock    QTime                  Quote WCnt SumQte MinQte MaxQte AvgQte 
----- -------- ---------------------- ----- ---- ------ ------ ------ ------ 
1     IBM      4/3/2006 2:47:00 PM    107   1    107    107    107    107.0  
2     IBM      7/16/2006 7:39:00 AM   157   2    264    107    157    132.0  
3     IBM      8/26/2006 6:24:00 PM   125   3    389    107    157    129.7  
4     IBM      9/8/2006 1:11:00 PM    171   4    560    107    171    140.0  
5     IBM      11/22/2006 12:01:00 PM 155   5    715    107    171    143.0  
6     IBM      2/2/2007 12:03:00 AM   171   6    886    107    171    147.7  
7     IBM      4/13/2007 3:30:00 AM   219   7    1105   107    219    157.9  
8     IBM      10/18/2007 12:12:00 AM 133   8    1238   107    219    154.8  
9     IBM      12/25/2007 10:48:00 AM 139   9    1377   107    219    153.0  
1     MS       3/17/2006 7:12:00 PM   257   1    257    257    257    257.0  
.
.

The D4 operator WList parses the aggregate definition to obtain the @From and
@To values.

Executing the WList operator with the definition we obtain the bounds of
the cumulative sum. WList returns a row with the From and To values:

select WList('ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW');

Wfrom       Wto 
----------- --- 
-1000000000 0 

The Wfrom value of -1000000000 simply represents a large negative number
guaranteed to capture all prior rows. The value of 0 for Wto represents the
current row. The specify construct '{Wfrom From,Wto To}' simply renames the
columns of the row returned by operator WList to the names of the parameters
expected by the sql procedure SqlWindow.

If we wanted cumulative sums from the current row thru all rows available
for a Stock we can use either of the following:

select WList('ROWS UNBOUNDED FOLLOWING'); 
select WList('ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING');  

Wfrom Wto        
----- ---------- 
0     1000000000 

The Wfrom value of 0 is the current row and the Wto value of 1000000000
guarantees we will include all available 'following' rows in the partition
for the cumulative sum.

Executing the SqlWindow procedure:

select 
SQLQuery
(
'Exec SqlWindow @From,@To',
 WList('ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING') {Wfrom From,Wto To}
);

PRank Stock    QTime                  Quote WCnt SumQte MinQte MaxQte AvgQte 
----- -------- ---------------------- ----- ---- ------ ------ ------ ------ 
1     IBM      4/3/2006 2:47:00 PM    107   9    1377   107    219    153.0  
2     IBM      7/16/2006 7:39:00 AM   157   8    1270   125    219    158.8  
3     IBM      8/26/2006 6:24:00 PM   125   7    1113   125    219    159.0  
4     IBM      9/8/2006 1:11:00 PM    171   6    988    133    219    164.7  
5     IBM      11/22/2006 12:01:00 PM 155   5    817    133    219    163.4  
6     IBM      2/2/2007 12:03:00 AM   171   4    662    133    219    165.5  
7     IBM      4/13/2007 3:30:00 AM   219   3    491    133    219    163.7  
8     IBM      10/18/2007 12:12:00 AM 133   2    272    133    139    136.0  
9     IBM      12/25/2007 10:48:00 AM 139   1    139    139    139    139.0 <- Last row
1     MS       3/17/2006 7:12:00 PM   257   11   2254   100    290    204.9  
.
.
You can see that it is the opposite of 'ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW'.
For the last row in the partition we are left with only the current row being
used for the cumulative sum.

Suppose we want sums for the current row, 2 prior rows and 1 following row.
The aggregate group definition would be:

select WList('ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING');  

Wfrom Wto 
----- --- 
-2    1   

This definition therefore can use 4 rows, the 2 prior rows from the current row,
the current row and the next row.

select 
SQLQuery
(
'Exec SqlWindow @From,@To',
 WList('ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING') {Wfrom From,Wto To}
);

PRank Stock    QTime                  Quote WCnt SumQte MinQte MaxQte AvgQte 
----- -------- ---------------------- ----- ---- ------ ------ ------ ------ 
1     IBM      4/3/2006 2:47:00 PM    107   2    264    107    157    132.0 <- First row
2     IBM      7/16/2006 7:39:00 AM   157   3    389    107    157    129.7  
3     IBM      8/26/2006 6:24:00 PM   125   4    560    107    171    140.0  
4     IBM      9/8/2006 1:11:00 PM    171   4    608    125    171    152.0  
5     IBM      11/22/2006 12:01:00 PM 155   4    622    125    171    155.5  
6     IBM      2/2/2007 12:03:00 AM   171   4    716    155    219    179.0  
7     IBM      4/13/2007 3:30:00 AM   219   4    678    133    219    169.5  
8     IBM      10/18/2007 12:12:00 AM 133   4    662    133    219    165.5  
9     IBM      12/25/2007 10:48:00 AM 139   3    491    133    219    163.7  
1     MS       3/17/2006 7:12:00 PM   257   2    424    167    257    212.0  
.
.

Note that not all rows have 4 values. For the 1st row (PRank=1) there are no
prior rows, there is only the current row and the one following. Therefore the
aggregates for the 1st row are based on cumulative values from only 2 rows.

Note that a cumulative sum need not include the current row. We can base the
accumulation on prior or following rows. For example we can use the prior
rows between 4 and 2:

select WList('ROWS BETWEEN 4 PRECEDING AND 2 PRECEDING')       

Wfrom Wto 
----- --- 
-4    -2  

select 
SQLQuery
(
'Exec SqlWindow @From,@To',
 WList('ROWS BETWEEN 4 PRECEDING AND 2 PRECEDING') {Wfrom From,Wto To}
);          

PRank Stock    QTime                  Quote WCnt SumQte     MinQte     MaxQte     AvgQte     
----- -------- ---------------------- ----- ---- ---------- ---------- ---------- ---------- 
1     IBM      4/3/2006 2:47:00 PM    107   0    <No Value> <No Value> <No Value> <No Value> 
2     IBM      7/16/2006 7:39:00 AM   157   0    <No Value> <No Value> <No Value> <No Value> 
3     IBM      8/26/2006 6:24:00 PM   125   1    107        107        107        107.0      <-First value
4     IBM      9/8/2006 1:11:00 PM    171   2    264        107        157        132.0      
5     IBM      11/22/2006 12:01:00 PM 155   3    389        107        157        129.7      
6     IBM      2/2/2007 12:03:00 AM   171   3    453        125        171        151.0      
7     IBM      4/13/2007 3:30:00 AM   219   3    451        125        171        150.3      
8     IBM      10/18/2007 12:12:00 AM 133   3    497        155        171        165.7      
9     IBM      12/25/2007 10:48:00 AM 139   3    545        155        219        181.7      
1     MS       3/17/2006 7:12:00 PM   257   0    <No Value> <No Value> <No Value> <No Value> 
2     MS       5/9/2006 3:12:00 PM    167   0    <No Value> <No Value> <No Value> <No Value> 
3     MS       6/24/2006 1:51:00 AM   277   1    257        257        257        257.0      
.
.

It's not until the 3rd row of a partition that a row in included.

We can lead or lag by using the same value from both From and To. For example
to lead by 1 row we can use:

select 
SQLQuery
(
'Exec SqlWindow @From,@To',
 WList('ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING') {Wfrom From,Wto To}
);          

PRank Stock    QTime                  Quote WCnt SumQte     MinQte     MaxQte     AvgQte     
----- -------- ---------------------- ----- ---- ---------- ---------- ---------- ---------- 
1     IBM      4/3/2006 2:47:00 PM    107   1    157        157        157        157.0      
2     IBM      7/16/2006 7:39:00 AM   157   1    125        125        125        125.0      
3     IBM      8/26/2006 6:24:00 PM   125   1    171        171        171        171.0      
4     IBM      9/8/2006 1:11:00 PM    171   1    155        155        155        155.0      
5     IBM      11/22/2006 12:01:00 PM 155   1    171        171        171        171.0      
6     IBM      2/2/2007 12:03:00 AM   171   1    219        219        219        219.0      
7     IBM      4/13/2007 3:30:00 AM   219   1    133        133        133        133.0      
8     IBM      10/18/2007 12:12:00 AM 133   1    139        139        139        139.0      
9     IBM      12/25/2007 10:48:00 AM 139   0    <No Value> <No Value> <No Value> <No Value> <-Last row
1     MS       3/17/2006 7:12:00 PM   257   1    167        167        167        167.0      
2     MS       5/9/2006 3:12:00 PM    167   1    277        277        277        277.0      
3     MS       6/24/2006 1:51:00 AM   277   1    250        250        250        250.0      
4     MS       8/18/2006 6:46:00 PM   250   1    290        290        290        290.0      
5     MS       1/16/2007 5:56:00 PM   290   1    177        177        177        177.0      
6     MS       1/22/2007 1:28:00 AM   177   1    150        150        150        150.0      
7     MS       5/30/2007 4:30:00 PM   150   1    263        263        263        263.0      
8     MS       6/21/2007 1:54:00 PM   263   1    153        153        153        153.0      
9     MS       7/17/2007 12:20:00 PM  153   1    170        170        170        170.0      
10    MS       11/26/2007 1:45:00 AM  170   1    100        100        100        100.0      
11    MS       12/25/2007 3:54:00 PM  100   0    <No Value> <No Value> <No Value> <No Value> <-Last row
1     MySQL_AB 2/8/2006 3:47:00 PM    21    1    25         25         25         25.0       
.
.

The last row for each partition (Stock) has no data in the accumulation. Note that
the lead precludes the use of previous information.

Modeling the Sql Window
The D4 operators that compute cumulative aggregates, (D4ApplyforForm, D4ApplyforFormUpdate,
D4WindowforForm and D4WindowforFormUpdate) take the sql procedure SqlWindow
as their logical model. The operators implement the logic of the procedure using
cursors as a way to illustrate their use. As stated previously the D4Apply operators
follow the sql procedure more faithfully and reflect the same inefficiency
as the procedure. The D4Window operators follow the same basic logic but make use
of previous rows. This interdependency reflects a much more efficient methodology
and hence is more faithful to the idea of the efficient sql window.

Thursday, July 05, 2007

Dataphor - Create string intervals

This problem appeared on the sqlservercentral.com forums:
'Compare two varchar arrays one character at time until different'
http://tinyurl.com/yochaa

The original poster explains:
Some background info first. We have an ASP.NET Web Application that serves up a list of
names in a dynamic tree control. The nodes of the tree are generated on demand and loaded
into the tree dynamically. When this list of names become very big (and indeed it will
be), the delay takes too long. So to minimize the delay we will group the names into
dynamically created folders. Each folder will contain n number of names based on some
passed in limit (In my code below this is the variable @NthRow). This way we can mimimize
the number of nodes to fetch yet still support a large number of names.

The final output of our function must be a table of strings in the form of X - Y where X
is the starting name and Y is the ending name of the group.

For example, given the following names and @NthRow = 5

Baker, Ms. Jamie
Espinosa, Ms. Jean
Gonzales, Mr. Robert
Holland, Ms. Julia
Jimenez, Ms. Soni
Macdonald, Mr. Mickey
Miller, Ms. Jana
Noriega, Ms. Michelle
Owen, Ms. Alma
Ramirez, Ms. Stephanie
Salas, Ms. Jeanine
Sipes, Mr. Tyler
Tamayo, Ms. Laura
Timmer, Ms. Julie
Trevino, Ms. Yvonne
Young, Ms. Annia

Produce an output table of:
Ba-Ji
Ma-Ra
Sa-Tr
Y-Y

Where:
Ba-Ji represents the names Baker, Ms. Jamie thru Jimenez, Ms. Soni
Ma-Ra represents the names Macdonald, Mr. Mickey thru Ramirez, Ms. Stephanie
Sa-Tr represents the names Salas, Ms. Jeanine thru Trevino, Ms. Yvonne
Y-Y represents the name Young, Ms. Annia
------------------------------------------------------

This solution uses the D4 language of Dataphor and MS Sql Server 2005 as
the data repository,

Basically we want to take a table of names ordered by last name and divide it
into intervals of NthRow names. We want at least the 1st 2 characters of
of the starting and ending last names for each interval. We will allow that
if the last interval contains only 1 name we want to show a lesser number
of characters than other intervals.

Here is some sample data entered in Dataphor and stored in sql server:

create table MyNames
{
ID:Integer,
Name:String tags{Storage.Length = "50"},
key{ID}
};
insert
table
{
row{1 ID,' Owen , Ms. Alma' Name},
row{2,'Ramirez, Ms. Stephanie'},
row{3,' Salas, Ms.  Jeanine'},
row{4,'  Sipes, Mr. Tyler'},
row{5,'Tamayo, Ms. Laura'},
row{6,'Timmer  , Ms. Julie'},
row{7,'  Trevino, Ms. Yvonne'},
row{8,'  Young, Ms. Annia'},
row{9,'Baker, Ms. Jamie'},
row{10,' Espinosa, Ms. Jean'},
row{11,'  Gonzales, Mr. Robert'},
row{12,'  Holland, Ms. Julia'},
row{13,'Jimenez, Ms. Soni'},
row{14,' Macdonald, Mr. Mickey'},
row{15,'   Miller, Ms. Jana'},
row{16,'Noriega, Ms. Michelle'}
}
into MyNames;    

Operator ArrayVarNames takes parameters of a 'table' with columns ID and Name,
NthRow and NChars which is the number of characters +1 to identity a name.
The operators returns a table with the interval information, the count of
names in the interval and the identifying characters.

create operator
    ArrayVarNames(MyNames:table{ID:Integer,Name:String},NthRow:Integer,NChars:Integer):
                  table{Index:Integer,NFrom:Integer,NTo:Integer,NCnt:Integer,
                        CharFrom:String,CharTo:String}
begin
result:=table of typeof(result){};
//Return a table variable with a sequence number (RowID) reflecting Lastname
//sorted in ascending order.
var T:=
     ToTable(ToList(cursor(
       (MyNames add{Name.Split()[0].Trim() LastName}
          with {IgnoreUnsupported = 'true'}) order by {LastName})))
            {sequence+1 RowID,Name,LastName};
var MaxID:=Max(RowID from T); //Max RowID in table T.           
//NGroups is a table variable that uses NthRow and the numbers table to form
//intervals. For each interval (Index) we get the count (Cnt), the max RowID
//(MaxRowID) for the interval (we really are only interested in MaxRowID
//for the last interval).
var NGroups:=
            (
               numbers
                where num between 1 and MaxID
                 {num Index,((num-1)*NthRow)+1 From,num*NthRow To}
               )
                  times
                    T with {IgnoreUnsupported = 'true'}
                      where RowID between From and To
                       group by {Index} add{Count() Cnt,Max(RowID) MaxRowID}
                          {Index,Cnt,MaxRowID};
//LRow is a row variable with the max Index (MaxIndex), which is the max interval
//given data MyNames and NthRow, the count (Cnt) of the rows for the max Index
//and the max RowID for MyNames. Note that LRow uses the table (variable) NGroups.                        
var LRow:=    
           ( NGroups adorn{key{Index}} return 1 by {Index desc}
                {Index MaxIndex,MaxRowID,Cnt})[];
//We get NChars+1 characters of the Lastname for the starting (NFrom) RowID
//and ending RowID (NTo) for each interval by using RowID as a key to look
//up LastName in table T. We use some boolean logic to check if there is
//only 1 LastName in the last interval (MaxIndex). If true we only want
//the NChars character of LastName. We put the boolean logic directly in SubString.
//We also add the count (NCnt) for each interval.                       
result:=
numbers rename{num Index}
where Index between 1 and LRow.MaxIndex
 add{(NGroups adorn{key{Index}})[Index].Cnt NCnt} with {IgnoreUnsupported = 'true'}
 add{((Index-1)*NthRow)+1 NFrom,Index*NthRow NTo}
  add
     {
      SubString((T adorn{key{RowID}})[NFrom].LastName,0,
      (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))) CharFrom,
   IfNil(
        SubString((T adorn{key{RowID}})[NTo].LastName,0,
      (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))),
      SubString((T adorn{key{RowID}})[LRow.MaxRowID].LastName,0,
      (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1)))))
        ) CharTo
     }
     {Index,NFrom,NTo,NCnt,CharFrom,CharTo}
     with {IgnoreUnsupported = 'true'} ;
end;    

For example given the original data:

select ArrayVarNames(MyNames,5,1);

Index NFrom NTo NCnt CharFrom CharTo
----- ----- --- ---- -------- ------
1     1     5   5    Ba       Ji   
2     6     10  5    Ma       Ra   
3     11    15  5    Sa       Tr   
4     16    20  1    Y        Y  

select ArrayVarNames(MyNames,6,2);

Index NFrom NTo NCnt CharFrom CharTo
----- ----- --- ---- -------- ------
1     1     6   6    Bak      Mac  
2     7     12  6    Mil      Sip  
3     13    18  4    Tam      You  

We can use the operator with any table by simply renaming the columns of
the table. Here we use the Orders table of the Sql Server Northwind database
to create intervals for customers by simply renaming the columns of the
Orders table to agree with those expected by ArrayVarNames:

select ArrayVarNames
 (Orders group by {CustomerID} add{Max(OrderID) ID}
            {ID,CustomerID Name},15,4);


Index NFrom NTo NCnt CharFrom CharTo
----- ----- --- ---- -------- ------
1     1     15  15   ALFKI    COMMI
2     16    30  15   CONSH    GOURL
3     31    45  15   GREAL    LILAS
4     46    60  15   LINOD    QUEEN
5     61    75  15   QUICK    THEBI
6     76    90  14   THECR    WOLZA

We can create a similar table using the Products table of the Northwind db
using ProductID and ProductName.

select ArrayVarNames
 (Products {ProductID ID,ProductName Name},10,30);
 
Index NFrom NTo NCnt CharFrom                      CharTo                          
----- ----- --- ---- ----------------------------- ------------------------------- 
1     1     10  10   Alice Mutton                  Chef Anton's Gumbo Mix          
2     11    20  10   Chocolade                     Grandma's Boysenberry Spread    
3     21    30  10   Gravad lax                    Jack's New England Clam Chowder 
4     31    40  10   Konbu                         Mishi Kobe Niku                 
5     41    50  10   Mozzarella di Giovanni        Queso Cabrales                  
6     51    60  10   Queso Manchego La Pastora     Scottish Longbreads             
7     61    70  10   Singaporean Hokkien Fried Mee Tofu                            
8     71    80  7    Tourtière                     Zaanse koeken                   
 
Hopefully examples like these will encourage you to explore Dataphor -:)

Monday, July 02, 2007

Dataphor - Data scrubbing using lists

This contrived problem illustrates the ease with which lists can be used
in D4. The example makes use of the following Dataphor constructs:

TableDee ( table { row { } } )
http://www.alphora.com/docs/D4LGColumns.html

Indexer Expression
http://www.alphora.com/docs/O-System.iIndexer.html

Using Rows
http://www.alphora.com/docs/DDGModelingProcessLogic-ValuesandVariables-UsingRows.html

List Values
http://www.alphora.com/docs/D4LGLanguageElements-Values-ListValues.html

Using Lists
http://www.alphora.com/docs/DDGModelingProcessLogic-ValuesandVariables-UsingLists.html

Split operator
http://www.alphora.com/docs/O-System.Split.html

Concat (Concatenate)
http://www.alphora.com/docs/O-System.Concat.html

Aggregate
http://www.alphora.com/docs/D4LGTableExpressions-Aggregate.html

Max
http://www.alphora.com/docs/O-System.Max.html

Problem - create a comma delimited string from a string that initially
can contain any number of potential delimiters. Each string will be delimited
by a blank (' ') and another delimiter of a single character. The 2nd delimiter
will be a character that occurs the most times in the string. A delimiter is any
character other than a number or letter. If there are ties between delimiter
characters (other than blank) return a blank string since we can't determine the
2nd delimiter. For example:

Initial String                      New String
--------------                      ----------
"AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1"  "AAA,BB,BY3J,CCCH1,PFYSD,GE1" 

which has 6 items. The initial string has 4 potential delimiters: a comma,
a colon, a slash and a semi-colon {',',':','/',';'} besides a blank which
is always a delimiter. Since comma occurs 3 times, more than any of the
other potential delimiters, it is used along with a blank to split the string.
The other potential delimiters are removed from the new string. Note that
the characters of the new string keep their relative position (from left to
right).
For:

Initial String          New String
--------------          ----------

" LSF QR,U1/T5,P13/KE3" 

the new string is blank because '/' and ',' both occur 2 times (tied).

For:

Initial String  New String
--------------  ----------
" LSF QR U1"

the new string is blank since there must be a delimiter besides a blank (' ').


Operator ANewStr returns a row with the following columns:

 NewStr : the new comma delimited string.
 Char   : the delimiter used along with blank (see LRow).
 Items  : the number of items in the string, ie. the number of items
          that would be obtained if the new string was split into a list
          using comma as the delimiter.
 Cnt    : see LRow.
 Dels   : see LRow.

create operator ANewStr(Str:String):
  row{NewStr:String,Items:Integer,Cnt:Integer,Dels:String,Char:String}
begin  
result:=row of typeof(result){ };
/* 
LRow is a row variable with columns
 Char  : the single non number and non letter character that occurs the
         most times in the string that will serve as a delimiter along
         with a blank.
 LList : the delimiters to use to split the string, stored as a list.
 Cnt   : the count of non blanks delimiters. A 1 indicates a delimiter
         was found, >1 means ties and therefore no non blank delimiter was found.
 Dels: a concatenated comma delimited string of blank (' ') and the new
         found delimiter (Char).
*/         
var LRow:=
      (table { row { } }  add{Str Str}
          times
            numbers  
             where num<Length(Str)
              add{Str[num] Char} 
               where (Char<>' ')  and (not IsLetterOrDigit(Char))    
                group by {Char} add{Count() Cnt}   
                  return 1 by {Cnt desc} with {IgnoreUnsupported = 'true'}
                    add{'' Del}
                 group add{Max(Char) Char,Count() Cnt,
                   Concat(Char,Del order by {Char}) Dels}
                      add{(' '+ Unicode({177})+Char).Split( Unicode({177})) LList}    
                           {LList,Cnt,(Unicode({39})+' '+Dels+Unicode({39})) Dels})[];
result:=
/*
  The string is split by the Split operator using 2 delimiters, blank and 
  LRow.Char which are stored as a list in LRow.LList. Table numbers is a
  table of numbers with a single column num. The num column is used to
  label the items of the list (StrPart).
*/
( 
 (
  table { row { } }  add{Str Str}
           times
             numbers with {IgnoreUnsupported = 'true'}
               where num<Str.Split(LRow.LList).Count() with {IgnoreUnsupported = 'true'}
               add{Str.Split(LRow.LList)[num] StrPart}
                  where StrPart>' '
                    {num+1 Index,StrPart}
                       where LRow.Cnt=1     
   )                         
/*
   Split the Strpart strings into individual characters to eliminate non digits/numbers.
   Then concatenate the strings to form the new string (NewStr). Note that this
   2nd splitting is a bit redundant. We could have formed the new string with a 
   single split but this illustrates the logical, simple and straightforwards ways
   that lists can be created and used.
*/ 
     times
        numbers 
             where num<Length(StrPart)
                add{StrPart[num] Char}
                   where IsLetterOrDigit(Char)
                      add{'' Del} adorn{key{Index,num}}
                     group by {Index} add{Concat(Char,Del order by {Index,num}) NewStr}
                      add{',' Del}
                      group add{Concat(NewStr,Del order by {Index}) NewStr}
                        add{LRow.LList[1] Char,NewStr.Split().Count() Items,LRow.Cnt Cnt}
                     {IfNil(NewStr,' ') NewStr,IfNil(Items,0) Items,Char,Cnt,LRow.Dels Dels}
                        redefine { Char:=case when Items>0 then Char else 'Ties' end })[];
end;  
 
Examples:

For a single string:

select ANewStr("AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1");

NewStr                      Items Cnt Dels Char 
--------------------------- ----- --- ---- ---- 
AAA,BB,BY3J,CCCH1,PFYSD,GE1 6     1   ' ,' ,     

select ANewStr(" LSF//, /AS QR,U1/ /T5,P13//KE3") ;

NewStr                Items Cnt Dels Char 
--------------------- ----- --- ---- ---- 
LSF,AS,QRU1,T5P13,KE3 5     1   ' /' /    

Return the orginal string with the new string information (row):

var AString:="R:B U:AS,1G'E:S SG:E5.16'Z9" ;
select ANewStr(AString) add{AString Str}
       {Str,NewStr,Char,Items,Cnt,Dels};        
       
Str                         NewStr                  Char Items Cnt Dels 
--------------------------- ----------------------- ---- ----- --- ---- 
R:B U:AS,1G'E:S SG:E5.16'Z9 R,B,U,AS1GE,S,SG,E516Z9 :    7     1   ' :' 

The case of equal occurrences of a delimiter:

select ANewStr("A1,FS KE*VS1 L1,FW*A") ;

NewStr Items Cnt Dels  Char 
------ ----- --- ----- ---- 
       0     2   ' *,' Ties 

Working with data in a table:

var StrTable:=
  table
   {
    row{1 ID,"AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1" Str},
    row{2," LSF//, /AS QR,U1/ /T5,P13//KE3"},
    row{3,"R:B U:AS,1G'E:S SG:E5.16'Z9"},
    row{4,"A1,FS KE*VS1 L1,FW*A"}
   };
select StrTable 
 add{
     ANewStr(Str).NewStr NewStr,
         ANewStr(Str).Char Char,
           ANewStr(Str).Cnt Cnt
    };

ID Str                              NewStr                      Char Cnt 
-- -------------------------------- --------------------------- ---- --- 
1  AAA,BB B:Y3J,CC/C:H1 PF/YSD,G;E1 AAA,BB,BY3J,CCCH1,PFYSD,GE1 ,    1   
2   LSF//, /AS QR,U1/ /T5,P13//KE3  LSF,AS,QRU1,T5P13,KE3       /    1   
3  R:B U:AS,1G'E:S SG:E5.16'Z9      R,B,U,AS1GE,S,SG,E516Z9     :    1   
4  A1,FS KE*VS1 L1,FW*A                                         Ties 2   


Interested readers may want to check out my comment on sql attempts
to simulate lists.

Sql - History repeats itself simulating lists

As explained here, before sql-99 analytic functions a rank was expressed
in the form of a subquery:

(select aggregate from <foo> where predicate) as rank

where Count was generally used as the aggregate and the predicate was an 
inequality comparison (table2.Y<=table1.Y). As the definition of the rank
was refined the predicate became progressively more complex. If duplicate
values existed and unique ranks desired then another column had to be used
to resolve ties. An expanded predicate could look like:

where (table2.Y<table1.Y) OR (table1.Y=table2.Y AND table2.RowID<=table1.RowID)

Many experts were so uncomfortable with the nature of computing ranks
that they shifted the intent from a rank to a 'line number'. With this
slight of hand they were able to argue that it was a 'format' feature 
and therefore on the conceptual level of 'reporting'. And such things are
always done on the client. 
With the introduction of sql-99 ranking functions and the benefit of clear
hindsight the whole issue of how ranks were previously computed revealed
itself. Compared to the simplicity and efficiency of ranking functions
the sql queries were seen for the odious things they were. What was thought
to be 'cleaver' sql was really a query worthy of Dr. Frankenstein. And any
calls for removal to the client disappeared. So one would think sql users
would have learned an important lesson - the simulation of a construct
by a bunch of unrelated constructs is a poor substitute for the real
thing. But it seems sqls span of attention is short and history is repeating
itself. We can add the idea of simulating a 'list' to the misbegotten ranks.

Just as an aggregate and predicate were used for ranks the 'substring' 
function and a table of numbers are being used to formulate a 'list'.
Just like the rank predicate, the substring expression becomes more 
complex as the nature of the list becomes more refined. There are many
examples of list simulation such as here. But all sql list simulations
are fundamentally flawed. The principle way for sql to represent something
is by a 'table'. So a table is used to represent a list. But a 'table'
is a different animal than a 'list'. Sql can logically represent a rank
because a rank is a number and sql has a type number. But sql has no
type for a list. The same holds for representing a list by any type of
string. A string, no matter delimited or non-delimited is not a list.
One would think that the function 'substring' would lead to questioning
the sql ways to create a list. It is, after all, substring, ie. a function
designed explicitly for a 'string'! With a list sql is doomed before it
even starts to simulate it. The logical inconsistencies of lists in sql
are unfortunately hidden much like ranking functions were hidden. What
has filled this vacuum is so much non-sense. It is truly unfortunate
that the only thing that can reveal the extent of the logical confusion,
not to mention the code mess, is to work with a list type such as in
the D4 language of Dataphor.

About Me

My photo
Phoenix, Arizona, United States