Dataphor SQL RAC (Relational Application Companion)


A site of hope for those looking for a true relational database system

Sunday, September 10, 2006

Dataphor - Are You Constrained?

Nowhere can the essence of a business problem or model be better
shown than through constraints. Constraints encapsulate
the logic of a model and govern the ebb and flow of an application.
Constraints in Dataphor, based on the relational model, offer these
benefits:

    * Emphasis on data independence
    * The ability to reference global objects
    * Declarative application development

The relational model allows and fosters the clear separation of
how a problem is modeled versus physical implementation details.
This is the relational concept of data independence. The solution
of a business problem should not be encumbered by anything that
detracts from the ability to clearly and precisely model it. This
is the essence of the concept of the logical model in relational
theory. Yet data independence is sorely lacking in sql, where the
line of distinction between the logical model and performance
considerations is too often blurred. The most blatant example
is the conceptual entanglement of keys and indexes. A key, which
is a constraint, is a logical consideration. An index is a physical
consideration. By allowing indexes, a performance consideration, to
directly enter the logic of the model, the clarity of expression
that is needed becomes undermined or even lost. The price for a gain
in performance is paid with ambiguity introduced into the
model. Constraints in dataphor can be clearly expressed independent
of performance-based concepts, i.e. indexes.

In order to define constraints that need to reference global objects
like tables, sql needs to rely on mechanisms like triggers. All
too often this leads to problems with declared referential integrity
constraints and adds an additional layer of complexity to the
problem. Unlike simple column constraints or indexes, sql does
not allow a table constraint to be simply and clearly stated in a
'create table' or 'alter table' statement. Dataphor allows this
simple but powerfully expressive capability.

In sql the idea of metadata is independent of the logical constructs
used to derive data. In other words, in sql you can declare an
index but you can't declare any additional information about the
index. Dataphor allows metadata, information about the object, to
be declared along with the object. This additional information is
used in a declarative way to make application development easier.
Dataphor's derivation engine thus absolves the developer of an
entire level of work. For example, by declaring metadata about
a constraint, the meaning of the constraint can be clarified
while at the same time saving the developer the work of doing so.

An interesting example of the confluence of constraints
(the logical model) and performance can be found in the article

'Indexes on Computed Columns: Speed Up Queries, Add Business Rules'

which can be viewed here. Very clever solutions to constraints
are proposed in the form of indexes on computed columns. Compare
this approach to simply, clearly and explicitly defining a
constraint with metadata in dataphor, leaving performance optimization
as the last step of the application.

For example, one problem the article discusses is ensuring that VPN
tokens are unique in the table while allowing any number of null VPN
token numbers.

I'll be using dataphor with MS sql server 2005 sp1 as the data
repository (backend).

Here is the table definition defined in dataphor:
(All dataphor keywords are in bold)

create table TESTC
 {
 EMPLOYEEID : Integer,
 LOCATIONID : String,
 DEPTID : String { default 'ONE' },
 VPNTOKEN : Integer { nil } ,
 key {EMPLOYEEID}
  }

The primary key is EMPLOYEEID. The 'Integer { nil }' is the dataphor
way of saying that VPNTOKEN can take a null (no value) as the data
is stored in sql server.

A table constraint can now be declared that captures the logic of
uniqueness for non-null VPNTOKENs and allows unlimited nulls for it.
The constraint can be declared using the dataphor alter table.

alter table TESTC
 {
  create constraint DISTINCTVPNWITHNULLS
   exists(
          TESTC
           group add {Count(VPNTOKEN) CNT,Count(distinct VPNTOKEN) DISTVPNCNT}
             where CNT=DISTVPNCNT
          )
     tags
 {
     DAE.Message =
      "'VPNTOKEN ' + ToString(VPNTOKEN) + ' is not unique'"
 }   
  }      

The 'Count(VPNTOKEN)' and 'Count(distinct VPNTOKEN)' work the same as
they do in sql. Both count constructs eliminate null values. So the
where predicate is comparing the count of non-nulls to the distinct
count of non-nulls. The group statement has no by clause, so
the comparison is over the whole table. The logic of the constraint
says that if the where predicate is true then all the non-nulls must
be unique. If it is false, an attempt was made to enter an existing
non-null VPNTOKEN, which violates the uniqueness of VPNTOKEN in the
table and thus the constraint. The attempt to enter such a VPNTOKEN
will be rejected. Since the where predicate is independent of nulls,
all null values for VPNTOKEN will satisfy the constraint (it evaluates
to true) and will be accepted. The only way a row with a null VPNTOKEN
will be rejected is if it violates the uniqueness of the key of the
table, which is EMPLOYEEID. This is the meaning of specifying a key.
Only unique values of EMPLOYEEID are acceptable.
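The counting logic of the constraint can be sketched outside D4 as well. Here is a minimal Python rendering (the function name and use of None for null are mine, not Dataphor's):

```python
def vpn_tokens_ok(tokens):
    """Sketch of the DISTINCTVPNWITHNULLS logic: Count(VPNTOKEN) and
    Count(distinct VPNTOKEN) both ignore nulls (None here), so the
    constraint holds exactly when the non-null tokens are unique."""
    non_null = [t for t in tokens if t is not None]   # Count(VPNTOKEN)
    return len(non_null) == len(set(non_null))        # CNT = DISTVPNCNT

# Any number of nulls passes; a duplicate non-null token fails.
print(vpn_tokens_ok([1, 2, 3, None, None]))  # True
print(vpn_tokens_ok([1, 2, 2, None]))        # False
```

Nulls simply drop out of both counts, which is exactly why the constraint never restricts them.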

The tags keyword indicates that metadata is supplied with the constraint
definition. Dataphor will use the message when the constraint
is violated, as opposed to the cryptic message returned by sql server
when a key/index is violated.

Let's insert some data with unique VPNTOKENs

insert
     table
      {
        row { 1 EMPLOYEEID,'A' LOCATIONID,'ONE' DEPTID, 1 VPNTOKEN },
        row { 2,'A','ONE',2},
        row { 3,'B','TWO',3}
      }
        into TESTC;
      
Insert some rows with null VPNTOKEN.

insert
     table
      {
        row { 4 EMPLOYEEID,'C' LOCATIONID,'ONE' DEPTID},
        row { 5 EMPLOYEEID,'C' LOCATIONID,'THREE' DEPTID}
      }
       into TESTC          
     
This is the data in the table.

select TESTC

EMPLOYEEID LOCATIONID DEPTID VPNTOKEN 
---------- ---------- ------ ----------
1          A          ONE    1        
2          A          ONE    2        
3          B          TWO    3        
4          C          ONE    <No Value>
5          C          THREE  <No Value>

Now in the derived form that dataphor has available for every table
and view we can try to enter a row such as:

 row { 7 EMPLOYEEID,'C' LOCATIONID,'THREE' DEPTID,2 VPNTOKEN}

Dataphor uses the tag metadata expression to clearly show the user
the problem with the specific data that violated the constraint:

[screenshot: Dataphor's error dialog displaying the DAE.Message text for the violated constraint]

I submit this is a significant improvement over the type of message
returned by sql server, which tells you what object (i.e. key)
caused the violation but gives no information about the 'data'
which caused it. The developer is left with this task on the frontend.

Any table constraint should be short and straightforward to
express. As another example, this constraint allows unique VPNTOKENs
but at most 1 null value.

alter table TESTC
 {
  create constraint DISTINCTVPNWITHONENULL
   exists(
           TESTC
            group add {
                       Count() TOTALCNT,
                       Count(VPNTOKEN) CNT,
                       Count(distinct VPNTOKEN) DISTVPNCNT
                      }
              where 
                (TOTALCNT-CNT < 2) and (CNT=DISTVPNCNT)
          )
     tags
 {
             DAE.Message =
              "'VPNTOKEN ' + IfNil(ToString(VPNTOKEN), 'NULL') + ' is not unique'"
 }   
  }   

The where predicate uses '(TOTALCNT-CNT < 2)' to check that there is
at most 1 null value (there can be 0 or 1) and uses '(CNT=DISTVPNCNT)'
to ensure that the non-null values are unique. With a violation, the
message will display a VPNTOKEN value that already exists in the table,
including 'NULL'.
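The same sketch extends to this variant; TOTALCNT - CNT is the null count (names again mine, not Dataphor's):

```python
def vpn_tokens_ok_one_null(tokens):
    """Sketch of DISTINCTVPNWITHONENULL: TOTALCNT - CNT is the number
    of nulls (0 or 1 allowed); CNT = DISTVPNCNT keeps non-nulls unique."""
    nulls = sum(1 for t in tokens if t is None)       # TOTALCNT - CNT
    non_null = [t for t in tokens if t is not None]
    return nulls < 2 and len(non_null) == len(set(non_null))

print(vpn_tokens_ok_one_null([1, 2, None]))        # True: one null
print(vpn_tokens_ok_one_null([1, 2, None, None]))  # False: two nulls
```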

Dataphor by default will display an informative message for any key
that is violated. It is not necessary to explicitly state any
message. On sql server a primary key violation displays only the
object of the violation. For example:

'Cannot insert duplicate key row in object 'dbo.testc' with unique
index 'UIDX_TESTC_EMPLOYEEID'.
The statement has been terminated.'

While dataphor will display the specific data that caused the
violation:

[screenshot: Dataphor's key violation message showing the specific row data that caused the violation]

In the future I'll talk about other types of constraints in
dataphor such as column and transition row constraints.

Bye for now.

Tuesday, September 05, 2006

RAC - Rank This!

Sql Server 2005 Ranking functions vs. Rac Ranking

Principal players:
Rac v2.33 running on  (more info on Rac here)
MS Sql Server 2005 sp1

In general any abstraction for t-sql is a good abstraction. And
the Sql-99 analytic ranking functions are indeed a welcome
addition to Sql Server 2005. There has been universal approval,
and all kinds of favor has been accorded them by the Sql Server
community. But the history of sql is the history of the gang
that can't shoot straight. And the gang tends to always be a
day late and a dollar short. So did sql and MS show themselves to
be all they can be with the new ranking functionality? Is the
Sql Server model superior or inferior to, well.. the 'RAC' model
of ranking? Two logically different approaches. Think of this
as Fox Sql, fair and balanced.. you decide :)

Here is the general specification of a rank construct from BOL:

RANK_TYPE ( ) OVER ( [ < partition_by_clause > ] < order_by_clause > )

Excuse me if I ask just who thinks this stuff up? Perhaps they
should get off what they're on or get back on what they're off. It's
painful for those not steeped in sql speak and clearly doesn't
capture the logical/conceptual meaning of the construct. How many
of you first looked at this spec and wondered just what is the basis
for the rank in the first place? So allow me to express it based on
common sense:

RANK_TYPE (For Column(s)/[ASC | DESC]) OVER TABLE <Optional:[PARTITION BY Column(s)]>

Res ipsa loquitur - Now the thing speaks for itself. 

The rank is based on a column. To be fair this is what the sql
standard meant when they expressed the idea of the 'utility'
of a construct to express the ageless sql of a rank in a
subquery:

(select count(*) 
       from t1
         where t1.column<=t2.column) as Rank
         
The meaning of the ordering of the column and whether or not the
rank is obtained within groups/partitions or over the entire table
is clear.
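That 'ageless sql' is just a counting rule, and it can be sketched in a few lines of Python (a sketch of the subquery's semantics, not of any particular engine):

```python
def subquery_rank(column):
    """The classic correlated-subquery rank: the rank of a value is the
    count of values <= it. Note that under this formulation tied values
    share the count of the last tied row."""
    return [sum(1 for other in column if other <= v) for v in column]

print(subquery_rank([10, 20, 20, 30]))  # [1, 3, 3, 4]
```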

Now here is the concept of a rank in Rac expressed in a similar way:

RANK_TYPE (For PARTITION Columns) OVER TABLE Optional ORDER BY Columns

Res ipsa loquitur - Now the thing speaks for itself. 

Here the object of a rank is a partition! The sort order of the partition
is given by the column(s) of the partition, but an optional sort order can
be defined with columns independent of the partition column(s). This is
a logically different concept than the sql rank. An sql rank through
a column(s) cannot encapsulate a rank predicated on partition(s), or
a structure if you will, because there is no guarantee that a column(s)
exists to capture or enumerate such a rank. In other words a single
subquery, the sql model, may not be adequate. The underlying sql code
necessary to capture the rank may take the form of nested subqueries,
certainly more complicated than the sql rank single subquery.

The underlying Rac model for capturing ranks over partitions can be
seen here where I first proposed creating 'virtual groups':
    microsoft.public.sqlserver.programming 
    Mon, Jul 5 1999 
    'Difficult SQL Question'
    Name: 'Trysql' (but signed by yours truly -:)  
    The thread can be seen here. 
    
Of course a partition can be conceptually decomposed to simple column(s)
so the sql rank is encapsulated within the logical model of the Rac rank.

Here is an example that elucidates the two different ranking models.
(The data is extrapolated from an example of a Rac solution which can be seen here).

create table ranktest (ID tinyint primary key,ROWID tinyint,TYPE tinyint,POS tinyint,STRING char(1))
insert ranktest values(1,1,1,1,'1')
insert ranktest values(2,1,1,2,'2')
insert ranktest values(3,1,2,3,'-')
insert ranktest values(4,1,2,4,'%')
insert ranktest values(5,1,2,5,'=')
insert ranktest values(6,1,2,6,'B ')
insert ranktest values(7,1,1,7,'3')
insert ranktest values(8,1,1,8,'1')
insert ranktest values(9,1,1,9,'2')
insert ranktest values(10,1,2,10,'A')
insert ranktest values(11,2,2,1,'D')
insert ranktest values(12,2,2,2,'C')
insert ranktest values(13,2,2,3,'M')
insert ranktest values(14,2,1,4,'6')
insert ranktest values(15,2,1,5,'5')
insert ranktest values(16,2,2,6,'T')
insert ranktest values(17,2,2,7,'Y')
insert ranktest values(18,2,2,8,'W')
insert ranktest values(19,2,2,9,'R')
insert ranktest values(20,2,1,10,'6')
insert ranktest values(21,2,1,11,'7')

select * from ranktest

ID   ROWID TYPE POS  STRING 
---- ----- ---- ---- ------ 
1    1     1    1    1
2    1     1    2    2
3    1     2    3    -
4    1     2    4    %
5    1     2    5    =
6    1     2    6    B
7    1     1    7    3
8    1     1    8    1
9    1     1    9    2
10   1     2    10   A
11   2     2    1    D
12   2     2    2    C
13   2     2    3    M
14   2     1    4    6
15   2     1    5    5
16   2     2    6    T
17   2     2    7    Y
18   2     2    8    W
19   2     2    9    R
20   2     1    10   6
21   2     1    11   7

We want a dense rank for the partition ROWID,TYPE. Although ID
and POS are redundant, they give meaning to the order
of rows, both over the table and of TYPE within ROWID.

The Rac model can be represented conceptually as:

DENSE_RANK (For PARTITION {ROWID,TYPE}) OVER TABLE (ORDER BY ID)

or

DENSE_RANK (For PARTITION {ROWID,TYPE}) OVER TABLE (ORDER BY ROWID,POS)

Rac does not offer sql's freedom to give each ranking its own
independent specification. Partitions are defined by the ordinal
position of columns in @rows. In the absence of optional sorting
of the rows using @rowsort, the partition is ordered by the @rows
specification from left to right. With @rowsort the partition is
ordered by the @rowsort column(s). The Rac ranking parameter
@rowindicators returns a dense rank. 

The key parameters to obtain the dense rank are:

@rows='ROWID & TYPE & POS & STRING',
@rowsort='ID' or @rowsort='ROWID & POS'
@rowindicators='TYPE{DENSE_RANK}'

The @rows specification implies the needed partition. 
The 'TYPE' in the @rowindicators argument 'TYPE{DENSE_RANK}'
is the end point of the partition to obtain the dense rank for.
In other words, all partitions as defined in @rows will be included
for this dense rank up to and including TYPE. Therefore 'TYPE'
implies the inclusion of 'ROWID', and the partition 'ROWID,TYPE'
will be used for the rank. Any number of ranks can be defined
in a Rac execute, but they will all be based on the implied partition
in @rows from left to right. A rank based on row_number() is
included via @rowcounters and uses the partition 'ROWID,TYPE'.
The name of the dense rank is within the braces, i.e. '{DENSE_RANK}'.
The name of the row_number rank is 'ROW_NUMBER'.

Rac executes:

Exec Rac
@transform='_dummy_',
@rows='ROWID & TYPE & POS & STRING',
-- Order the partitions (@rows).
@rowsort='ID',
@pvtcol='Report Mode',@defaultexceptions='dumy',
@from='ranktest',@defaults1='y',@rowbreak='n',@racheck='y',
@counterdatatype='tinyint',
-- DENSE_RANK() using PARTITION BY ROWID,TYPE ORDERED BY ID
@rowindicators='TYPE{DENSE_RANK}',
-- ROW_NUMBER() using PARTITION BY ROWID,TYPE ORDERED BY ID
@rowcounters='TYPE{ROW_NUMBER}'

OR

Exec Rac
@transform='_dummy_',
@rows='ROWID & TYPE & POS & STRING',
-- Order the partitions (@rows).
@rowsort='ROWID & POS',
@pvtcol='Report Mode',@defaultexceptions='dumy',
@from='ranktest',@defaults1='y',@rowbreak='n',@racheck='y',
@counterdatatype='tinyint',
@rowindicators='TYPE{DENSE_RANK}',
@rowcounters='TYPE{ROW_NUMBER}'

Result

ROWID  TYPE POS  STRING DENSE_RANK ROW_NUMBER 
------ ---- ---- ------ ---------- ---------- 
1      1    1    1      1          1
1      1    2    2      1          2
1      2    3    -      2          1
1      2    4    %      2          2
1      2    5    =      2          3
1      2    6    B      2          4
1      1    7    3      3          1
1      1    8    1      3          2
1      1    9    2      3          3
1      2    10   A      4          1
2      2    1    D      1          1
2      2    2    C      1          2
2      2    3    M      1          3
2      1    4    6      2          1
2      1    5    5      2          2
2      2    6    T      3          1
2      2    7    Y      3          2
2      2    8    W      3          3
2      2    9    R      3          4
2      1    10   6      4          1
2      1    11   7      4          2
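The DENSE_RANK and ROW_NUMBER columns above can be read as one ordered scan over the rows. Here is a hedged Python sketch of that 'virtual group' idea (my own rendering of the concept, not Rac's implementation): the dense rank restarts at 1 on a new ROWID and increments whenever TYPE changes within it, while the row number counts rows within each such run.

```python
def virtual_group_ranks(rows):
    """rows: (ROWID, TYPE) pairs already in the meaningful order (by ID).
    Returns (DENSE_RANK, ROW_NUMBER) per row: DENSE_RANK resets on a new
    ROWID and increments when TYPE changes within it; ROW_NUMBER counts
    rows within each consecutive (ROWID, TYPE) run."""
    ranks, prev = [], None
    dense = row_num = 0
    for rowid, typ in rows:
        if prev is None or rowid != prev[0]:
            dense, row_num = 1, 1          # new ROWID partition
        elif typ != prev[1]:
            dense, row_num = dense + 1, 1  # TYPE changed: new virtual group
        else:
            row_num += 1                   # same run continues
        ranks.append((dense, row_num))
        prev = (rowid, typ)
    return ranks

# First rows of ranktest in ID order, as (ROWID, TYPE):
print(virtual_group_ranks([(1, 1), (1, 1), (1, 2), (1, 2), (1, 1)]))
# [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]
```

The key point is that the rank is driven by *changes* in the partition along the row order, not by the values of any sortable column.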

With the sql rank the question becomes: is there a column(s)
that encapsulates the bi-directional nature of TYPE within ROWID?
Sadly there is not. The only column appropriate for the sql rank
is the DENSE_RANK itself! In other words, the sql rank cannot
be stated:

DENSE_RANK() OVER (PARTITION BY ROWID,TYPE ORDER BY ?) as DENSE_RANK 

or is redundant and a non-solution:

DENSE_RANK() OVER (PARTITION BY ROWID ORDER BY TYPE) as DENSE_RANK 

Any of the sql ranking functions, no matter how they are formulated,
will not reflect the meaning of the data. This is especially seen
by comparing the sql ROW_NUMBER() with the one from Rac:

select ID,ROWID,TYPE,POS,STRING,
ROW_NUMBER() OVER (PARTITION BY ROWID order by TYPE) as [TYPE ROW_NUMBER],
ROW_NUMBER() OVER (PARTITION BY ROWID,TYPE order by POS) as [POS ROW_NUMBER] 
from ranktest
order by ID

ID   ROWID TYPE POS  STRING TYPE ROW_NUMBER POS ROW_NUMBER RAC ROW_NUMBER 
---- ----- ---- ---- ------ --------------- -------------- -------------- 
1    1     1    1    1      1               1              1
2    1     1    2    2      2               2              2
3    1     2    3    -      7               1              1
4    1     2    4    %      8               2              2
5    1     2    5    =      9               3              3
6    1     2    6    B      10              4              4
7    1     1    7    3      3               3              1
8    1     1    8    1      4               4              2
9    1     1    9    2      5               5              3
10   1     2    10   A      6               5              1
11   2     2    1    D      9               1              1
12   2     2    2    C      10              2              2
13   2     2    3    M      11              3              3
14   2     1    4    6      1               1              1
15   2     1    5    5      2               2              2
16   2     2    6    T      5               4              1
17   2     2    7    Y      6               5              2
18   2     2    8    W      7               6              3
19   2     2    9    R      8               7              4
20   2     1    10   6      3               3              1
21   2     1    11   7      4               4              2

Again the ranks do not reflect the meaning of the data when the
rank is dependent on a column instead of a logical partition.

While the sql rank is indeed a welcome abstraction, they could
have done a much better job. By using such a low level of
abstraction, many clear and easy solutions to non-trivial problems
remain lost to most of the user community. This is unfortunate. 
While there is a certain quality in quantity, sql never seems to
truly show that the whole is greater than the sum of its parts. And
for those who build database systems, as well as those that use them
and who frown on abstraction, the burden is on them to tell the rest
of us why. Less can be very much more.



Obtaining sql ranks with Rac.

This is intended to show how Rac obtains equivalent sql ranks.

This example uses the AdventureWorks query in BOL shown under
'Ranking Functions (Transact-SQL)' 
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm

An additional rank for ROW_NUMBER within PostalCode is added.

SELECT c.FirstName, c.LastName
    ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number'
--Rank for LastName,FirstName Partitioned by Postcode    
    ,ROW_NUMBER() OVER (PARTITION BY a.PostalCode ORDER BY c.LastName,c.FirstName) AS 'PART. Row Number'
    ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank'
    ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank'
    ,NTILE(4) OVER (ORDER BY a.PostalCode) AS 'Quartile'
    ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s 
    INNER JOIN Person.Contact c 
        ON s.SalesPersonID = c.ContactID
    INNER JOIN Person.Address a 
        ON a.AddressID = c.ContactID
WHERE TerritoryID IS NOT NULL 
    AND SalesYTD <> 0;
    
FirstName  LastName   Row Number PART. Row Number Rank Dense Rank SalesYTD    PostalCode      
---------- ---------- ---------- ---------------- ---- ---------- ----------- --------------- 
Maciej     Dusza      1          1                1    1          4557045.05  98027
Shelley    Dyck       2          2                1    1          5200475.23  98027
Linda      Ecoffey    3          3                1    1          3857163.63  98027
Carla      Eldridge   4          4                1    1          1764938.99  98027
Carol      Elliott    5          5                1    1          2811012.72  98027
Jauna      Elson      6          1                6    2          3018725.49  98055
Michael    Emanuel    7          2                6    2          3189356.25  98055
Terry      Eminhizer  8          3                6    2          3587378.43  98055
Gail       Erickson   9          4                6    2          5015682.38  98055
Mark       Erickson   10         5                6    2          3827950.24  98055
Martha     Espinoza   11         6                6    2          1931620.18  98055
Janeth     Esteves    12         7                6    2          2241204.04  98055
Twanna     Evans      13         8                6    2          1758385.93  98055    


Sql Rank                                   Rac Rank
ROW_NUMBER() OVER (ORDER BY a.PostalCode)  Rac by default computes a ROW_NUMBER over the
                                           table in the sort order implied by @rows or
                                           explicitly specified in @rowsort. It is 
                                           referenced by @tablecounter if there is no
                                           @select or by rd in a select statement.

RANK() OVER (ORDER BY a.PostalCode)        @rowruns='dumy^rank() within tablecounter^(dumy){PostalCode}'
                                           Rac does not have a default for RANK(), as accounting
                                           for ties is seldom used or needed. But it can be obtained 
                                           easily using a form of the Rac running sums parameter
                                           @rowruns. 
DENSE_RANK() OVER (ORDER BY a.PostalCode)  PostalCode{[Dense Rank]}
                                           A trivial case of a column (sql) and a partition (Rac)
                                           having the same semantics.
NTILE(4) OVER (ORDER BY a.PostalCode)      Rac has no direct equivalent expression. It can be computed
                                           using a combination of Rac options but it seems seldom
                                           called for.

The sql rank:

ROW_NUMBER() OVER (PARTITION BY a.PostalCode ORDER BY c.LastName,c.FirstName)


is equivalent to the Rac @rowcounters='PostalCode{[PART. Row Number]}'. The implied
sort order in @rows, PostalCode,LastName,FirstName, gives the Rac @rowcounters
specification the same semantics as this ROW_NUMBER.

Regardless of the order columns are specified in @rows, the data can
be presented/manipulated at will via the @select parameter.

Here is the Rac version which returns the same result as the adventureworks query.

Exec Rac
@transform='_dummy_',
@rows='a.PostalCode & c.LastName & c.FirstName & s.SalesYTD',
@pvtcol='Report Mode',
@from='Sales.SalesPerson s 
    INNER JOIN Person.Contact c 
        ON s.SalesPersonID = c.ContactID
    INNER JOIN Person.Address a 
        ON a.AddressID = c.ContactID',
@where='TerritoryID IS NOT NULL AND SalesYTD <> 0',
@rowbreak='n',@defaults1='y',@racheck='y',
@tablecounter='y',
@rowindicators='PostalCode{[Dense Rank]}',
@rowcounters='PostalCode{[PART. Row Number]}',
@rowruns='dumy^rank() within tablecounter^(dumy){PostalCode}',
@select='select FirstName,LastName,rd as [Row Number],[PART. Row Number],
            1*runs as RANK,[Dense Rank],SalesYTD,PostalCode 
               from rac
                 order by rd'

Saturday, September 02, 2006

RAC - Are you Coordinated?

Is not the concept of a 'coordinate' a clear and powerful
idea? The entire world is mapped by the coordinates of
latitude and longitude. You can pinpoint any place in the
world given its coordinates. A primary key on a table is
a coordinate. The key identifies precisely the row in
the table. If you can create the coordinates of any 
information, you have created the ability to precisely
identify anything about it. Especially when the information
is data.

A frequent problem is finding a particular occurrence of a
'some thing' in a string. The 'some thing' is frequently
a number. The problem becomes finding the Nth occurrence
of a number, any number. For example, given the string:

'ASF--124 RW!==RXT34146$==PFS773044Y -JPFK*5435DAD '

extract the 3rd occurrence of a number, i.e. '773044'.

Often the solution to this is to build a forest of string
functions. But it is too easy to get lost in such a
forest:) But, if we can map the string with coordinates
the problem and solution become clear, precise and
simple. So how do we create coordinates and what
form do they take? Well we can transpose the string to
rows and create a set of coordinates for each row, 
ie. each character. The rows/characters correspond to
the string as we read it from left to right. Once the
string is transposed we'll create 2 coordinates. The
first one is simply an indicator for whether or not
the character is a digit. This makes sense since
conceptually we only care if a character is a digit
(part of a number) or is a non-digit, not part of
a number. We're not concerned about individual
characteristics of any character, right?:) Now the
2nd coordinate, well that's going to be a bit of magic.
And this bit of magic is going to take the form of
a 'rank'. I'm now going to put RAC to use to develop
a very particular but easy rank.

I'll be using:
MS Sql Server 2005 sp1
RAC v2.34 beta
For details on RAC go here.

First some test data:

create table RT (rowid int primary key,messystr varchar(30))
insert RT values(1,'12-%= 312 ')
insert RT values(2,'YTW$152&OP8393&K734680UF104')
insert RT values(3,'624/!/53RF274UT56')
insert RT values(4,'342')
insert RT values(5,'ORDER')
insert RT values(6,'\\8AApEFS30902$1')
insert RT values(7,'4#7/&91A56G7VW37@&&&&')

select * from RT

rowid       messystr                       
----------- ------------------------------ 
1           12-%= 312 
2           YTW$152&OP8393&K734680UF104
3           624/!/53RF274UT56
4           342
5           ORDER
6           \\8AApEFS30902$1
7           4#7/&91A56G7VW37@&&&&

We're going to start by splitting each string character
by character per rowid. We're also going to create the
first coordinate, a new column, that's 1 if the character
is a digit and 2 for everything else. This is column 'type'.
Column pos is the position (from left to right) of the
character ('str1') in 'messystr'. Column 'rd' is generated
by Rac and is an ascending integer over the whole table
that follows the sort order by 'rowid' and 'pos'. The
result is saved in table ##t1.

Exec Rac
@split='[position]',
@rows='rowid & [position]', 
@pvtcol='messystr',@rank='str',
@splitby='1', -- Splits a string by N consecutive positions.
@from='RT',@defaults1='y',@rowbreak='n',@racheck='y',
@select='select rd,1*rowid as rowid,1*position as pos,str1,
         case when ascii(str1) between 48 and 57 then 1 else 2 end as type
         into ##t1
           from rac
             order by rd'

Here is the result for 'rowids' 1 and 6.

select * from ##t1 where rowid in (1,6)

rd          rowid       pos         str1 type        
----------- ----------- ----------- ---- ----------- 
1           1           1           1    1
2           1           2           2    1
3           1           3           -    2
4           1           4           %    2
5           1           5           =    2
6           1           6                2
7           1           7           3    1
8           1           8           1    1
9           1           9           2    1
10          1           10               2
63          6           1           \    2
64          6           2           \    2
65          6           3           8    1
66          6           4           A    2
67          6           5           A    2
68          6           6           p    2
69          6           7           E    2
70          6           8           F    2
71          6           9           S    2
72          6           10          3    1
73          6           11          0    1
74          6           12          9    1
75          6           13          0    1
76          6           14          2    1
77          6           15          $    2
78          6           16          1    1

Now for our 2nd coordinate, a rank. The 'type' varies
within 'rowid' ordered by 'rd'. We want a rank that
increments only when 'type' changes within 'rowid'
and consequently stays the same within 'type'. Within
a 'rowid' the rank can range from 1 to N. When the
'rowid' changes the rank resets to 1. This kind of
rank is obtained using the Rac @rowindicators parameter.

Exec Rac
@transform='_dummy_',
@rows='rowid & type & pos & str1',
@rowsort='rd',
@pvtcol='Sql*Plus',@defaultexceptions='dumy',
@from='##t1',@defaults1='y',@rowbreak='n',@racheck='y',
@rowindicators='type{typerank}',@counterdatatype='int'

Here is the result for rowid's 1 and 6.

rowid  type pos  str1 typerank    
------ ---- ---- ---- ----------- 
1      1    1    1    1   1st occurrence @1
1      1    2    2    1   1st occurrence @1
1      2    3    -    2
1      2    4    %    2
1      2    5    =    2
1      2    6         2
1      1    7    3    3
1      1    8    1    3
1      1    9    2    3
1      2    10        4
6      2    1    \    1
6      2    2    \    1
6      1    3    8    2    1st occurrence @2
6      2    4    A    3
6      2    5    A    3
6      2    6    p    3
6      2    7    E    3
6      2    8    F    3
6      2    9    S    3
6      1    10   3    4
6      1    11   0    4
6      1    12   9    4
6      1    13   0    4
6      1    14   2    4
6      2    15   $    5
6      1    16   1    6

With 'type' and 'typerank' we now have the coordinates
of any occurrence of a number. If you examine the
table closely you should be able to come up with the
formulation of it:) Ok here it is:

The Nth occurrence of a number has coordinates:

'type'=1 and

'typerank' in ((2*N)-1 , 2*N)
 
1st occurrence
(2*1)-1 , 2*1 : in (1,2)
 
2nd occurrence
(2*2)-1 , 2*2 : in (3,4)
 
3rd occurrence
(2*3)-1 , 2*3 : in (5,6)
 
4th occurrence
(2*4)-1 , 2*4 : in (7,8)

Note that 'typerank' is an 'or' not an 'and'. You are
guaranteed to have a number in one but not both:)
For example, for the 1st occurrence the number will
have a 'typerank' of 1 if it is not preceded by 
non-digits (see 'rowid' 1 above). If non-digits come
first, the 1st occurrence is at 'typerank' 2 (see 'rowid'
6 above). The same logic applies to any Nth occurrence.
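The coordinate arithmetic is easy to check in a few lines of Python (function names are mine; 1 = digit, 2 = non-digit, as in the 'type' column above):

```python
def typeranks(types):
    """typerank: increments each time the type changes, scanning the
    positions of one rowid from left to right."""
    out, rank, prev = [], 0, None
    for t in types:
        if t != prev:
            rank, prev = rank + 1, t
        out.append(rank)
    return out

def nth_number_positions(types, n):
    """Positions (0-based) of the Nth number, using the coordinates
    type=1 and typerank in (2N-1, 2N)."""
    return [i for i, (t, tr) in enumerate(zip(types, typeranks(types)))
            if t == 1 and tr in (2 * n - 1, 2 * n)]

# 'rowid' 6 above ('\\8AApEFS30902$1'), one type per position:
types6 = [2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 1]
print(nth_number_positions(types6, 2))  # [9, 10, 11, 12, 13] -> '30902'
```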

So now we can put a variable filter in RAC that accepts
the computed coordinates for 'typerank' for any particular
occurrence we want/pass to it (a form of macro substitution
for stored procedure parameter arguments). The filter
is contained in the @wherecounters parameter.

For example, we'll get the rows for the 2nd occurrence
of a number.

declare @N int,@N1 int,@N2 int
-- Only the value of @N is needed. 
set @N=2
set @N1=(@N*2)-1
set @N2=@N*2
--
Exec Rac
@transform='_dummy_',@user1=@N1,@user2=@N2, 
@rows='rowid & type & pos & str1',
@rowsort='rd',
@pvtcol='Sql*Plus',@defaultexceptions='dumy',
@from='##t1',@defaults1='y',@rowbreak='n',@racheck='y',
@rowindicators='type{typerank}',@counterdatatype='int',
--
-- Use coordinates to filter rows. @N1 and @N2
-- will be substituted for @user1 and @user2.
--
@wherecounters='type=1 and typerank in(@user1,@user2)',
@select='select rd,1*rowid as rowid,1*pos as pos,str1
         into ##t2
         from rac
          order by rd'

select * from ##t2

rd          rowid       pos         str1 
----------- ----------- ----------- ---- 
7           1           7           3
8           1           8           1
9           1           9           2
21          2           11          8
22          2           12          3
23          2           13          9
24          2           14          3
44          3           7           5
45          3           8           3
72          6           10          3
73          6           11          0
74          6           12          9
75          6           13          0
76          6           14          2
81          7           3           7

Now we only have to concatenate the rows within 'rowid'
to return the number. This is very simple with Rac.

Exec Rac
@transform='Max(str1) as str',
@rows='rowid',
@pvtcol='pos',
@from='##t2',
@defaults1='y',@racheck='y',
@concatenate='str',@stringname='number',
@separator='',@cutpvt='y'

rowid  number 
------ ------ 
1      312
2      8393
3      53
6      30902
7      7

And there you have it. Of course the whole process can 
be streamlined.
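For intuition about just how streamlined it can get: outside of Rac, the same pick-the-Nth-number idea collapses to a few lines of Python with itertools.groupby (a sketch of the idea, not how RAC is implemented):

```python
from itertools import groupby

def numbers_in(s):
    # Maximal runs of digits, in order of appearance --
    # one entry per 'occurrence of a number'.
    return [''.join(run) for is_digit, run in
            groupby(s, key=str.isdigit) if is_digit]

def nth_number(s, n):
    runs = numbers_in(s)
    return runs[n - 1] if n <= len(runs) else None
```

nth_number("12-%= 312 ", 2) gives "312", the same value the final Rac call concatenated for rowid 1.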

So are you going to get coordinated or are you going to
continue to go around in circles? :)


To paraphrase Gordon Gekko from 'Wall Street':
'Abstraction is good, it clarifies and purifies...'
It allows you to solve a problem, an end instead of
solving a means and then an end. There is something
to be said about qualitatively different solutions.
Precisely what, I am unsure, but I think it is something
very good.

We need more abstraction in data management. That is
why I am an advocate of Dataphor and the D4 language.
That Rac doesn't have a lot of company is not a good
thing, it's a bad thing. That a company like MS has
a vested interest in not fostering abstraction is a
bad thing. The industry continues to flounder in its
own sea of code and contradictions. That is not a good thing.

Find your way to show another way.

(Added 12/20/2008
 Note to see a much easier solution to this problem using Dataphor see:
'Extracting numbers from a string with Dataphor')

Friday, September 01, 2006

Dataphor - Get LiT!

LiT = lists into tables:)

I'm going to talk about a bunch of kewl stuff as it
relates to LiT. So get relaxed and come take a ride on
the D(4) train:)

As I showed here the D4 language makes it very easy to split
strings. Now I want to show you some more kewl stuff about
working with strings, lists and tables using the Split operator.

Lets start with this string delimited by the pipe ('|') character:

'FF| BB| HH HH | D | AA'

Notice that blanks can appear at the beginning, end and within a
string part.

The Split operator takes a string and splits it into a 'list' of
strings based on one or more delimiters. Conceptually this means
that the above string is transformed into:

'FF',' BB',' HH HH ',' D ',' AA'

in other words a comma delimited list of strings. And with a list
you can access and operate on the individual string parts. A
list is conceptually close to what concept in a database?
Yep, you guessed it - a table! Each string in the list fits nicely
with the idea of a row in a table. Make sense? And once the list is
in a table, well we can do just about anything with it as you'll see:)

So lets create a table from our example string using Split.

The basic construction of Split using the method style is:

'somestring'.Split({delimiters})

With our example string, which uses the pipe as a delimiter, we have:

'FF| BB| HH HH | D | AA'.Split({'|'})

The Split operator uses a thingie called an 'indexer' to access the
individual parts of the list, like this:

'some string'.Split({delimiters})[index]

where index is a digit for the desired element of the list. So if we
want the 1st string/element in the list we use:

'FF| BB| HH HH | D | AA'.Split({'|'})[0]

We're using 0 because the indexer is 0 based, ie. 1st element 0, 2nd
element 1 etc. Real easy. Ok we've got enough info now to select
the 1st string:

select 'FF| BB| HH HH | D | AA'.Split({'|'})[0]

FF
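If Python is handy, you can picture the same two steps — split, then index — like this (a rough analogue; Python's str.split plays the role of D4's Split and is likewise 0-based):

```python
s = 'FF| BB| HH HH | D | AA'
parts = s.split('|')  # the 'list' that Split produces
print(parts[0])       # 0-based indexer: the 1st element -> FF
```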

Ok for the moment lets leave the idea of getting individual parts
of the list with the indexer and think of getting 'the whole list'
with one swipe of the magical D4 wand. And that wand is the D4
ToTable. Since Split has done the heavy lifting of creating the
list, all we have to do to get LiT  is this: 

select ToTable( 'FF| BB| HH HH | D | AA'.Split({'|'}))

Value  
-------
 D     
 HH HH 
AA     
BB     
FF  

and we are Lit (list in table) :)

When you transform a list to a table the column by default is 'Value'.
You can name the column anything you want by supplying a column
name to ToTable. I'll call the column 'STRING':

select ToTable( 'FF| BB| HH HH | D | AA'.Split({'|'}),'STRING')

STRING 
-------
 D     
 HH HH 
AA     
BB     
FF     

Ok we got the list in a table but you see that the strings have the
same blanks as they had originally. Since we are LiT we can trim the
leading and trailing blanks of each row (string) using :

{Trim(STRING) STRING}

This means trim each row and retain the same column name.

select ToTable ( 'FF| BB| HH HH | D | AA'.Split({'|'}),'STRING' )
                                         {Trim(STRING) STRING}

STRING
------
AA    
BB    
D     
FF    
HH HH 

Ok at this point your head is probably ready to explode since
you can see that while I've made all this fuss about LiT,
the rows of the table do not match the order the strings
should appear in! In other words, if Split creates a list shouldn't
the table reflect the strings like:

STRING
------
FF
BB
HH HH
D
AA

If this has occurred to you I'm doing my job:) As I said above
we can use what's called an indexer to address the elements
of the list in order. And we'll do that soon. But, it's
meaningful to get LiT even though the elements are not in
the proper sequence.

Just understand that without supplying specific information
to Split about the individual elements (the indexer thingie [])
we can only get a table whose rows are ordered by the key (the
strings themselves), not by their original positions.
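That behavior is easy to mimic in Python (a loose sketch, treating the table as a set keyed on STRING, since a D4 table has no duplicate rows):

```python
s = 'FF| BB| HH HH | D | AA'
# Split, trim, and collect into a set -- the 'table' keyed on STRING.
table = {part.strip() for part in s.split('|')}
# Selecting the table hands rows back in key order, not string order.
print(sorted(table))  # -> ['AA', 'BB', 'D', 'FF', 'HH HH']
```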

Here we add a system generated auto-incrementing column INDEX
using the GetNextGenerator operator that reflects the ascending
order of STRING. (The INDEX column is similar to the identity
function used in MS Sql Server). You can see that INDEX reflects
the original (untrimmed) strings.

select (ToTable ( 'FF| BB| HH HH | D | AA'.Split({'|'}),'STRING' ))
        add {GetNextGenerator('INDEX') INDEX,Trim(STRING) STRING1}
                                                   
STRING  INDEX STRING1
------- ----- -------
 D      1     D      
 HH HH  2     HH HH  
AA      3     AA     
BB      4     BB     
FF      5     FF     


Here we get the trimmed strings and then add the auto-incrementing column.

select (
        (ToTable ( 'FF| BB| HH HH | D | AA'.Split({'|'}),'STRING' ))
                         {Trim(STRING) STRING}
       )
         add { GetNextGenerator('INDEX') INDEX }
 
STRING INDEX
------ -----
AA     1    
BB     2    
D      3    
FF     4    
HH HH  5    

The INDEX now reflects the ascending order of the trimmed
strings. No matter how you cut it, splitting a string
in toto into a table is not going to give you the order
as they appear in the original string. As I said even though
this is true it still has value and significance. Now I'm
going to show you why.

Here are 2 tables created on MS Sql Server 2005. We're
going to access them with Dataphor using D4.

create table str1 (ID1 int primary key,STRING1 varchar(50))
insert str1 values(1,'AA|BB |CC')
insert str1 values(2,'FF|GG|HH MM| J |KK')
insert str1 values(3,'SS|TT|UU|VV')
insert str1 values(4,'AA|YYYY|PQ|QR|DD')
insert str1 values(5,'D10|B20|A30')

create table str2 (ID2 int primary key,STRING2 varchar(50))
insert str2 values(1,'AA | BB |CC')
insert str2 values(2,'FF |GG |HH MM | J | KK')
insert str2 values(3,'SS|TT|UU|XX')
insert str2 values(4,'AA|PPPP|WR|QR|DD')
insert str2 values(5,'D10|B20')

Using D4:

select str1

ID1 STRING1
--- ------------------
1   AA|BB |CC
2   FF|GG|HH MM| J |KK
3   SS|TT|UU|VV
4   AA|YYYY|PQ|QR|DD
5   D10|B20|A30

select str2

ID2 STRING2
--- ----------------------
1   AA | BB |CC
2   FF |GG |HH MM | J | KK
3   SS|TT|UU|XX
4   AA|PPPP|WR|QR|DD
5   D10|B20

Lets suppose we want a query that compares by ID STRING1
and STRING2 as lists. We want the ID's where the lists
are the same. In other words, we want the ID's where all
elements in STRING1 and STRING2 are the same (equal). So
we're looking for ID's 1 and 2 where the trimmed elements
are the same.

You'll recall that we LiT a string using:

select ToTable ( 'FF| BB| HH HH | D | AA'.Split({'|'}),'STRING' )
                                         {Trim(STRING) STRING}
                                           
All we have to do to LiT our strings is use STRING1 and
STRING2 in Split. And once we have LiT both strings we
can wave a magic wand. We can directly test to see if the
2 tables are equal! We simply join by ID's and then test
the tables for equality in where.

select 
 str1 join str2 by ID1=ID2
  where 
   ((ToTable(STRING1.Split({'|'}),'STRING')) {Trim(STRING) STRING})
   =
   ((ToTable(STRING2.Split({'|'}),'STRING')) {Trim(STRING) STRING})
       
ID1 STRING1            ID2 STRING2               
--- ------------------ --- ----------------------
1   AA|BB |CC          1   AA | BB |CC
2   FF|GG|HH MM| J |KK 2   FF |GG |HH MM | J | KK
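Before unpacking how this works, here is a rough Python model of the join-plus-table-equality (each 'table' modeled as a set of trimmed strings; the dicts stand in for the joined rows):

```python
str1 = {1: 'AA|BB |CC', 2: 'FF|GG|HH MM| J |KK', 5: 'D10|B20|A30'}
str2 = {1: 'AA | BB |CC', 2: 'FF |GG |HH MM | J | KK', 5: 'D10|B20'}

def as_table(s):
    # LiT: split on '|', trim each part, treat the result as a keyed set.
    return {part.strip() for part in s.split('|')}

# 'join by ID' then keep the IDs whose tables compare equal.
equal_ids = sorted(i for i in str1 if as_table(str1[i]) == as_table(str2[i]))
print(equal_ids)  # -> [1, 2]
```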

Comparing tables is just a logical extension of comparing
integers or strings or just about anything else. Tables
are equal if they have the same column(s) (STRING) and the
elements (rows) are equal. Is that a magic wand or what:)
Now it's fundamentally important to understand how this
works. The key of each table is of course STRING. Each
key also implies an order.  So each row, ordered by
the key (STRING), in each table is compared to the
corresponding row in the other table just like you'd
compare integers. If every row is equal in the 2 tables
the comparison is true, otherwise it's false. D4 is
thus able to logically address the rows in a table via
the key. This is fundamentally different than sql which
can't do this. The ability to use keys for logical
addressing has big, big implications. It's one of the
fundamental things that differentiates a relational
language from sql! I'll get into it in future posts.
Now back to our example.

Note that for ID 5, where the lists have 3 and 2 elements
respectively, the table comparison was no problem.
I figure that if you're still with me you're interested and
pretty sharp. So I'll give you the benefit of the doubt
and assume a pretty important question has occurred to
you. What if the original string has duplicate elements?
For example:

'AAA|TT|CC|AAA'

Lets see what happens when we try to LiT this:

select ToTable ( 'AAA|TT|CC|AAA'.Split({'|'}),'STRING' )
                        
WoW! We get this from the Dataphor compiler:
'Internal Index Error: "Duplicate key violation." '

Yeah we're SOL but does it make sense? Well all tables
in D4 have to have a key. No ifs, ands or buts. The key
is of course STRING and when the string is split into
a table there are duplicates of 'AAA' and there
can't be any duplicate rows. So there goes our table:(
I'm sure you can guess the remedy:) But remember that
D4 is a strongly typed language. A list is not the
same as a table. We've been trying to convert a list to
a table. Two different types, that is the point of
having to convert. And sometimes it just won't work
the way we want it to. Before we remedy this, things
could get more dicey if we have this:

select ToTable ( 'AAA |TT|CC| AAA '.Split({'|'}),'STRING' )

STRING
------
 AAA  
AAA   
CC    
TT    

And then trimmed the strings:

select ToTable ( 'AAA |TT|CC| AAA '.Split({'|'}),'STRING')
                         {Trim(STRING) STRING}
                        
STRING
------
AAA   
CC    
TT    

An element 'AAA' was eliminated (no duplicates allowed) so
we end up with 3 elements instead of 4. And if we compared
this with

'AAA|TT|CC'

select TableDee add {'False Positive' Compare}
   where
     (ToTable ( 'AAA |TT|CC| AAA '.Split({'|'}),'STRING')
                         {Trim(STRING) STRING} )
     =
     (ToTable ( 'AAA|TT|CC'.Split({'|'}),'STRING')
                         {Trim(STRING) STRING} )

Compare       
--------------
False Positive

we'd get a false positive. The tables are equal but
the lists are not! Phew:) (Don't worry about the
'TableDee' thingie, I'll get to that in due time:)
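The false positive is easy to reproduce in Python (a sketch): as sets the duplicate collapses and the 'tables' compare equal, while as ordered lists they do not:

```python
a, b = 'AAA |TT|CC| AAA ', 'AAA|TT|CC'

def as_list(s):
    return [part.strip() for part in s.split('|')]

def as_table(s):
    return set(as_list(s))  # the duplicate 'AAA' collapses here

print(as_table(a) == as_table(b))  # True  - tables equal
print(as_list(a) == as_list(b))    # False - lists differ
```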

But please come to the idea that table comparisons
are a magic wand and once you start using them you'll
wonder how you ever got along without them. But alas,
you can lose some of that magic when working with
lists so be careful:) (More on table comparisons here).

So finally lets get to splitting the list by the
individual elements, ie. using an indexer ([]).

On the MS Server newsgroups it's common to see the
phrase 'use a set oriented approach'. This means
don't use procedural/imperative code. So no
cursors, or loops:)
Now I've already gone over splitting strings here.
But I want to show you in a little more depth what
is going on.

Given a string:

'AAA |TT|CC| AAA '

A basic query to get the elements of the list is:

select ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX')
  where 'AAA |TT|CC| AAA '.Split({'|'}).Count()>INDEX
   {INDEX+1 INDEX,'AAA |TT|CC| AAA '.Split({'|'})[INDEX].Trim() STRING}

INDEX STRING
----- ------
1     AAA   
2     TT    
3     CC    
4     AAA   

Note that with the inclusion of INDEX we are guaranteed
not to eliminate duplicate STRINGs. The INDEX column
is always unique and takes part in the key of the
table.
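In Python terms (a sketch), the indexer query amounts to numbering the elements with enumerate before trimming, which is exactly why duplicates survive:

```python
s = 'AAA |TT|CC| AAA '
# Number each element (1-based, like the query's INDEX+1), then trim.
indexed = [(i + 1, part.strip()) for i, part in enumerate(s.split('|'))]
print(indexed)  # duplicates survive because INDEX keeps each row unique
```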

Now the statement:

ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX')

creates a table with column INDEX for the list of integers.

'AAA |TT|CC| AAA '.Split({'|'}).Count()>INDEX

filters on INDEX where the count of the elements in the list
is greater than the INDEX value. There are 4 elements in
the list so INDEXs 0, 1, 2, 3 satisfy this criteria. Remember
lists are 0 based. So we end up with a table with 4 rows, ie:

INDEX
-----
0    
1    
2    
3    

Then the statement:

{INDEX+1 INDEX,'AAA |TT|CC| AAA '.Split({'|'})[INDEX].Trim() STRING}

is applied. This statement modifies the above table by redefining
INDEX and adding the column STRING. The STRING column is each
list element, in order, identified by the indexer ([INDEX]).
Each element of the string is also trimmed. We added 1 to INDEX
because... well I wanted to because I like to start from 1 not 0:)
So there you have it. Well...not quite. There's stuff going on
here under the covers that you should know about and understand.

How is it that Dataphor interprets:

{0, 1, 2, 3, 4, 5, 6, 7, 8 }

as a list of integers and not something else?

To answer this let me show you formally how this table of
integers is created in a block of D4 code.

begin
   var Indexes : list(Integer);
   Indexes := { 0, 1, 2, 3, 4 , 5, 6, 7, 8 };
   create session table Indexes
   from
   ToTable(Indexes,'INDEX');
end;

A variable, Indexes, is assigned the type list(Integer), ie.
a list whose elements are of type integer.
Once variable Indexes is defined as a type of list, the actual
elements of the list (the integers) are assigned to it.
Now the list can be converted into a table. The

create session table Indexes
from

creates what is called a session table. It's like a permanent
table but is only available to the Dataphor session that
creates it. It's like an MS Sql Server ## temp table.

select Indexes

INDEX
-----
0    
1    
2    
.
8

The answer to the question of how Dataphor knows that:

ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX')

is a list of integers is that the system has inferred
it! That's why it isn't necessary to formally declare
any variable with a type of list(Integer). In other
words, a comma separated bunch of integers is logically
a list of integers. The D4 compiler as an inference
engine is a super big deal!:) The concept of logically
inferring metadata (ie.keys, constraints) does not
exist in an sql database. I'll get into this big
deal in future posts. Ok, back to our example:)

So now you know some more about the integer list in
our query:

select ToTable( {0, 1, 2, 3, 4 , 5, 6, 7, 8 },'INDEX')
  where 'AAA |TT|CC| AAA '.Split({'|'}).Count()>INDEX
   {INDEX+1 INDEX,'AAA |TT|CC| AAA '.Split({'|'})[INDEX].Trim() STRING}

And what about the argument to Split? Well here is a
formal code block for splitting a string (using the
above Indexes table).

begin
  var LList : list(String);
  var ExString : String;
  ExString := 'AAA |TT|CC| AAA ';
  LList := ExString.Split( {'|'} );
create session table ExList
from
 Indexes
   where LList.Count()>INDEX
    {INDEX+1 INDEX, LList[INDEX].Trim() STRING} ;
end;  

select ExList

INDEX STRING
----- ------
1     AAA   
2     TT    
3     CC    
4     AAA   

We don't have to explicitly declare that Split should
result in a list of String elements. All this information
is inferred by the compiler (thank you very much D4:)

I've rambled enough:) Now we're going to finally,
finally split strings and generate a little report.

For good measure we're going to insert another row
in our string tables, str1 and str2, via D4:

insert
 table
 {
     row { 6 ID1, 'AAA |TT|CC| AAA ' STRING1 }
 }
 into str1

insert
 table
 {
     row { 6 ID2, 'TT|CC|AAA' STRING2 }
 }
 into str2

So now we have:

select str1
ID1 STRING1           
--- ------------------
1   AA|BB |CC
2   FF|GG|HH MM| J |KK
3   SS|TT|UU|VV
4   AA|YYYY|PQ|QR|DD
5   D10|B20|A30
6   AAA |TT|CC| AAA 

select str2
ID2 STRING2
--- ----------------------
1   AA | BB |CC
2   FF |GG |HH MM | J | KK
3   SS|TT|UU|XX
4   AA|PPPP|WR|QR|DD
5   D10|B20
6   TT|CC|AAA

We want to compare strings by ID and only where
the lists have the same number/count of elements. And
we only want the strings where there are differences
for the same element (INDEX). So each row will include
the ID, the corresponding different STRINGs, the
INDEX for each different element and the individual
different string parts.

select
 (
  (
   (  
    (str1 join str2 by ID1=ID2)
      where STRING1.Split({'|'}).Count()=STRING2.Split({'|'}).Count()
   )
     times // times is like the sql cross join.
      Indexes
       where STRING1.Split({'|'}).Count()>INDEX
  )  
       where // <> is the notation for not equal.
        STRING1.Split({'|'})[INDEX].Trim()<>STRING2.Split({'|'})[INDEX].Trim()
 )    
           {
             ID1 ID,STRING1,STRING2,INDEX+1 INDEX,
             STRING1.Split({'|'})[INDEX].Trim() DIF1,
             STRING2.Split({'|'})[INDEX].Trim() DIF2
          }

ID STRING1          STRING2          INDEX DIF1 DIF2
-- ---------------- ---------------- ----- ---- ----
3  SS|TT|UU|VV      SS|TT|UU|XX      4     VV   XX
4  AA|YYYY|PQ|QR|DD AA|PPPP|WR|QR|DD 2     YYYY PPPP
4  AA|YYYY|PQ|QR|DD AA|PPPP|WR|QR|DD 3     PQ   WR

Coming from sql you should think of each set of matching
parenthesis (indented) as a derived table. Unlike sql,
a derived table in D4 does not start with a 'select'
statement. In D4 it's just the opposite of sql. A table
is derived and then you specify the columns you want.
Which way makes more sense? In sql you pay for the
meal before you even order it:) Also note in D4 you
don't have to give the derived table a name (alias). In
D4 things are much cleaner and more logical.
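For comparison, the same report can be rebuilt in a few lines of Python (a sketch mirroring the D4 query's guards, not a substitute for it):

```python
def diffs(s1, s2):
    a = [part.strip() for part in s1.split('|')]
    b = [part.strip() for part in s2.split('|')]
    if len(a) != len(b):  # the Count()=Count() guard
        return []
    # 1-based INDEX, only the positions where trimmed elements differ.
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), 1) if x != y]
```

diffs('AA|YYYY|PQ|QR|DD', 'AA|PPPP|WR|QR|DD') yields the two ID 4 rows of the report: (2, 'YYYY', 'PPPP') and (3, 'PQ', 'WR').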

I hope you found this interesting. And there's plenty
more to come:)

Friday, August 18, 2006

Dataphor - Sql: Table This!

Recently a learned user of MS Sql Server pointed out to me
that MS supports something they call a 'table variable' which
is of type table.

It is defined here.

Now MS does indeed say:
'variables can be declared to be of type table'
'To declare variables of type table, use DECLARE @local_variable.'

Then they go on to add a real zinger:
'Assignment operation between table variables is not supported.'

Talk about having your cake and wanting to eat it!
This is nothing but bs:) A variable can only be of type table
IF it supports assignment operations. If you can't put two
tables around an equality (=) and have the compiler understand
that is a comparison between two tables, you don't have a
table type. D4 (as I will be referring to the dataphor relational
language from now on) makes explicit comparisons between tables
as shown here.

Also note that there is no such thing in Sql Server as a table 
type. Permanent tables, temporary tables and table valued functions
are subject to the same argument as table variables. In fact
Sql Server would have to require everything they call a table
to at least have a key, let alone support assignment operations,
before they could even think of saying they support a table type.

This is just another example of D4 vs. sql or TITS vs. FITS :)

Thursday, August 17, 2006

Dataphor - a Formal Definition

A formal definition of Dataphor can be found here.

Simply put Dataphor represents a new declarative solution
to application development. At the heart of dataphor is
data management based on a true implementation of the
Relational Model. Dataphor builds on this relational foundation
to provide various application development services such as
the presentation layer. Dataphor therefore represents a new
paradigm for development called automated application
development. It is the next step in the evolution of
application development technology beyond current rapid
application development or RAD. 

The relational model in dataphor is based on the work of
E. F. Codd and continued by C. J. Date.

The data management language used in dataphor is formally
called D4. It is a relational language embedded in an 
imperative language. The relational component is based
on the relational language put forth by C. J. Date and
Hugh Darwen called Tutorial D in:

'Databases, Types and the Relational Model (3rd Edition)'
which can be found here.

The imperative part of D4 is based on the Pascal language.

In other words, in D4 the statement:

if x=y then
<do something>
else
<do something else>

is as appropriate for integers, as it is for strings
as it is for tables!
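Python gives a taste of the same uniformity (a loose analogy only, with a table modeled as a set of row tuples): one equality operator serves all three types.

```python
x, y = 3, 3
s, t = 'abc', 'abc'
table1 = {(1, 'AA'), (2, 'BB')}
table2 = {(2, 'BB'), (1, 'AA')}  # same rows, listed in another order

# One comparison operator, three very different types.
print(x == y, s == t, table1 == table2)  # -> True True True
```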

Unlike sql systems, the dataphor compiler is a relational
inference engine. In faithfully following the relational
model, keys are inferred for newly derived tables. Whereas
sql systems emphasize keys to physically access tables,
dataphor uses keys to logically access tables. Such
emphasis on the relational logical model allows operations
on tables that are not supported in sql. Updatability of
views of almost any complexity and the ability to declare
constraints of any complexity are but two examples.

Dataphor is also a derivation engine. At the presentation
layer forms are derived from the logical definition of tables
and from user declared metadata. It is in this sense that
dataphor uses a declarative method for application development.

Dataphor uses a device to store and access data. In other
words data resides in a repository accessed by dataphor.
A device can be any sql system such as MS Sql Server,
Oracle, DB2 etc. If desired the device can be accessed directly
thru D4 by the dialect of sql it supports. Data created
directly on an sql system can be easily made available
to dataphor. In this sense you can have the best of both
worlds:)

The following quote from the Dataphor Help docs succinctly
conveys Dataphor's new paradigm for development, the inherent
limitations historically built into sql and how they are
overcome with the relational model in the context of an
imperative language in a declarative application development
environment.
 
'SQL was designed as a 'database sublanguage.' It was
intended to be used from 3GL programming environments as
a means of retrieving and manipulating data in the database.
As such, it had no facilities for imperative programming.
This implies that application development is done in one
language, while data manipulation is done in another. 
This disparity of language environments is called
impedance mismatch. D4 is really just an imperative
programming language where one of the types of data that
can be described is a table. As a result, applications
can be developed entirely in D4, with no resulting
impedance mismatch.'

I will be concentrating on D4 as it manipulates data in a
relational model as opposed to sql data management.

I hope you find it inspirational and uplifting:)

Dataphor - TITS vs. FITS

Catchy title, huh:)
Ok, one more metaphor before we get down to the nitty gritty.

Dataphor is what I call a TITS:
a Tight Information Technology System 

Tight as in reasoned and concise. Tight as in having full control
over the things it manipulates.  Tight as in logic and mathematics
where there exists no ambiguities. Tight as in algebra where one
thing can be inferred from another.

Sql is what I call a FITS:
a Funky Information Technology System

Funky as in something that functions, but in a slightly strange,
klugey way. Funky as in it does the job and would be difficult to
change, so its obvious non-optimality is left alone. As in the
more bugs something has that nobody has bothered to fix because
workarounds are easier, the funkier it is.

All the loose ends, ambiguities and redundancies of sql are
overcome in dataphor. All the twilight of sql becomes clarified
in dataphor. All the gotcha's in sql become tight in dataphor.

I believe you can teach old dogs new tricks.
And I hope I can turn a funky programmer into a tight one.
So if you're tired of throwing sql fits meet a great system
based on TITS.

Wednesday, August 16, 2006

RAC - Sealed with a KISS

Here's a recent question posted in an MS Sql Server newsgroup:

I am trying to formulate a SELECT statement to query the following
table:

CREATE TABLE dbo.PART_TABLE(
  PART_ID VARCHAR(12) NOT NULL
, JOB_NUMBER VARCHAR(12) NOT NULL
, ENTRY_DATE DATETIME NOT NULL DEFAULT GETDATE()
, TYPE_CODE VARCHAR(12) NOT NULL
, PRIMARY KEY (PART_ID, JOB_NUMBER, ENTRY_DATE)
)


What I want to do is to return all rows of the table where the
TYPE_CODE is the same for two consecutive ENTRY_DATEs, for each
PART_ID.

The PART_IDs will not necessarily have a daily entry, so the
'consecutive' dates may actually be a few days apart.

In other words, I want to know the rows where a PART_ID had the same
TYPE_CODE on the chronologically previous ENTRY_DATE as the next.

So, if my table contained:

-------------------------------------------------
1111       30       01/06/06      ZXY
2222       40       01/06/06      ZXY
2222       35       03/06/06      ABC
1111       23       03/06/06      ZXY
3333       87       02/06/06      ABC
---------------------------------------------------

the query would return:
1111       30       01/06/06      ZXY
1111       23       03/06/06      ZXY

as the same TYPE_CODE was used on two consecutive ENTRY_DATEs for
PART_ID '1111'.

A number of very bright sql programmers proposed solutions.
Unfortunately you would need to be an extremely sophisticated
sql expert to understand them. There is a point at which the 
code completely obfuscates the logic of the solution.
Now I'm an advocate of KISS (keep it simple stupid). This was
one of the major motivations for developing the RAC utility.
Questions like the one above can be kissed very easily if
framed properly. At its simplest this problem can be solved
by sorting and computing a few ranks based on the sort.

Here is some sample data:

insert PART_TABLE values('1111',30,'01/06/06','ZXY')
insert PART_TABLE values('2222',40,'01/06/06','ZXY')
insert PART_TABLE values('2222',35,'03/06/06','ABC')
insert PART_TABLE values('1111',23,'03/06/06','ZXY')
insert PART_TABLE values('3333',87,'02/06/06','ABC')
insert PART_TABLE values('1111',21,'03/10/06','ZXY')
insert PART_TABLE values('2222',23,'03/21/06','ABC')
insert PART_TABLE values('3333',24,'02/16/06','HUH')
insert PART_TABLE values('2222',27,'01/04/06','ABC')

Here is the data sorted in RAC without any other processing.
Sorting by the sequence of PART_ID, ENTRY_CODE and TYPE_CODE
encapsulates the whole solution.

Exec Rac
@transform='_dummy_',
@rows='PART_ID & TYPE_CODE & ENTRY_DATE(date) & JOB_NUMBER',
@rowsort='PART_ID & ENTRY_DATE & TYPE_CODE',  
@pvtcol='Report Mode',
@from='PART_TABLE',@rowbreak='n',@defaults1='y',
@defaultexceptions='dumy',@racheck='y'

PART_ID TYPE_CODE ENTRY_DATE JOB_NUMBER 
------- --------- ---------- ---------- 
1111    ZXY       01/06/06   30
1111    ZXY       03/06/06   23
1111    ZXY       03/10/06   21
2222    ABC       01/04/06   27
2222    ZXY       01/06/06   40
2222    ABC       03/06/06   35
2222    ABC       03/21/06   23
3333    ABC       02/06/06   87
3333    HUH       02/16/06   24

Now we're going to put a counter/rank on 
TYPE_CODE within PART_ID called TYPERANK.
Every time PART_ID changes or TYPE_CODE within
PART_ID changes the rank is set back to 1.
Otherwise the rank increments by 1. We're also
going to add a first/last indicator for the rank
called LASTYPE. If the rank is not the last combination
of PART_ID/TYPE_CODE it's 0, if its the last a 1.

Exec Rac
@transform='_dummy_',
@rows='PART_ID & TYPE_CODE & ENTRY_DATE(date) & JOB_NUMBER',
@rowsort='PART_ID & ENTRY_DATE &  TYPE_CODE',  
@pvtcol='Report Mode',
@from='PART_TABLE',@rowbreak='n',@defaults1='y',
@defaultexceptions='dumy',@racheck='y',
@rowcounters='TYPE_CODE{TYPERANK}',
@lastcounters='TYPE_CODE{LASTYPE}'

PART_ID TYPE_CODE ENTRY_DATE JOB_NUMBER TYPERANK LASTYPE 
------- --------- ---------- ---------- -------- ------- 
1111    ZXY       01/06/06   30         1        0
1111    ZXY       03/06/06   23         2        0
1111    ZXY       03/10/06   21         3        1
2222    ABC       01/04/06   27         1        1
2222    ZXY       01/06/06   40         1        1
2222    ABC       03/06/06   35         1        0
2222    ABC       03/21/06   23         2        1
3333    ABC       02/06/06   87         1        1
3333    HUH       02/16/06   24         1        1

Now using TYPERANK and LASTYPE it's easy to pick off
the 1st 2 consecutive rows for PART_ID and TYPE_CODE.
The @wherecounters parameter makes this simple test.
Conceptually all this is done with a single pass thru
the data.

Exec Rac
@transform='_dummy_',
@rows='PART_ID & TYPE_CODE & ENTRY_DATE(date) & JOB_NUMBER',
@rowsort='PART_ID & ENTRY_DATE &  TYPE_CODE',  
@pvtcol='Report Mode',
@from='PART_TABLE',@rowbreak='n',@defaults1='y',
@defaultexceptions='dumy',@racheck='y',
@rowcounters='TYPE_CODE{TYPERANK}',
@lastcounters='TYPE_CODE{LASTYPE}',
@wherecounters='LASTYPE=0 or TYPERANK=2'

PART_ID TYPE_CODE ENTRY_DATE JOB_NUMBER TYPERANK LASTYPE 
------- --------- ---------- ---------- -------- ------- 
1111    ZXY       01/06/06   30         1        0
1111    ZXY       03/06/06   23         2        0
2222    ABC       03/06/06   35         1        0
2222    ABC       03/21/06   23         2        1
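The same sort-then-rank idea is easy to mimic in Python (a sketch with itertools.groupby over the sample rows; this is not how RAC is implemented):

```python
from itertools import groupby

# (PART_ID, JOB_NUMBER, ENTRY_DATE, TYPE_CODE),
# pre-sorted by PART_ID, ENTRY_DATE -- as in @rowsort.
rows = [
    ('1111', 30, '01/06/06', 'ZXY'), ('1111', 23, '03/06/06', 'ZXY'),
    ('1111', 21, '03/10/06', 'ZXY'), ('2222', 27, '01/04/06', 'ABC'),
    ('2222', 40, '01/06/06', 'ZXY'), ('2222', 35, '03/06/06', 'ABC'),
    ('2222', 23, '03/21/06', 'ABC'), ('3333', 87, '02/06/06', 'ABC'),
    ('3333', 24, '02/16/06', 'HUH'),
]

# Group maximal runs of the same (PART_ID, TYPE_CODE); a run of 2+
# rows means the same TYPE_CODE on consecutive ENTRY_DATEs. Keeping
# the first two rows of each such run reproduces the result above.
result = []
for _, run in groupby(rows, key=lambda r: (r[0], r[3])):
    run = list(run)
    if len(run) >= 2:
        result.extend(run[:2])
```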

Kiss/RAC which also stands for less is more.

Hopefully this has been helpful. Or are we back to 'huh'?:)

Sunday, August 13, 2006

Dataphor - Mixed Metaphors

What do you call bright people who become experts 
in a silly exercise? Like bright people who become
expert sql programmers. They are the ultimate hair
stylists. They spend so much time and brain power
splitting hairs that we actually feel sorry for them
and reward their persistence with the title of 'expert':) 
Just like we rewarded the guys who spent a year building
the ship model 'inside' the glass bottle. They were the
experts until someone came along who showed that it was
so much easier to build the model first and then glue
each side of glass around the ship. Voila, ship in a
bottle. Dataphor is the new way to build a ship in a
bottle. And you can build a much better ship since you
don't have to worry about a glass ceiling. 

Saturday, August 12, 2006

Dataphor - Solving the Humpty Dumpty Problem

Humpty Dumpty sat on a wall,
Humpty Dumpty took a big fall,
All the king's horses and all the king's men,
Couldn't put Humpty together again.

It seems people are always running into the same problem
with concatenating strings in sql. But we can put Humpty
together very easily in Dataphor.

Let's concatenate the strings that we created in the banana split
post. First we'll create a table and insert the split strings.
This is the familiar sql operation of insert/select.

create table exampleconcat
 { ID      : Integer,
   index   : Integer,
   strpart : String,
   key {ID,index}
 } 

insert
  (
   examplesplit times numbers       
    where Split(STRING,{ ':',',','/' } ).Count()>DIGIT
     add {Split(STRING,{ ':',',','/' } ).Count() cnt,
          Split(STRING,{ ':',',','/' } )[DIGIT].Trim() strpart}
          {strpart,DIGIT+1 index,ID}     
   )   
     into exampleconcat

But wait a minute, will ya. Notice there's no 'select'. That's because
in sql 'select' is part of all expressions. In Dataphor it's only a
statement. Not only is its meaning logical, it's intuitive. In Dataphor
you can order 'select' a la carte, not so in sql:) This is a BIG deal
as you shall see as we get more involved with queries.

An even BIGGER deal is that Dataphor inserts data independent of ordinal
position. It's only concerned with matching the names of the columns.
The columns to be inserted specified by:

{strpart,DIGIT+1 index,ID} 

could be scrambled any way you want. The order has no relationship to
the sequence of columns declared in table 'exampleconcat'. In all the
places that sql requires taking into account the order of columns, you
needn't concern yourself about it in Dataphor.
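Name-based (rather than position-based) insertion is easy to mimic with
plain Python dicts. This is just a sketch of the idea (the function and
table representation are my own, not Dataphor's):

```python
# A sketch of name-based column matching: the order of keys in each
# row dict is irrelevant; only the names have to line up.
def insert_rows(table, rows):
    for row in rows:
        # every row must supply exactly the table's columns, in any order
        assert set(row) == set(table["columns"])
        table["data"].append(tuple(row[c] for c in table["columns"]))

exampleconcat = {"columns": ("ID", "index", "strpart"), "data": []}
# same data, two different key orders -- both land correctly
insert_rows(exampleconcat, [
    {"strpart": "AA", "index": 1, "ID": 1},
    {"ID": 1, "index": 2, "strpart": "BB"},
])
```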

select exampleconcat          
ID index strpart
-- ----- -------
1  1     AA     
1  2     BB     
1  3     CC     
1  4     DD     
1  5     EE     
1  6     FF     
1  7     GG     
2  1     KK     
2  2     JJ     
2  3     II     
2  4     HH     
2  5     AA     
2  6     BB     
3  1     LL     
3  2     MM     
3  3     NN     
3  4     PP     
4  1     QQQ    
4  2     RRRR  

To concatenate the strings for each ID is child's play:) The Concat
operator is like an aggregate function, ie. count, min, max etc. You
group by ID and add the Concat operator. It takes 3 arguments: the
string column, a delimiter, and an order by column for how you want
the result ordered.

select
 exampleconcat 
  add {',' Delimiter}
   group by {ID}
    add { Concat(strpart,Delimiter order by {index}) concatstr }
   
ID concatstr           
-- --------------------
1  AA,BB,CC,DD,EE,FF,GG
2  KK,JJ,II,HH,AA,BB   
3  LL,MM,NN,PP         
4  QQQ,RRRR        

Are you impressed?:)
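For comparison, the same group-and-concatenate can be sketched in a few
lines of Python (a rough analog only; the function name is mine):

```python
from itertools import groupby

# A rough Python analog of the Concat aggregate: group by ID and
# join each group's strpart values, ordered by index.
def concat_by_id(rows):
    # rows: (ID, index, strpart) tuples
    rows = sorted(rows)  # sort by ID, then index
    return {
        id_: ",".join(r[2] for r in grp)
        for id_, grp in groupby(rows, key=lambda r: r[0])
    }
```

The point of the D4 version is that none of this plumbing is visible:
Concat is just another aggregate.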

Before I let you go, a few words about add. Yeah, it's a BIG deal:)

Consider this. Let's create a table of 1 row and a single column X
with a value of 5.

select ToTable({5},'X')

X
-
5

Now add another column to the table, Y.

select ToTable({5},'X') add {6 Y}

X Y
- -
5 6

Now add the expression X*Y.

select ToTable({5},'X') add {6 Y} add {X*Y XY}

X Y XY
- - --
5 6 30

Now add an expression involving XY and Y.

select ToTable({5},'X') add {6 Y} add {X*Y XY} add {XY-Y Z}

X Y XY Z 
- - -- --
5 6 30 24

See a pattern here? Remember high school algebra?:) Well this is algebra
with tables! You know the problem that sql has with using expressions
(aliases) within the same select? Gone. The often repeated sql question:
'why can't I use an alias defined in 'select' in the 'where' statement?'

select ToTable({5},'X') add {6 Y} add {X*Y XY} add {XY-Y Z}
where Z>20

Gone. Fini. Case closed:)

Compare this to t-sql.

select X,Y,XY,XY-Y Z
from
 (select X,Y,X*Y XY
from
  (select 5 as X,6 as Y) t1
                       ) t2
                      
You have to select each expression before you can use it in another
expression. Give me a break:) What is the mathematical concept that
describes this sql behavior? Mess, from the Latin idea of messi?:)
Did someone ask about subqueries? You guessed it, you add subqueries
just like any other expression. But that's another post. And yes,
I could talk about the sql need for table aliases (t1,t2) but I feel
magnanimous now so I won't beat a dead horse:)

Hang with me for just a few more seconds; I want to show you
something really interesting. Let's suppose in our example we want
ID's with a string count of greater than 4. First we'll add the
Count() aggregate to the Concat aggregate. We'll call the count 'cnt'.

select
 exampleconcat 
  add {',' Delimiter}
   group by {ID}
    add { Concat(strpart,Delimiter order by {index}) concatstr,Count() cnt }

ID concatstr            cnt
-- -------------------- ---
1  AA,BB,CC,DD,EE,FF,GG 7  
2  KK,JJ,II,HH,AA,BB    6  
3  LL,MM,NN,PP          4  
4  QQQ,RRRR             2

To get the ID's with a count greater than 4 you would use 'having' in
sql, right? As in 'having count()>4'. This gets to the fact that in
sql there are 2 different statements that perform the same conceptual
thing, ie. a restriction. And the one you use depends on whether you're
grouping ('having') or operating on non-grouped data ('where'). The
usual reason given when newbies ask about this is 'that's the way
it works in sql'. Good thing the designers of this sql mess didn't
go into medicine:)
Now in Dataphor, after the 2nd add we have a table. How do you
restrict the rows of a table? Yep, with where. One where fits
all (situations):) There is no logical reason for 'with this use that,
with that use this'. So it's as simple as this:

select
 exampleconcat 
  add {',' Delimiter}
   group by {ID}
    add { Concat(strpart,Delimiter order by {index}) concatstr,Count() cnt }
     where cnt>4
      order by {cnt desc} 
     
ID concatstr            cnt
-- -------------------- ---
1  AA,BB,CC,DD,EE,FF,GG 7  
2  KK,JJ,II,HH,AA,BB    6

Since we're using an algebra for tables we can add parentheses
around the statements that make up the table we're applying where to:

select
 (
   exampleconcat 
    add {',' Delimiter}
     group by {ID}
      add { Concat(strpart,Delimiter order by {index}) concatstr,Count() cnt }
  )    
       where cnt>4
        order by {cnt desc}

Ok, thanks for sticking around. There's much more to come!

Dataphor - a Banana Split

A common problem in sql is to split a delimited string without
using cursors or any type of procedural code. Talk about how
to have fun!:) And to spice it up, how about multiple delimiters.
Ok here's an example of splitting a string over rows with multiple
delimiters, colon(:), comma(,) and slash(/). The column 'STRING'
is what we want to tear apart over 'ID'. You will love this!

select 
table
 {
 row {1 ID, 'AA: BB, CC/ DD, EE/ FF: GG' STRING},
 row {2,'KK, JJ: II, HH, AA: BB'},
 row {3,'LL : MM , NN / PP'},
 row {4,'QQQ / RRRR'},
 key {ID}
 }
  times 
   ToTable({0,1,2,3,4,5,6,7,8,9,10},'DIGIT')     
    where STRING.Split( { ':',',','/' } ).Count()>DIGIT
     add {STRING.Split( { ':',',','/' } ).Count() cnt,
          STRING.Split( { ':',',','/' } )[DIGIT].Trim() strpart}
          {ID,strpart,DIGIT+1 index,cnt} 
        order by {ID,index }

ID strpart index cnt
-- ------- ----- ---
1  AA      1     7
1  BB      2     7
1  CC      3     7
1  DD      4     7
1  EE      5     7
1  FF      6     7
1  GG      7     7
2  KK      1     6
2  JJ      2     6
2  II      3     6
2  HH      4     6
2  AA      5     6
2  BB      6     6
3  LL      1     4
3  MM      2     4
3  NN      3     4
3  PP      4     4
4  QQQ     1     2
4  RRRR    2     2

Let's go over this. I promise it won't take long, it's that simple.
We'll get past the clutter of the virtual tables and create
some permanent ones. First we'll create table 'examplesplit'.

create table examplesplit
from
 table
 {
 row {1 ID, 'AA: BB, CC/ DD, EE/ FF: GG' STRING },
 row {2,'KK, JJ: II, HH, AA: BB'},
 row {3,'LL : MM , NN / PP'},
 row {4,'QQQ / RRRR'},
 key {ID}
 }
  adorn { STRING tags { Storage.Length = "100" } }

select examplesplit
ID STRING                 
-- --------------------------
1  AA: BB, CC/ DD, EE/ FF: GG
2  KK, JJ: II, HH, AA: BB 
3  LL : MM , NN / PP      
4  QQQ / RRRR

Don't worry about adorn, I'll come back to that in due time.
Now create a table of digits called 'numbers' with a column 'DIGIT'.

create table numbers
from
 ToTable({0,1,2,3,4,5,6,7,8,9,10},'DIGIT')

select numbers
DIGIT
-----
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10

Now here we go with 2 versions of the new query to grab hold
of you regardless of your inclination:)

select 
 examplesplit times numbers    
    where STRING.Split( { ':', ',','/' } ).Count()>DIGIT
     add {STRING.Split( { ':', ',','/' } ).Count() cnt,
          STRING.Split( { ':', ',','/' } )[DIGIT].Trim() strpart}
          {ID,strpart,DIGIT+1 index,cnt} 
       order by {ID,index}
select 
 examplesplit times numbers    
    where Split(STRING,{ ':',',','/' } ).Count()>DIGIT
     add {Split(STRING,{ ':',',','/' } ).Count() cnt,
          Split(STRING,{ ':',',','/' } )[DIGIT].Trim() strpart}
          {ID,strpart,DIGIT+1 index,cnt} 
       order by {ID,index}

ID strpart index cnt
-- ------- ----- ---
1  AA      1     7
1  BB      2     7
1  CC      3     7
1  DD      4     7
1  EE      5     7
1  FF      6     7
1  GG      7     7
etc.

The 'STRING.Split' is for those that are inclined to believe in
UFO's (objects) and their modus operandi of abductions (methods).
The 'Split(STRING' is for those that have a history of renditions
at the hands of sql.

Now 'examplesplit times numbers' is like sql's 'cross join'. Now
comes the kewl part. The Split thingie splits a string so easily
because it consumes (as in eats up) the string as a 'list'. This
is a BIG deal. When is a door not a door? When it's a jar:) When is
a 'string' more than itself? When it's a 'list':) So with knowledge
of the delimiters Split is operating on a 'list'. If you have a list
you should know how many items are in it. And you do with Count():

STRING.Split( { ':',',','/' } ).Count()
Split(STRING,{ ':',',','/' } ).Count()

Lists are 0 based. So for ID 1 there are 7 strings in the list.
The elements in the list are therefore 0-6. So the where restriction
forms rows for each ID that have DIGIT less than the Count(). And
DIGIT can be used as an index into the list to return a particular
string, ie. 'strpart'. So to the table formed by times and where, for
each row we add the Count() as column 'cnt' and a particular string 
as column 'strpart' by using DIGIT in either of the expressions:

STRING.Split( { ':',',','/' } )[DIGIT].Trim() strpart
Split(STRING,{ ':',',','/' } )[DIGIT].Trim() strpart

You will see that this [] thingie, called an indexer, can be used not
only on lists but also on tables, whatever form they take. This construct
is crazy kewl and I'll show just how kewl and powerful it is as we
work our way thru Dataphor. Trust me:)
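If it helps to see the cross-join-and-index idea outside of D4, here's a
rough Python sketch (names are mine; re.split stands in for the
multi-delimiter Split):

```python
import re

# Split each STRING on ':' ',' or '/', the way the D4 query does:
# cross each row with a digits table, keep digits below the part
# count, and use the digit as an index into the list of parts.
def banana_split(table, digits=range(11)):
    out = []
    for id_, string in table:
        parts = re.split(r"[:,/]", string)  # the 'list' Split produces
        cnt = len(parts)
        for d in digits:                    # 'times numbers'
            if cnt > d:                     # the where restriction
                out.append((id_, parts[d].strip(), d + 1, cnt))
    return out
```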

The expression surrounded by the swigglies:

{ID,strpart,DIGIT+1 index,cnt}

is used to return the columns you want and also to define new columns,
which is what 'index' is.

By the way, did it register that this is exactly the opposite of sql!
In sql you state what you want returned before you even tell it what
to process. Here we process stuff and, after we're finished, we can
pick and choose what we want. In sql it's like going to a restaurant
and asking for the check before you've eaten, let alone even ordered!
Waitress: 'Welcome to Alice's Restaurant, sir'. Sqler: 'Hi, check please'.
Waitress thinking: 'what a nitwit':) This is a big deal in writing
queries as you will see.

And did you notice that there is no 'from' in 'select'! It's not there
because you don't need it! It's redundant. Why should you have to write
both 'from' and the table name? Just stating the table name should be
sufficient. Writing both is like paying for the same piece of real estate
twice. Come on, this isn't brain surgery, it's just common sense and logic.
And Dataphor is filled with both:)

As for order by...well guess:)

I hope you're enthused about Dataphor's banana split compared to the
splitting headache in sql:)

Stay tuned:)

Thursday, August 10, 2006

First Sql now Dataphor

Yeah, come on all of you, big bright men,
Data management needs your help again.
He's got himself in a terrible jam
Way down yonder in sql land
So put down your server and pick up some rum,
We're gonna have a whole lotta fun.

To paraphrase William F. Buckley Jr.'s quote from Austrian Willi Schlamm:
The trouble with sql is sql.
The trouble with database management is sql.

I'm an advocate of something wonder-filled called 'Dataphor' (www.alphora.com).
If you're into database management, data modeling and sql programming
I have big news for you. So if you're working with MS Sql Server, Oracle,
DB2 etc. here's a news flash:
You've been waiting for Dataphor forever but you just don't know it!

Take a gander at these 'select' statements:

Dataphor supports a table 'type'. So you can represent a
table in a 'select' statement using keywords that define
a table, ie. row, key and of course table. 

select
table
{ row {1 ID,10 ID2,'AA' stringvalue},
row {1,20,'BB'},
row {1,30,'DD'},
row {1,40,'DD'},
row {1,50,'EE'},
row {2,60,'DD'},
row {2,70,'AA'},
row {2,80,'EE'},
row {2,90,'BB'},
row {2,100,'BB'},
row {2,110,'FF'},
key {ID2}
}

ID ID2 stringvalue
-- --- -----------
1  10  AA
1  20  BB
1  30  DD
1  40  DD
1  50  EE
2  60  DD
2  70  AA
2  80  EE
2  90  BB
2 100  BB
2 110  FF

A rose is a rose is a rose. Tables in Dataphor are
'always' a table, ie. a row is always unique and
the table has a key. If a key isn't specified all
the columns of the table will make up the key.
Therefore there is no DISTINCT keyword in Dataphor.
This is a 'BIG,BIG' deal as you will see.
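As a rough analogy (Python, not D4): if a table is a set of rows,
projecting onto fewer columns dedupes by definition, which is why no
DISTINCT keyword is needed. The variable names here are mine:

```python
# If a table is a *set* of rows, projecting onto fewer columns
# can't produce duplicates -- the set absorbs them automatically.
rows = {(1, 10, "AA"), (1, 20, "BB"), (1, 30, "DD"), (1, 40, "DD")}

# like 'over {ID, stringvalue}': keep just those two columns
over = {(id_, s) for (id_, _id2, s) in rows}
```

Note the two 'DD' rows collapse into one, with no DISTINCT in sight.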

select
(
table
{ row {1 ID,10 ID2,'AA' stringvalue},
row {1,20,'BB'},
row {1,30,'DD'},
row {1,40,'DD'},
row {1,50,'EE'},
row {2,60,'DD'},
row {2,70,'AA'},
row {2,80,'EE'},
row {2,90,'BB'},
row {2,100,'BB'},
row {2,110,'FF'},
key {ID2}
}
) over {ID,stringvalue}

The above query returns only the ID and stringvalue columns using over.
Do you see that duplicate rows are eliminated?
You sent in a real table, you got a real table returned.

ID stringvalue
-- -----------
1 AA
1 BB
1 DD
1 EE
2 AA
2 BB
2 DD
2 EE
2 FF

Now let's return ID by removing the other columns.
Why doesn't sql have a 'remove', especially when you
want to get rid of one column and you've got a bunch?

select
(
table
{ row {1 ID,10 ID2,'AA' stringvalue},
row {1,20,'BB'},
row {1,30,'DD'},
row {1,40,'DD'},
row {1,50,'EE'},
row {2,60,'DD'},
row {2,70,'AA'},
row {2,80,'EE'},
row {2,90,'BB'},
row {2,100,'BB'},
row {2,110,'FF'},
key {ID2}
}
)
remove{ID2,stringvalue}

ID
--
1
2

Get it! :) 
Ok here's a little more nudging. Think of the verb to 'dup', as in
'de-duping' a table using GROUP BY, JOINS, DISTINCT. All that craziness
of getting rid of duplicates is history. Good riddance. And note that
'dup' can also be an adjective:)

Do these predicates look familiar:

=, <>, <, <=, >, >=

Well, you can now apply them to tables! Is this a step forward?
What do you think!:)

Next some kewl examples of how to do stuff that usually drives you nuts.
