Beyond Sql: Dataphor - Are You Constrained?

No where can the essence of a business problem or model be better
shown than thru the means of constraints. Constraints encapsulate
the logic of a model and govern the ebb and flow of an application.
Constraints in Dataphor, based on the relational model, offer these
benefits:

    * Emphasize data independence
    * The ability to reference global objects
    * The use of the declarative method in application development

The relational model allows and fosters the clear separation of
how a problem is modeled versus physical implementation details.
This is the relational concept of data independence. The solution
of a business problem should not be encumbered by anything that
detracts from the ability to clearly and precisely model it. This
is the essence of the concept of the logical model in relational
theory. Yet data independence is sorely lacking in sql where the
line of distinction between the logical model and performance
considerations are too often blurred. The most blatant example
is the conceptual entanglement of keys and indexes. A key, which
is a constraint, is a logical consideration. An index is a physical
consideration. By allowing indexes based on performance considerations
to directly enter the logic of the model the clarity of expression
that is needed becomes undermined or even lost. The price for a gain
in performance is paid for by the ambiguity introduced into the
model. Constraints in dataphor can be clearly expressed independent
of performance based concepts, ie. indexes.

In order to define constraints that need to reference global objects
like tables, sql needs to rely on mechanisms like triggers. All
to often this leads to problems with declared referential integrity
constraints and adds an additional layer of complexity to the
problem. Unlike simple column constraints or indexes, sql does
not allow a table constraint to be simply and clearly stated in a
'create table' or 'alter table' statement. Dataphor allows this
simple but powerfully expressive capability.

In sql the idea of metadata is independent of the logical constructs
used to derive data. In other words, in sql you can declare an
index but you can't declare any additional information about the
index. Dataphor allows metadata, information about the object, to
be declared along with the object. This additional information is
used in a declarative way to make application development easier.
Dataphors derivation engine thus absolves the developer of an
entire level of work. For example, by declaring metadata about
a constaint the meaning of the constraint can be clarified
while at the same time saving the developer the work to do this.

An interesting example of the confluence of constraints
(the logical model) and performance can be found in the article

'Indexes on Computed Columns: Speed Up Queries, Add Business Rules'

which can be viewed here. Very cleaver solutions to constraints
are proposed in the form of indexes on computed columns. Compare
this approach to simply, clearly and explicitly defining a
constraint with metadata in dataphor leaving performance optimization
as the last step of the application.

For example, one problem the article discusses is insuring that VPN
tokens are unique in the table while allowing any number of null VPN
token numbers.

I'll be using dataphor with MS sql server 2005 sp1 as the data
repository (backend).

Here is the table definition defined in dataphor:
(All dataphor keywords are in bold)

create table TESTC
 {
 EMPLOYEEID : Integer,
 LOCATIONID : String,
 DEPTID : String { default 'ONE' },
 VPNTOKEN : Integer { nil } ,
 key {EMPLOYEEID}
  }

The primary key is EMPLOYEEID. The 'Integer { nil }' is the dataphor
way of saying that VPNTOKEN can take a null (no value) as the data
is stored in sql server.

A table constrain can now be declared that captures the logic of
uniqueness for non null VPNTOKENs and allows unlimited nulls for it.
The constraint can be declared using the dataphor alter table.

alter table TESTC
 {
  create constraint DISTINCTVPNWITHNULLS
   exists(
          TESTC
           group add {Count(VPNTOKEN) CNT,Count(distinct VPNTOKEN) DISTVPNCNT}
             where CNT=DISTVPNCNT
          )
     tags
 {
     DAE.Message =
      "'VPNTOKEN ' + ToString(VPNTOKEN) + ' is not unique'"
 }   
  }      

The 'Count(VPNTOKEN)' and 'Count(distinct VPNTOKEN)' work the same as
they do in sql. Both count constructs eliminate null values. So the
where predicate is comparing the count of non nulls to the distinct
count of non nulls. The group statement has no by statement so
the comparison is over the whole table. The logic of the constraint
says if where is true then all the non nulls must be unique. If where
is false an attempt was made to enter an existing non null VPNTOKEN
and thus violiates the uniqueness of VPTOKEN in the table and the
constraint. The attempt to enter such a VPNTOKEN will be rejected.
Since where is independent of nulls, all null values for VPNTOKEN
will satsify the constraint (it evaluates to true) and will be
accepted. The only way a null VPNTOKEN will be rejected is if the
row being inserted violates the uniqueness of the key of the table
which is EMPLOYEEDID. This is the meaning of specifying a key. Only
unique values of EMPLOYEEID are acceptable.

The tag keyword indicates metadata is supplied with the constraint
definition. Dataphor will use the message when the constraint
is violated as opposed to a cryptic message returned by sql server
when a key/index is violated.

Lets insert some data with unique VPNTOKENs

insert
     table
      {
        row { 1 EMPLOYEEID,'A' LOCATIONID,'ONE' DEPTID, 1 VPNTOKEN },
        row { 2,'A','ONE',2},
        row { 3,'B','TWO',3}
      }
        into TESTC;
      
Insert some rows with null VPNTOKEN.

insert
     table
      {
        row { 4 EMPLOYEEID,'C' LOCATIONID,'ONE' DEPTID},
        row { 5 EMPLOYEEID,'C' LOCATIONID,'THREE' DEPTID}
      }
       into TESTC          
     
This is the data in the table.

select TESTC

EMPLOYEEID LOCATIONID DEPTID VPNTOKEN 
---------- ---------- ------ ----------
1          A          ONE    1        
2          A          ONE    2        
3          B          TWO    3        
4          C          ONE    <No Value>
5          C          THREE  <No Value>

Now in the derived form that dataphor has available for every table
and view we can try to enter a row such as:

 row { 7 EMPLOYEEID,'C' LOCATIONID,'THREE' DEPTID,2 VPNTOKEN}

Dataphor uses the tag metadata expression to clearly show the user
the problem with the specific data that violated the constraint:



I submit this is a significant improvement over the type of message
returned by sql server which tells you what object (ie. key)
caused the violation but gives no information about the 'data'
which caused it. The developer is left with this task on the frontend.

Any table constraint should be a straightforward short process to
represent. As another example, this constraint allows unique VPNTOKENs
but at most 1 null value.

alter table TESTC
 {
  create constraint DISTINCTVPNWITHONENULL
   exists(
           TESTC
            group add {
                       Count() TOTALCNT,
                       Count(VPNTOKEN) CNT,
                       Count(distinct VPNTOKEN) DISTVPNCNT
                      }
              where 
                (TOTALCNT-CNT < 2) and (CNT=DISTVPNCNT)
          )
     tags
 {
             DAE.Message =
              "'VPNTOKEN ' + IfNil(ToString(VPNTOKEN), 'NULL') + ' is not unique'"
 }   
  }   

The where statement uses '(TOTALCNT-CNT < 2)' to check that there is
at most 1 null value (there can be 0 or 1) and uses '(CNT=DISTVPNCNT)'
to insure that the non null values are unique. With a violation, the
message will display a VPNTOKEN value that already exists in the table
including 'NULL'.

Dataphor by default will display an informative message for any key
that is violated. It is not necessary to explicitly state any
message. On sql server a primary key violation displays only the
object of the violation. For example:

'Cannot insert duplicate key row in object 'dbo.testc' with unique
index 'UIDX_TESTC_EMPLOYEEID'.
The statement has been terminated.'

While dataphor will display the specific data that caused the
violation:



In the future I'll talk about other types of constraints in
dataphor such as column and transition row constraints.

Bye for now.
Sunday, September 10, 2006

Dataphor - Are You Constrained?

No comments:

Beyond Sql

Blog Archive

Links

About Me