Dataphor SQL RAC (Relational Application Companion)


A site of hope for those looking for a true relational database system

Sunday, December 09, 2007

Q & A about tables, types and procedures

Here are some questions from an sql developer I'm trying to 
answer. I'm trying to explain that a relational database realizes 
a stored procedure in a fundamentally different way than sql. 
Along the way I'm talking about relational ideas as they relate
to tables and types. The conversation shows the vise-like grip
sql seems to have on the minds of man  Microsofts functional
programming offering LINQ also comes up. Overall I think I did
a good job trying to explain the big differences between sql
and a relational database. There's some simple but good examples
that drive home the significant differences.
Black is relational(me)
Purple is sql

Here we go:
> My objection is not so much to your general idea of variables of
> type table-with-given-columns (I've recently worked with some systems
> that could be cleaner if such a thing were available; currently they
> work around it using temp tables); more to your specific use of D4 in
> all your examples, as opposed to a pseudo-code extension of SQL.

Ok, let me directly address you dislike of D4 and your preference for
a pseudo-code extension of SQL. I'll refer to your pseudo-syntax in
the thread:
comp.databases.ms-sqlserver
'Basic Anatomy of Sql Server'
http://tinyurl.com/2olako

>
Then you might want to write examples in a pseudo-syntax that
/looks/ like SQL.  I know this is a matter of taste, but your
examples look ugly to me.  Consider:

-- Your example of a stored procedure that returns a result set, the
-- format of which can only be deduced by reading through the code.

CREATE PROCEDURE dbo.GroupByShipCountry
       @Employee Integer
AS
SELECT ShipCountry,Count(*) Cnt,Min(Freight) MinFrt,Max(Freight)
MaxFrt
FROM Orders
WHERE EmployeeID=@Employee
GROUP BY ShipCountry

-- Your example of the same stored procedure rewritten in D4.

create operator GroupByShipCountry (Employee:Integer):
table{ShipCountry:String,Cnt:Integer,MinFrt:Money,MaxFrt:Money}
begin
result:=
       Orders
        where EmployeeID=Employee
         group by {ShipCountry}
           add{Count() Cnt,Min(Freight) MinFrt,Max(Freight) MaxFrt} ;
end;

-- My example of the same stored procedure rewritten in a
-- pseudo-extension of T-SQL.

CREATE PROCEDURE dbo.GroupByShipCountry
       @Employee Integer,
       @ResultSet Table (
             ShipCountry varchar(15),
             Cnt         int,
             MinFrt      money,
             MaxFrt      money
       ) output
AS
SELECT ShipCountry,
        Count(*) Cnt,
        Min(Freight) MinFrt,
        Max(Freight) MaxFrt
INTO @ResultSet
FROM Orders
WHERE EmployeeID=@Employee
GROUP BY ShipCountry
>

To begin with, the idea of a stored procedure returning a 'result' is
an sql concept. This concept does not exist in a relational (D4)
system. Relationally, a stored procedure only exists when it is
created. The  execution of a sp, its runtime realization, does not
involve the definition of the procedure nor the idea of 'returning'
something from it. Relationally at runtime what sql see's as a
procedure and a result 'is' a variable of the type of the result. This
is the huge difference between the two systems. Relationally the
'@ResultSet' and the idea of inserting a query result into it is
contradictory and meaningless. The 'name' of the procedure 'is' the
variable (table), there is no result from a sp (ie. sql). Syntactically
an sql tabled value function is closer in spirit to the D4 procedure
with the big difference that the name of the table valued function is
'not' a typed variable like in D4. Finally, the sql sp makes the
distinction between identifiers as variables and non-variables using
the '@'.  In D4 there is no such distinction as 'all' identitifiers
are by definition variables and the '@' is superfluous. The 'output'
declaration in the sql sp is based on the general sql idea of
'returning' something. Such a declaration is superfluous relationally
as, again. there is no concept of 'returning a something' from a 'this sp'.
Note that LINQ realizes an sql stored procedure exactly as sql intends
it and nothing like the relational D4. The 'functional' part of
integrated query is simply how the sp is accessed within a net
language. There is no concept of a typed variable with the name of the
sp. In other words, the OR map is mapping to the same sql as if the
mapping didn't exist. MS has added a 'functional language' within net
when what it should have done is added a functional language to the
database itself!  D4, after all, represents the functional language
of a relational system and how easy it is to use such a language
within a present day (net) programming language. The D4 answer to
overcoming the object-relational mismatch is of a totally different
nature to the one offered thru LINQ. My objection to LINQ lies in the
idea that no one at MS seems to have considered an alternative. 

Let me try to cut thru the semantics with a few examples that I
hope will clarify some of the points I'm trying to make.

In t-sql this should be perfectly clear:
DECLARE @X INT
SET @X=5

The variable @X can only take one value at any specific time.
In a relational system a procedure that returns some value at runtime
must behave exactly like @X. At runtime the procedure is a variable of
a particular type and has a specific value based on input arguments.
An sql sp has no such nature and behaves in an entirely different way.

create procedure SqlOne
@Y int
AS
SELECT COUNT(*) AS CNT,SUM(FREIGHT) AS SUMFRT
FROM ORDERS
WHERE EMPLOYEEID=@Y
SELECT *
FROM ORDERS
WHERE EMPLOYEEID=@Y

Therefore the idea that an sql procedure can return multiple results
is meaningless if the sp is realized as a variable where only a single
result makes sense. Add to this the idea of type where each result is
a different type and the difference between sql and relational should
be even clearer. Again the relational procedure is realized exactly
like the int @X. No programming language chooses among possibe
multiple definitions of the value of a variable. It would be
equivalent to:
DECLARE @X INT
SET @X=5 or @X=10
which makes no sense. It is because the sql sp is not realized as a
variable that multiple results 'can' be returned.

This sql sp:
CREATE PROCEDURE SqlTwo
@A INT OUTPUT,
@B INT OUTPUT
AS
SET @A=5
SET @B=10

DECLARE @C INT,@D INT
EXEC SqlTwo @C OUTPUT,@D OUTPUT
SELECT @C
SELECT @D

makes no sense relationally because, again, there are multiple
results. Now there are two scalar types (int) returned instead of sql
'resultsets'. Relationally there is no such thing as more than 1 thing
(think a variable of a type) at a time. Two scalar results are
realized as a 'row' type relationally, ie. 'one' thing.
create operator D4Two():row(A:Integer,B:Integer)
begin
result:=row(5 A,10 B);
end;

In this case at runtime D4Two is a variable of type row with 2 scalar
columns.

From the relational perspective a table/row/list is a variable that
behaves exactly like a variable in a programming language. Its value
can be assigned to other values just like a t-sql integer variable
can. It can be compared to other variables (for equality) just like a t-sql
integer variable. It can be passed as an argument to a procedure just
like a t-sql integer variable. For these reasons why MS decided to
call something a 'table variable' remains a mystery. It behaves like no
other variable in any programming language on the face of the
planet  If it doesn't quack like a duck, doesn't behave like a duck,
doesn't waddle like a duck it sure as hell isn't a duck  What MS
calls a table variable is surely not a table variable as the idea
exists in any programming language or relationally. Whatever one wants
to call an sql table the table variable is the same thing. Its
phyiscally implementation may be different but that does not change
the fact it is not a variable of a specific table type.

Sql distinguishes between user defined functions and procedures. But
sql user defined functions are on the same exact level of procedures
when looked at from the point of view of 'variables'. Neither one
has anything to do with the idea of a relational variable. All this
artificial distinction does is serve to make it harder for users to
understand the relational model  (Why sql choose to create a user
define function/procedure dichotomy is another topic. But think of
where and having).

Rather than dwell on particular syntax or pseudo-syntax I think it
is the ideas that the relational model is based on that is important.
And what we're talking about here is just a slice of the relational
model. The relational model is not rocket science  It's actually
quiet straightforward. Ironically it's sql that is out in left field.
The relational model is in line with all current programming
languages. Unfortunately that's never been the case with sql  This is 
one of the reasons I find LINQ so unnecessary. Once you get the idea
that a big part of the relational model is all about the basic concepts of
variables and types I think (I at least hope) that what I've been
trying to explain will make perfect sense  

>Add to this the idea of type where each result is a different type
>In other words, TABLE (X INT, Y INT) is a different variable type
>from TABLE (M VARCHAR(15), N VARCHAR(15))?

Exactly. Think of sql strings. This table, TABLE(TABLE (M VARCHAR(15),
N VARCHAR(15)),
is a differnt type than TABLE (N VARCHAR(16), N VARCHAR(16))! This
means that we couldn't compare the two and undermines real relational
division. To declare how many characters in a string is clearly the
opposite of what the relational idea of data independence is all
about. Relationally there can only be a 'string' type having
absolutely nothing to do with its storage characteristics.  And this
is the same idea in any programming language. This is just one
manifestation of how sqls design ignores the concept of a strong type.
LINQ is an attempt to hide the fact that:
DECLARE @N VARCHAR(5),@M VARCHAR(6),@P VARCHAR(5),@Q VARCHAR(6)
represents 4 different types. This is but one simple form of the idea
that sql guarantees impedence mismatch! And having the choice of
changing the database or the access to it, MS chose access to the
database (LINQ). The sql community seems to not consider how bizarre
and confounding things like this look to developers coming to sql for
the first time. It must be force of habit blinding a more critical
look at how things are. There are so many that sql defeats the idea of
strong types that it would be better and easier to build a new
database system for application development. And have sql available
for everything else 

>What if the data you want to return is not multiple scalars, but
>rather multiple tables?  Upon reflection, I suppose tables could
>be nested in this model, i.e. you can return
>   TABLE(T1 TABLE(X INT, Y INT), T2 TABLE(M VARCHAR(15), N VARCHAR(15)))

The relational database emphasizes types to define structures that can
be used to model processes, ie.

create table T1
{
  A:Integer,
  B:String,
  T:row{X:Integer,Y:String},
  S:row{B:Integer,C:list(String)}
  key{A}

};

This is possible thru system provided types and user defined types.
It's also supported thru explicit conversion processes between one
particular type and another. The foundation to support these
constructs is unique to a relational system and does not exist in sql.
Whether a specific struture makes logical sense and whether it's
supported is another matter. Tables within tables is open to question.
Even if the system supports it would it make sense or would another
type of structure be more appropriate? This question goes to the edges
of a relational system and I'm afraid I can't do it justice here 

>This would allow bad developers to commit the common 'a,b,c' 1NF
>violation in a whole new way, but then bad developers can screw
>things up in any language.

The view that strings like 'a,b,c' violate the idea of the atomicity
of a column in an sql table is a direct result of sql's lack of types
and lack of relationships between types. There is no violation of any
kind in a relational system because the string can be stored as
a single value of a column retaining the concept that there individual
elements involved. It would simply be stored as a 'list' type.
For example column B of table TableList takes a comma delimited string
and splits it into a list type.

create table TableList
from
table
     {
      row{1 A,'A,B,C'.Split({','}) B},
      row{2,'D, E , F, G'.Split({','})},
      row{3,'H,I,J,K'.Split({','})}
     };

The table definition of TableList is:
create table TableList
A:Integer,
B:list(String),
key{A}

For each row of the table column A can be retrieved and the individual
items of column B, the list, are availiable.
Select the value of A and the value of the 1st item in the
list(B).
select TableList {A,B[0] ListItem1};
A ListItem1
- ---------
1 A
2 D
3 H

Directly address the 1st item in the list of A=2.
select TableList[2].B[0];
D

Directly address the last item in the list of A=3.
select TableList[3].B[TableList[3].B.Count()-1];
K

Get column B for the row where A=1 and convert the list into a table.
select ToTable(TableList[1].B,'Str','Index');
or
select ToTable( (TableList where A=1 over{B})[].B, 'Str','Index');
Str Index
--- -----
A   0
B   1
C   2

It's types that a relational system guarantees integrity for and high
level operators that allow the explicit conversions between that
developers should have for application development. And this is the
same idea the MS net team calls 'functional programming' which is what
they developed LINQ for. But a relational system is functional
programming! 

>bad developers can screw things up in any language.

Sure but application development with sql has a tendency to make
anyone a nitwit at some time or other. I'm for less nitwits 

>> 'where' and 'having').
>I'd guess these are both for efficiency.

GROUP BY was added after the original SELECT but instead of
redesigning the language they saw no problem with leaving in two
constructs that do the same thing! Every time I see an MS paper on
'best practices' I have got to grin 

On table variables:

>But you agree that (1) it has some features of variables, and (2) it
>could reasonably be extended to have more features of variables?

I don't think MS could lock its developer army in a hotel and tell
them to make sql a little more relational  They have two choices.
Either buy a relational system (like D4) or start from the ground up
to develop one. The gulf between a relational system and sql is too great
to try to simply make changes in sql server. 

>A lot of people find SQL pretty straightforward, especially in this
>newsgroup.  Your choice of (pseudo-)syntax will make a difference to
>them.  (You might get different responses from a newsgroup focusing
>on front-end programming languages, especially if they already
>resemble Pascal as D4 seems to do.)

Sure at least a dozen people who write books and articles find sql
straightforward  For the rest I'd like to at least see a choice for
them. Again, I hope some can get beyond just syntax to grasp what
a relational system would offer. 

No comments:

About Me

My photo
Phoenix, Arizona, United States