What is really wrong with the "UPDATE.. FROM.." syntax?
This example uses MS Sql Server 2005
and
the D4 language of Dataphor @
www.alphora.com
The discussion is based on the thread:
microsoft.public.sqlserver.programming
Thursday, June 14, 2007 10:21 AM
Update help
http://tinyurl.com/27Z8PM
Given an update with join representing a 1 to many relationship:
UPDATE Orders
SET < >
FROM Orders
INNER JOIN
OrderDetails
ON Orders.order_nbr = OrderDetails.order_nbr;
Joe Celko argues:
'Bottom line: UPDATE FROM *can* be safely used - but only if you are
FOREVER certain that no single row in the target table can EVER e
joined to more than one row in the source table(s); FOREVER in the
ENTIRE LIFETIME of the application; FOREVER across all programmers yet
to come.'
And from Sql Server 2005 Bol:
'The results of an UPDATE statement are undefined if the statement
includes a FROM clause that is not specified in such a way that
only one value is available for each column occurrence that is updated,
that is if the UPDATE statement is not deterministic.'
What is conspiciously missing from the discussion and Bol is one of the
most fundamental concepts in database theory - a 'KEY' as opposed
to an index.
Obtaining the result of a natural join in the language of D4:
select Orders join OrderDetails ;
order_nbr some_col sku item_price
--------- -------- --- ----------
1 0 1 500
1 0 2 205
2 0 1 490
3 0 1 480
It should be obvious that the key of this table is {order_nbr,sku}.
The example of the update in question is:
UPDATE Orders
SET Orders.some_col = OrderDetails.item_price
FROM Orders
INNER JOIN
OrderDetails
ON Orders.order_nbr = OrderDetails.order_nbr;
which totally ignores the compound key and implicitly assumes a
1 to 1 relationalship between the tables. So instead of showing
a logical flaw in using a join in Update, it simply shows an
illogical update! The most logical way an update can use a
1 to N relationalship (ie a compound key) is to perform an update
on a single row! This is what the ridiculous 'undefined' in Bol
is hiding. An undefined update is the lack of recognition of the
appropriate key on the part of the user. The flip side to the coin
is the total lack of the concept of a key in Sql Server. Taken
together this combination is lethal.
In D4 an update that makes sense takes the form:
update (Orders join OrderDetails)
set { some_col :=item_price }
where row{order_nbr order_nbr,sku sku}=row{2 order_nbr,1 sku};
If there was any question of writing an update assuming the existence of
a key of just {order_nbr} we could test it by trying to select a row
based on just that key, for example select a row using order_nbr 1:
select
(Orders join OrderDetails adorn {key{order_nbr}})[1 by {order_nbr}];
D4 would give the message:
" Internal Index Error: 'Duplicate key violation.' ".
which should tell an application developer that any update using the
join only makes sense using the compound key. As in:
select
(SOrders join SOrderDetails)[1,2 by{order_nbr,sku}] ;
order_nbr some_col sku item_price
--------- -------- --- ----------
1 0 2 205
D4 is an application development language, sql is a programming language.
There is a big difference.
As for the sql standard in which Celko expresses the update:
UPDATE Orders
SET some_col
= (SELECT item_price
FROM OrderDetails
WHERE OrderDetails.order_nbr = Orders.order_nbr)
WHERE EXISTS
(SELECT *
FROM OrderDetails
WHERE OrderDetails.order_nbr = Orders.order_nbr);
It is the SET subquery which returns the error:
'Server: Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery
follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.'
And is nothing more than the Bol idea of an 'undefined' value. The idea
of the standard as a substitute for the concept of a key is, of course,
ridiculous. The answer to the question of 'which one' can be seen as
rather absurd in light of a 'key'. There should never be a question of
'which one' in a RDMS as there can only be but one! -:)
Interestingly, this issue is an example of searching for a question
that is appropriate and makes sense given a response. It is no easy
feat sometimes to make the connection -:)
bye for now,
steve
Comment:
Anonymous said...
This is nearly my first reading of anything to do with Dataphor, so excuse my ignorance.
But the assumption that runs strongly through your entire article is that two tables can
only be joined to each other in one and only one way. There are many application
contexts where this is an overly limiting assumption!
Sun Jun 17, 12:04:00 AM
Thanks for browsing the post.
To understand Dataphor you must realize that the intent of D4 is
different than sql. While it is certainly true that just about
any sql query can formulated (and many times more succintly) in
D4, the goal of each system is different. The goal of sql is
performance oriented retrieval of large numbers of rows. The goal
of Dataphor is toward application development which is less
concerned with retrieval and more concerned with 'addressing' tables.
Application development usually implies a small sample of rows
from table(s). The real goal is addressing and working with these
relatively small samples with a logical clarity not offered by
any sql system.
The difference between sql and D4 can seen thru the join construct.
In sql:
select A.OrderID,B.ProductID,A.CustomerID,B.Quantity
from Orders A join OrderDetails B
on A.OrderID=B.OrderID
The intent of the operation is retrieval and the only other information
(working with MS Sql Server 2005) the db is concerned with are the
index(s) of the associated tables. The same join operation in the context
of AD has a different intent. Here we would want to address the space
of the join in terms of its 'key'. In other words, to address the
projection of the join for a single OrderID and ProductID. For example,
create a view of join:
create view AllOrdersDetails
Orders join OrderDetails;
The key of the projection is {OrderID,ProductID} and within the application
the row of any combination can be easily obtained, for example Order 11077
and Product 77 (the [ ] in D4 is called an indexer for obvious reasons):
var T:=AllOrdersDetails[11077,77 by{OrderID,ProductID}]
over{OrderID,ProductID,CustomerID,Quantity}
with {IgnoreUnsupported = 'true'} ;
So we can easily work with any of the scalar values of the row:
select T.CustomerID;
select T.Quantity;
It is a small sample of rows or a single row that the application is
usually concerned with:
var Y:=row of typeof((Orders join OrderDetails)[]){ };
Y:=(Orders join OrderDetails)[11077,77 by{OrderID,ProductID}];
select Y over{OrderID,ProductID,CustomerID,Quantity};
The D4 join is in fact restricted to only 'equi-joins'. This is to stress
the importance of key inference. With only equi-joins the compiler has
an easier job of correctly inferring key(s) as does the developer.
For example expressing the join in terms of a 'where' restriction:
select ((Orders rename A) times (OrderDetails rename B)
where A.OrderID=B.OrderID)
('times' is like an sql cross join) the compiler infers the key:
{A.OrderID,B.OrderID,B.ProductID}
and an expression to obtain a particular Order and Product becomes:
select ((Orders rename A) times (OrderDetails rename B)
where A.OrderID=B.OrderID)[11077,11077,77 by{A.OrderID,B.OrderID,B.ProductID}];
Even though this is correct it lacks the 'clarity' of the join. Because
sql does not have the logical concept of a key nor the addressing of an
expression it can afford to lift any kind of restriction(s) for a join.
This is why the sql on clause accepts non equi-joins. Sql, in fact, will
take any argument(s) for on ranging from the sensible to the non-sensible.
It is only concerned with the physical index for retrieval and there
are no logical restrictions on the result (for example D4 will not return
nor address a table with duplicate key(s) or rows). Since AD translates to
expressions of integrity ie. keys, constraints and especially constraints
involving tables (references) the equality operator is the primary logical
operator of D4 and AD.
I should also note that traditional uses of non-equi joins and the
introduction of predicates in sql joins can be completely eliminated in AD.
For example the use of a self-join to obtain a rank using the Count
aggregate:
select A.OrderID,A.ProductID,A.Quantity,Count(*) as Rank
from OrderDetails A join OrderDetails B
on A.OrderID=B.OrderID and A.Quantity>=B.Quantity
group by A.OrderID,A.ProductID,A.Quantity
order by A.OrderID,Rank desc
OrderID ProductID Quantity Rank
----------- ----------- -------- -----------
10248 11 12 3
10248 42 10 2
10248 72 5 1
10249 51 40 2
10249 14 9 1
If I want information on any quantity for a particular order
I can directly address the row(s) without the need to 'retrieve'
any information:
var Quantity:=
ToList
(
cursor
(
OrderDetails where OrderID=10248
order by {Quantity desc}
)
);
var Cnt:=Quantity.Count();
select Cnt;
select Quantity[0]; //Max Quantity
select Quantity[Cnt-1];//Min Quantity
3
OrderID ProductID UnitPrice Quantity Discount
------- --------- --------- -------- --------
10248 11 $14.00 12 0
OrderID ProductID UnitPrice Quantity Discount
------- --------- --------- -------- --------
10248 72 $34.80 5 0
Drop by anytime and I hope you explore Dataphor.
best,
steve
3 comments:
This is nearly my first reading of anything to do with Dataphor, so excuse my ignorance. But the assumption that runs strongly through your entire article is that two tables can only be joined to each other in one and only one way. There are many application contexts where this is an overly limiting assumption!
Hello,
Thanks for your comment.I have addressed your issue at the end of the post (Update..From)
thanks,
steve
If you can't find the response to the above comment by Anonymous
on Update..From click here.
steve
Post a Comment