What is really wrong with the "UPDATE.. FROM.." syntax? This example uses MS Sql Server 2005 and the D4 language of Dataphor @ www.alphora.com The discussion is based on the thread: microsoft.public.sqlserver.programming Thursday, June 14, 2007 10:21 AM Update help http://tinyurl.com/27Z8PM Given an update with join representing a 1 to many relationship: UPDATE Orders SET < > FROM Orders INNER JOIN OrderDetails ON Orders.order_nbr = OrderDetails.order_nbr; Joe Celko argues: 'Bottom line: UPDATE FROM *can* be safely used - but only if you are FOREVER certain that no single row in the target table can EVER e joined to more than one row in the source table(s); FOREVER in the ENTIRE LIFETIME of the application; FOREVER across all programmers yet to come.' And from Sql Server 2005 Bol: 'The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic.' What is conspiciously missing from the discussion and Bol is one of the most fundamental concepts in database theory - a 'KEY' as opposed to an index. Obtaining the result of a natural join in the language of D4: select Orders join OrderDetails ; order_nbr some_col sku item_price --------- -------- --- ---------- 1 0 1 500 1 0 2 205 2 0 1 490 3 0 1 480 It should be obvious that the key of this table is {order_nbr,sku}. The example of the update in question is: UPDATE Orders SET Orders.some_col = OrderDetails.item_price FROM Orders INNER JOIN OrderDetails ON Orders.order_nbr = OrderDetails.order_nbr; which totally ignores the compound key and implicitly assumes a 1 to 1 relationalship between the tables. So instead of showing a logical flaw in using a join in Update, it simply shows an illogical update! The most logical way an update can use a 1 to N relationalship (ie a compound key) is to perform an update on a single row! This is what the ridiculous 'undefined' in Bol is hiding. An undefined update is the lack of recognition of the appropriate key on the part of the user. The flip side to the coin is the total lack of the concept of a key in Sql Server. Taken together this combination is lethal. In D4 an update that makes sense takes the form: update (Orders join OrderDetails) set { some_col :=item_price } where row{order_nbr order_nbr,sku sku}=row{2 order_nbr,1 sku}; If there was any question of writing an update assuming the existence of a key of just {order_nbr} we could test it by trying to select a row based on just that key, for example select a row using order_nbr 1: select (Orders join OrderDetails adorn {key{order_nbr}})[1 by {order_nbr}]; D4 would give the message: " Internal Index Error: 'Duplicate key violation.' ". which should tell an application developer that any update using the join only makes sense using the compound key. As in: select (SOrders join SOrderDetails)[1,2 by{order_nbr,sku}] ; order_nbr some_col sku item_price --------- -------- --- ---------- 1 0 2 205 D4 is an application development language, sql is a programming language. There is a big difference. As for the sql standard in which Celko expresses the update: UPDATE Orders SET some_col = (SELECT item_price FROM OrderDetails WHERE OrderDetails.order_nbr = Orders.order_nbr) WHERE EXISTS (SELECT * FROM OrderDetails WHERE OrderDetails.order_nbr = Orders.order_nbr); It is the SET subquery which returns the error: 'Server: Msg 512, Level 16, State 1, Line 1 Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression. The statement has been terminated.' And is nothing more than the Bol idea of an 'undefined' value. The idea of the standard as a substitute for the concept of a key is, of course, ridiculous. The answer to the question of 'which one' can be seen as rather absurd in light of a 'key'. There should never be a question of 'which one' in a RDMS as there can only be but one! -:) Interestingly, this issue is an example of searching for a question that is appropriate and makes sense given a response. It is no easy feat sometimes to make the connection -:) bye for now, steve
Comment: Anonymous said... This is nearly my first reading of anything to do with Dataphor, so excuse my ignorance. But the assumption that runs strongly through your entire article is that two tables can only be joined to each other in one and only one way. There are many application contexts where this is an overly limiting assumption! Sun Jun 17, 12:04:00 AM Thanks for browsing the post. To understand Dataphor you must realize that the intent of D4 is different than sql. While it is certainly true that just about any sql query can formulated (and many times more succintly) in D4, the goal of each system is different. The goal of sql is performance oriented retrieval of large numbers of rows. The goal of Dataphor is toward application development which is less concerned with retrieval and more concerned with 'addressing' tables. Application development usually implies a small sample of rows from table(s). The real goal is addressing and working with these relatively small samples with a logical clarity not offered by any sql system. The difference between sql and D4 can seen thru the join construct. In sql: select A.OrderID,B.ProductID,A.CustomerID,B.Quantity from Orders A join OrderDetails B on A.OrderID=B.OrderID The intent of the operation is retrieval and the only other information (working with MS Sql Server 2005) the db is concerned with are the index(s) of the associated tables. The same join operation in the context of AD has a different intent. Here we would want to address the space of the join in terms of its 'key'. In other words, to address the projection of the join for a single OrderID and ProductID. For example, create a view of join: create view AllOrdersDetails Orders join OrderDetails; The key of the projection is {OrderID,ProductID} and within the application the row of any combination can be easily obtained, for example Order 11077 and Product 77 (the [ ] in D4 is called an indexer for obvious reasons): var T:=AllOrdersDetails[11077,77 by{OrderID,ProductID}] over{OrderID,ProductID,CustomerID,Quantity} with {IgnoreUnsupported = 'true'} ; So we can easily work with any of the scalar values of the row: select T.CustomerID; select T.Quantity; It is a small sample of rows or a single row that the application is usually concerned with: var Y:=row of typeof((Orders join OrderDetails)[]){ }; Y:=(Orders join OrderDetails)[11077,77 by{OrderID,ProductID}]; select Y over{OrderID,ProductID,CustomerID,Quantity}; The D4 join is in fact restricted to only 'equi-joins'. This is to stress the importance of key inference. With only equi-joins the compiler has an easier job of correctly inferring key(s) as does the developer. For example expressing the join in terms of a 'where' restriction: select ((Orders rename A) times (OrderDetails rename B) where A.OrderID=B.OrderID) ('times' is like an sql cross join) the compiler infers the key: {A.OrderID,B.OrderID,B.ProductID} and an expression to obtain a particular Order and Product becomes: select ((Orders rename A) times (OrderDetails rename B) where A.OrderID=B.OrderID)[11077,11077,77 by{A.OrderID,B.OrderID,B.ProductID}]; Even though this is correct it lacks the 'clarity' of the join. Because sql does not have the logical concept of a key nor the addressing of an expression it can afford to lift any kind of restriction(s) for a join. This is why the sql on clause accepts non equi-joins. Sql, in fact, will take any argument(s) for on ranging from the sensible to the non-sensible. It is only concerned with the physical index for retrieval and there are no logical restrictions on the result (for example D4 will not return nor address a table with duplicate key(s) or rows). Since AD translates to expressions of integrity ie. keys, constraints and especially constraints involving tables (references) the equality operator is the primary logical operator of D4 and AD. I should also note that traditional uses of non-equi joins and the introduction of predicates in sql joins can be completely eliminated in AD. For example the use of a self-join to obtain a rank using the Count aggregate: select A.OrderID,A.ProductID,A.Quantity,Count(*) as Rank from OrderDetails A join OrderDetails B on A.OrderID=B.OrderID and A.Quantity>=B.Quantity group by A.OrderID,A.ProductID,A.Quantity order by A.OrderID,Rank desc OrderID ProductID Quantity Rank ----------- ----------- -------- ----------- 10248 11 12 3 10248 42 10 2 10248 72 5 1 10249 51 40 2 10249 14 9 1 If I want information on any quantity for a particular order I can directly address the row(s) without the need to 'retrieve' any information: var Quantity:= ToList ( cursor ( OrderDetails where OrderID=10248 order by {Quantity desc} ) ); var Cnt:=Quantity.Count(); select Cnt; select Quantity[0]; //Max Quantity select Quantity[Cnt-1];//Min Quantity 3 OrderID ProductID UnitPrice Quantity Discount ------- --------- --------- -------- -------- 10248 11 $14.00 12 0 OrderID ProductID UnitPrice Quantity Discount ------- --------- --------- -------- -------- 10248 72 $34.80 5 0 Drop by anytime and I hope you explore Dataphor. best, steve
3 comments:
This is nearly my first reading of anything to do with Dataphor, so excuse my ignorance. But the assumption that runs strongly through your entire article is that two tables can only be joined to each other in one and only one way. There are many application contexts where this is an overly limiting assumption!
Hello,
Thanks for your comment.I have addressed your issue at the end of the post (Update..From)
thanks,
steve
If you can't find the response to the above comment by Anonymous
on Update..From click here.
steve
Post a Comment