Here are some questions from an sql developer I'm trying to answer. I'm trying to explain that a relational database realizes a stored procedure in a fundamentally different way than sql. Along the way I'm talking about relational ideas as they relate to tables and types. The conversation shows the vise-like grip sql seems to have on the minds of man ☻ Microsofts functional programming offering LINQ also comes up. Overall I think I did a good job trying to explain the big differences between sql and a relational database. There's some simple but good examples that drive home the significant differences. Black is relational(me) Purple is sql Here we go:
> My objection is not so much to your general idea of variables of > type table-with-given-columns (I've recently worked with some systems > that could be cleaner if such a thing were available; currently they > work around it using temp tables); more to your specific use of D4 in > all your examples, as opposed to a pseudo-code extension of SQL. Ok, let me directly address you dislike of D4 and your preference for a pseudo-code extension of SQL. I'll refer to your pseudo-syntax in the thread: comp.databases.ms-sqlserver 'Basic Anatomy of Sql Server' http://tinyurl.com/2olako > Then you might want to write examples in a pseudo-syntax that /looks/ like SQL. I know this is a matter of taste, but your examples look ugly to me. Consider: -- Your example of a stored procedure that returns a result set, the -- format of which can only be deduced by reading through the code. CREATE PROCEDURE dbo.GroupByShipCountry @Employee Integer AS SELECT ShipCountry,Count(*) Cnt,Min(Freight) MinFrt,Max(Freight) MaxFrt FROM Orders WHERE EmployeeID=@Employee GROUP BY ShipCountry -- Your example of the same stored procedure rewritten in D4. create operator GroupByShipCountry (Employee:Integer): table{ShipCountry:String,Cnt:Integer,MinFrt:Money,MaxFrt:Money} begin result:= Orders where EmployeeID=Employee group by {ShipCountry} add{Count() Cnt,Min(Freight) MinFrt,Max(Freight) MaxFrt} ; end; -- My example of the same stored procedure rewritten in a -- pseudo-extension of T-SQL. CREATE PROCEDURE dbo.GroupByShipCountry @Employee Integer, @ResultSet Table ( ShipCountry varchar(15), Cnt int, MinFrt money, MaxFrt money ) output AS SELECT ShipCountry, Count(*) Cnt, Min(Freight) MinFrt, Max(Freight) MaxFrt INTO @ResultSet FROM Orders WHERE EmployeeID=@Employee GROUP BY ShipCountry > To begin with, the idea of a stored procedure returning a 'result' is an sql concept. This concept does not exist in a relational (D4) system. Relationally, a stored procedure only exists when it is created. The execution of a sp, its runtime realization, does not involve the definition of the procedure nor the idea of 'returning' something from it. Relationally at runtime what sql see's as a procedure and a result 'is' a variable of the type of the result. This is the huge difference between the two systems. Relationally the '@ResultSet' and the idea of inserting a query result into it is contradictory and meaningless. The 'name' of the procedure 'is' the variable (table), there is no result from a sp (ie. sql). Syntactically an sql tabled value function is closer in spirit to the D4 procedure with the big difference that the name of the table valued function is 'not' a typed variable like in D4. Finally, the sql sp makes the distinction between identifiers as variables and non-variables using the '@'. In D4 there is no such distinction as 'all' identitifiers are by definition variables and the '@' is superfluous. The 'output' declaration in the sql sp is based on the general sql idea of 'returning' something. Such a declaration is superfluous relationally as, again. there is no concept of 'returning a something' from a 'this sp'. Note that LINQ realizes an sql stored procedure exactly as sql intends it and nothing like the relational D4. The 'functional' part of integrated query is simply how the sp is accessed within a net language. There is no concept of a typed variable with the name of the sp. In other words, the OR map is mapping to the same sql as if the mapping didn't exist. MS has added a 'functional language' within net when what it should have done is added a functional language to the database itself! ☺ D4, after all, represents the functional language of a relational system and how easy it is to use such a language within a present day (net) programming language. The D4 answer to overcoming the object-relational mismatch is of a totally different nature to the one offered thru LINQ. My objection to LINQ lies in the idea that no one at MS seems to have considered an alternative. Let me try to cut thru the semantics with a few examples that I hope will clarify some of the points I'm trying to make. In t-sql this should be perfectly clear: DECLARE @X INT SET @X=5 The variable @X can only take one value at any specific time. In a relational system a procedure that returns some value at runtime must behave exactly like @X. At runtime the procedure is a variable of a particular type and has a specific value based on input arguments. An sql sp has no such nature and behaves in an entirely different way. create procedure SqlOne @Y int AS SELECT COUNT(*) AS CNT,SUM(FREIGHT) AS SUMFRT FROM ORDERS WHERE EMPLOYEEID=@Y SELECT * FROM ORDERS WHERE EMPLOYEEID=@Y Therefore the idea that an sql procedure can return multiple results is meaningless if the sp is realized as a variable where only a single result makes sense. Add to this the idea of type where each result is a different type and the difference between sql and relational should be even clearer. Again the relational procedure is realized exactly like the int @X. No programming language chooses among possibe multiple definitions of the value of a variable. It would be equivalent to: DECLARE @X INT SET @X=5 or @X=10 which makes no sense. It is because the sql sp is not realized as a variable that multiple results 'can' be returned. This sql sp: CREATE PROCEDURE SqlTwo @A INT OUTPUT, @B INT OUTPUT AS SET @A=5 SET @B=10 DECLARE @C INT,@D INT EXEC SqlTwo @C OUTPUT,@D OUTPUT SELECT @C SELECT @D makes no sense relationally because, again, there are multiple results. Now there are two scalar types (int) returned instead of sql 'resultsets'. Relationally there is no such thing as more than 1 thing (think a variable of a type) at a time. Two scalar results are realized as a 'row' type relationally, ie. 'one' thing. create operator D4Two():row(A:Integer,B:Integer) begin result:=row(5 A,10 B); end; In this case at runtime D4Two is a variable of type row with 2 scalar columns. From the relational perspective a table/row/list is a variable that behaves exactly like a variable in a programming language. Its value can be assigned to other values just like a t-sql integer variable can. It can be compared to other variables (for equality) just like a t-sql integer variable. It can be passed as an argument to a procedure just like a t-sql integer variable. For these reasons why MS decided to call something a 'table variable' remains a mystery. It behaves like no other variable in any programming language on the face of the planet ☺ If it doesn't quack like a duck, doesn't behave like a duck, doesn't waddle like a duck it sure as hell isn't a duck ☺ What MS calls a table variable is surely not a table variable as the idea exists in any programming language or relationally. Whatever one wants to call an sql table the table variable is the same thing. Its phyiscally implementation may be different but that does not change the fact it is not a variable of a specific table type. Sql distinguishes between user defined functions and procedures. But sql user defined functions are on the same exact level of procedures when looked at from the point of view of 'variables'. Neither one has anything to do with the idea of a relational variable. All this artificial distinction does is serve to make it harder for users to understand the relational model ☺ (Why sql choose to create a user define function/procedure dichotomy is another topic. But think of where and having). Rather than dwell on particular syntax or pseudo-syntax I think it is the ideas that the relational model is based on that is important. And what we're talking about here is just a slice of the relational model. The relational model is not rocket science ☺ It's actually quiet straightforward. Ironically it's sql that is out in left field. The relational model is in line with all current programming languages. Unfortunately that's never been the case with sql ☺ This is one of the reasons I find LINQ so unnecessary. Once you get the idea that a big part of the relational model is all about the basic concepts of variables and types I think (I at least hope) that what I've been trying to explain will make perfect sense ☺ >Add to this the idea of type where each result is a different type >In other words, TABLE (X INT, Y INT) is a different variable type >from TABLE (M VARCHAR(15), N VARCHAR(15))? Exactly. Think of sql strings. This table, TABLE(TABLE (M VARCHAR(15), N VARCHAR(15)), is a differnt type than TABLE (N VARCHAR(16), N VARCHAR(16))! This means that we couldn't compare the two and undermines real relational division. To declare how many characters in a string is clearly the opposite of what the relational idea of data independence is all about. Relationally there can only be a 'string' type having absolutely nothing to do with its storage characteristics. And this is the same idea in any programming language. This is just one manifestation of how sqls design ignores the concept of a strong type. LINQ is an attempt to hide the fact that: DECLARE @N VARCHAR(5),@M VARCHAR(6),@P VARCHAR(5),@Q VARCHAR(6) represents 4 different types. This is but one simple form of the idea that sql guarantees impedence mismatch! And having the choice of changing the database or the access to it, MS chose access to the database (LINQ). The sql community seems to not consider how bizarre and confounding things like this look to developers coming to sql for the first time. It must be force of habit blinding a more critical look at how things are. There are so many that sql defeats the idea of strong types that it would be better and easier to build a new database system for application development. And have sql available for everything else ☺ >What if the data you want to return is not multiple scalars, but >rather multiple tables? Upon reflection, I suppose tables could >be nested in this model, i.e. you can return > TABLE(T1 TABLE(X INT, Y INT), T2 TABLE(M VARCHAR(15), N VARCHAR(15))) The relational database emphasizes types to define structures that can be used to model processes, ie. create table T1 { A:Integer, B:String, T:row{X:Integer,Y:String}, S:row{B:Integer,C:list(String)} key{A} }; This is possible thru system provided types and user defined types. It's also supported thru explicit conversion processes between one particular type and another. The foundation to support these constructs is unique to a relational system and does not exist in sql. Whether a specific struture makes logical sense and whether it's supported is another matter. Tables within tables is open to question. Even if the system supports it would it make sense or would another type of structure be more appropriate? This question goes to the edges of a relational system and I'm afraid I can't do it justice here ☺ >This would allow bad developers to commit the common 'a,b,c' 1NF >violation in a whole new way, but then bad developers can screw >things up in any language. The view that strings like 'a,b,c' violate the idea of the atomicity of a column in an sql table is a direct result of sql's lack of types and lack of relationships between types. There is no violation of any kind in a relational system because the string can be stored as a single value of a column retaining the concept that there individual elements involved. It would simply be stored as a 'list' type. For example column B of table TableList takes a comma delimited string and splits it into a list type. create table TableList from table { row{1 A,'A,B,C'.Split({','}) B}, row{2,'D, E , F, G'.Split({','})}, row{3,'H,I,J,K'.Split({','})} }; The table definition of TableList is: create table TableList A:Integer, B:list(String), key{A} For each row of the table column A can be retrieved and the individual items of column B, the list, are availiable. Select the value of A and the value of the 1st item in the list(B). select TableList {A,B[0] ListItem1}; A ListItem1 - --------- 1 A 2 D 3 H Directly address the 1st item in the list of A=2. select TableList[2].B[0]; D Directly address the last item in the list of A=3. select TableList[3].B[TableList[3].B.Count()-1]; K Get column B for the row where A=1 and convert the list into a table. select ToTable(TableList[1].B,'Str','Index'); or select ToTable( (TableList where A=1 over{B})[].B, 'Str','Index'); Str Index --- ----- A 0 B 1 C 2 It's types that a relational system guarantees integrity for and high level operators that allow the explicit conversions between that developers should have for application development. And this is the same idea the MS net team calls 'functional programming' which is what they developed LINQ for. But a relational system is functional programming! ☺ >bad developers can screw things up in any language. Sure but application development with sql has a tendency to make anyone a nitwit at some time or other. I'm for less nitwits ☺ >> 'where' and 'having'). >I'd guess these are both for efficiency. GROUP BY was added after the original SELECT but instead of redesigning the language they saw no problem with leaving in two constructs that do the same thing! Every time I see an MS paper on 'best practices' I have got to grin ☺ On table variables: >But you agree that (1) it has some features of variables, and (2) it >could reasonably be extended to have more features of variables? I don't think MS could lock its developer army in a hotel and tell them to make sql a little more relational ☺ They have two choices. Either buy a relational system (like D4) or start from the ground up to develop one. The gulf between a relational system and sql is too great to try to simply make changes in sql server. >A lot of people find SQL pretty straightforward, especially in this >newsgroup. Your choice of (pseudo-)syntax will make a difference to >them. (You might get different responses from a newsgroup focusing >on front-end programming languages, especially if they already >resemble Pascal as D4 seems to do.) Sure at least a dozen people who write books and articles find sql straightforward ☺ For the rest I'd like to at least see a choice for them. Again, I hope some can get beyond just syntax to grasp what a relational system would offer.