Beyond Sql: Dataphor - Reply to Michel

Hello Michel,

It's been a while:)
I do appreciate you taking time to explore this subject!
I hope talented and intellectually mature people like yourself
take a serious interest in this matter.

> So, keep your shields up, :-)

You have my word Kirk :)

>> I decided to take a couple of days in order to get some 'emotional distance
>> toward what you wrote, in your blog, and while I did just that, but not
>> having time to read every thing there, I am still left with an incomplete
>> picture.

Let me give you a short overview of the entire subject.
A small group of very talented people, both conceptually and technically,
at Alphora implemented the relational model, known as Tutorial D, that is
completely specified in the book:
Databases, Types, and the Relational Model: The Third Manifesto, 3rd edition
by C. J. Date and Hugh Darwen
http://www.thethirdmanifesto.com/

This relational (non-procedural database access) language sits within a
Pascal-like (declarative/procedural) language. The language in toto is
referred to as 'D4' by Alphora. The relational language is conceptually
(radically) different from sql in two fundamental ways: it rests on the
concepts of 'type' and 'variable', whereas sql lacks the concepts of 'type'
and 'variable' as they relate to a table. In the absence of type there is
only a 'file name', much like there is only the value of integer 1 in the
absence of the ability to assign integer 1 to a variable. The design of the
non-procedural language is also radically different from sql. There are so
many differences with sql (conceptually, design of language, keys, use of
meta-data, constraints, views, duplicates, procedures etc.) that it is not
possible for me to go into them all here :) Suffice it to say we are talking
qualitative differences, not quantitative.
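
To make 'type' and 'variable' a little more concrete, here is a rough
sketch in Python. It is not D4 and not how Dataphor is implemented; the
names RelationType, RelVar and count_rows are made up for illustration.
It only shows the idea that a table has a type (its heading), that a
variable of that type can be assigned a value, and that such a variable
can be passed to a procedure like any other value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """A table type: the heading, as sorted (attribute name, type) pairs."""
    heading: tuple

    @staticmethod
    def of(**attrs):
        return RelationType(tuple(sorted(attrs.items(), key=lambda kv: kv[0])))

    def check(self, rows):
        """Return rows as a duplicate-free table value, or raise TypeError."""
        names = [name for name, _ in self.heading]
        value = set()
        for row in rows:
            if sorted(row) != names:
                raise TypeError(f"row {row!r} does not match heading {names}")
            for name, typ in self.heading:
                if not isinstance(row[name], typ):
                    raise TypeError(f"{name}={row[name]!r} is not {typ.__name__}")
            value.add(tuple(row[name] for name in names))
        return frozenset(value)


class RelVar:
    """A variable whose declared type is a table type."""

    def __init__(self, rel_type, rows=()):
        self.type = rel_type
        self.value = rel_type.check(rows)

    def assign(self, rows):
        # Assignment is type checked; a value of the wrong type is rejected.
        self.value = self.type.check(rows)


def count_rows(table):
    """A procedure that takes a table variable as its parameter."""
    return len(table.value)


OrderType = RelationType.of(order_id=int, customer=str)
orders = RelVar(OrderType)
orders.assign([
    {"order_id": 1, "customer": "ACME"},
    {"order_id": 1, "customer": "ACME"},   # duplicate row, collapses
    {"order_id": 2, "customer": "Initech"},
])
order_count = count_rows(orders)
```

Because the value is a set, the duplicate row simply disappears, and an
assignment whose rows do not match the heading is refused rather than
silently stored.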

The 'relational' idea of the database is itself in the service of the
overall goal of the whole system. And that is application development.
But a very special method of AD, from procedural to declarative. In other
words, from the traditional procedural method(s) of AD to a methodology
based on 'inference'. It is as if 'performance' as the goal of an sql
optimizer with its emphasis on the physical implementation of sql was
now secondary to the 'logical' model with the emphasis on the compiler
as a logical inference mechanism. In D4 there are no query 'hints'; there
are logical 'hints', and index hints are replaced with 'key' hints. The user
works with the compiler on a logical level (clarifying which index to
use is replaced with clarifying which key(s) and reference(s) a resultant
table has). (As a quick aside, all joins can only be equi-joins. This ensures
unambiguous key(s) for the resulting table. Note that the concept
of requiring every result/table (regardless of how it is derived) to
have a key(s) is entirely absent from sql. And yes, other predicates can
be used to form tables with other relational operators; see the D4 'times'
operator.)
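
As an illustration of why an equi-join lets a compiler name the key(s) of
the result, here is a small Python sketch. It is my own simplification and
makes no claim to be Dataphor's actual inference rules; infer_join_keys
and the attribute names are hypothetical. The reasoning: if the join
attributes cover a key of one side, each row of the other side matches at
most one row, so the other side's keys carry over. Otherwise the
combination of a key from each side still identifies every result row.

```python
def infer_join_keys(keys_a, keys_b, join_attrs):
    """Infer keys (not necessarily minimal) for the equi-join of A and B.

    keys_a, keys_b: iterables of attribute-name sets, the keys of each side.
    join_attrs: the attribute names the two tables are equated on.
    """
    join_attrs = frozenset(join_attrs)
    keys_a = {frozenset(k) for k in keys_a}
    keys_b = {frozenset(k) for k in keys_b}

    # A key of B inside the join attributes: each A row meets at most one B row.
    b_unique = any(k <= join_attrs for k in keys_b)
    # A key of A inside the join attributes: each B row meets at most one A row.
    a_unique = any(k <= join_attrs for k in keys_a)

    result = set()
    if b_unique:
        result |= keys_a
    if a_unique:
        result |= keys_b
    if not result:
        # Many-to-many: pair a key from each side.
        result = {ka | kb for ka in keys_a for kb in keys_b}
    return result


# Orders (key order_id) joined to Customers (key customer_id) on customer_id.
orders_customers = infer_join_keys(
    [{"order_id"}], [{"customer_id"}], {"customer_id"}
)

# Two tables joined on a non-key attribute.
people_shops = infer_join_keys([{"person_id"}], [{"shop_id"}], {"city"})
```

In the first case the result keeps the key of Orders; in the second the
key is the pair of both identifiers. Either way the result has a key that
was derived, not guessed.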

The ui in Dataphor is capable of making very detailed windows into
the database exclusively based on inference (views which go way beyond
sql, references and meta-data are key components on which inferences are
based). Of course you can modify the gui to your heart's content :) I'm
hoping that the integration between the ui and tables in Access
may lead some to explore the sophisticated inference ideas in Dataphor.
I illustrate these ideas with a mini-application that can be downloaded:
http://beyondsql.blogspot.com/2007/07/dataphor-example-of-rapid-application.html


>>Your points are not about J. C., so even if you want to notify the reader
>>that you take his objections into consideration, you surely can and have to
>>mention it, but not as introduction to the point you develop. I refer to the
>>article where you ... try... to introduce the fact that a table can be a
>>parameter and you start the discussion by bringing one of the poorest answer
>>Joe Celko may have ever done in his entire life, of a mathematician (I say
>>that but take in account that I have a lot of respect for that guy, for the
>>good stuff he produced).

I respect Joe too. I hope you will forgive me if I take some liberties:)
Joe implied the logical implausibility of the idea of a 'super function'
in sql. Of course he is correct, in sql. I used that same idea to show
how logically it fits in a relational database. I used his thought
to 'contrast' sql and D4 on the basis of tables as variables. I think/hope it
worked out:) As a practical matter it is not easy to communicate
many of these ideas. Joe is a known and respected authority. Perhaps the
reader will stick around and see how things worked out:) Also bear in mind
these concepts read in a less than compelling way to most. It is hard to make
them register. That is why I try to show an actual example that derives
from concepts that must be understood for the user to make sense of the
material. I think Joe understands. I have had many exchanges with him:)
For example see:
 (
  microsoft.public.sqlserver.programming
  Tuesday, August 21, 2007
  'Primary key selection'
   http://tinyurl.com/39dgcp
(My point here is sql is immature based on its concept of a key. The key
 is elevated in D4 to where index is in sql. Note that 'all' types including
 tables have a logical 'addressing' mechanism. For tables it is the key(s).
 This concept and its usage is completely alien to sql users).
 I've also used ideas expressed by Itzik Ben-Gan to contrast sql and D4.
 See:
 http://beyondsql.blogspot.com/2007/06/sql-using-dense-rank-for-identifying.html
 So I'm at least in fast company, yourself included :-)
 )

I use MS sql server in much the same way. Some concepts make sense
in sql (some don't:) but take on a whole new meaning in D4. Again I use
the contrast to make my point. In this vein I've used dynamic sql and
'lists' in sql server to name a few. See:
'Sql - History repeats itself simulating lists'
http://beyondsql.blogspot.com/2007/07/sql-history-repeats-itself-simulating.html

>>In my opinion, I must say that for optimization of a query plan, I still find
>>that knowing the table, its physical structure, its indexes, its stats, what
>>is required (what is SELECTed), etc. can make the optimizer find a better plan
>>than when having a ... general ...  table. Not knowing if you SELECT DISTINCT
>>a primary key or a secondary field, as example, can make some difference in
>>the lag the end user is likely to experience, at runtime. Sure, there are
>>cases where the optimized plan will be very fast to obtain (so why not
>>getting it at runtime), or where it is irrelevant, but I cannot say that
>>having a table, as parameter, is on my priority list at all.

The general issue of 'performance' will forever rear its ugly head:(:)
At this point I have little incentive to get into the details of
optimizing D4 code, regardless of whether it is queries, non-procedural code
or whatever. The basic concepts are hard enough to get across. I fear
introducing details of how certain D4 constructs will gain a performance
advantage is premature. I would ask that you look at this article for
some perspective on the notion of performance in AD/D4:
http://beyondsql.blogspot.com/2007/06/dataphor-inclusive-sql-vs-exclusive-d4.html

Eventually I will address performance head on.

Let me introduce some additional info. At this point in its development
Dataphor does not possess its own mature 'native' storage (for development
yes, for enterprise work no). Dataphor utilizes a 'device' for
storage. A device can be one of a number of enterprise sql dbs, i.e.
DB2, Oracle, SAS, Sql Server. I use Sql Server as a device. So the device
can simply be the data repository. At the most elementary level all
DDL commands in the syntax of D4 will create the corresponding tables in
an sql server database. All DML commands in D4 will, when appropriate,
access data in sql server going thru a client language interface.
The D4 compiler will determine the details of the interaction with
the device. This process is by no means trivial but at the very
least can be transparent to the user. But channels of communication
are available for user intervention. A most common one is what
most Access users are familiar with, the pass thru query. So the
user can throw any valid t-sql (queries, procedure calls, ddl) to
the server. For t-sql queries and server side procedures that produce
a result (table) the result will be treated just as if it was derived
via native D4 syntax. Of course there is nothing preventing a user from
using dynamic sql. So we come around in a full circle. If one becomes too
stressed in the relational world of D4 there is always the option to
go back to the world that time forgot! :-) :-)
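
The pass thru idea can be sketched in Python, with the standard sqlite3
module standing in for Sql Server as the 'device'. This is an analogy
only, not Dataphor's interface; pass_through and restrict are names I
made up. The point it tries to show is that sql text is handed to the
device untouched, and the result that comes back is an ordinary table
value that the same operators can work on.

```python
import sqlite3


def pass_through(conn, sql, params=()):
    """Send sql text to the device unchanged; return the result as a table value."""
    cur = conn.execute(sql, params)
    if cur.description is None:      # statement produced no result (e.g. ddl)
        return frozenset()
    names = [d[0] for d in cur.description]
    # Each row becomes a tuple of (attribute, value) pairs; the set drops duplicates.
    return frozenset(tuple(zip(names, row)) for row in cur.fetchall())


def restrict(table, predicate):
    """A 'native' operator: keep the rows for which predicate(row) is true."""
    return frozenset(row for row in table if predicate(dict(row)))


# The device: an in-memory database standing in for the sql server.
conn = sqlite3.connect(":memory:")
pass_through(conn, "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "ACME"), (2, "Initech"), (3, "ACME")],
)

# A pass thru query; its result is then used like any other table.
from_device = pass_through(conn, "SELECT order_id, customer FROM orders")
acme_orders = restrict(from_device, lambda r: r["customer"] == "ACME")
acme_ids = sorted(dict(row)["order_id"] for row in acme_orders)
```

Nothing in restrict knows or cares that its input came from a pass thru
query rather than from a native expression.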

>> Your writing style is (still) complex. I know mine is, also, but in this
>> context, my style is not relevant, I think.

You have me at a distinct disadvantage. But bear in mind sql has been
around for almost 40 years and talking about it succinctly still remains
a burden few can carry. Communicating effectively in the newborn relational
terms, i.e. types, variables and relvars, is a work in progress. I will
have to take my lumps. But I will try to make effective use of feedback
which I wholeheartedly encourage.

>>I haven't see a 'tour guide'. I am not even sure I started at the right
>>place! Sure, if the purpose is still to start a discussion, that can be
>>expected since we don't know where it will go, but there is already a lot of
>>material (not necessary self contained, I am afraid) and I am still not sure
>>about 'where' in the process of making an application you want to focus. You
>>are not spreading your thought about the user interface, aren't you? Ok, it
>>is NOT SQL, it is maybe Access, but WHAT is it? I am a little bit lost, and
>>have the impression  that I could put my hands on great (I hope) pictures of
>>a movie, but still not sure if the movie is a documentary, or something
>>else.

I understand what you are expressing. Start at the beginning, but just
where is the beginning?
To get a jump start, to actually see what D4 scripts are and the
steps that constitute a working example/application see:
http://beyondsql.blogspot.com/2007/07/dataphor-example-of-rapid-application.html
This is a fully functioning application that includes a ui. But bear in
mind one does not need the 'ui' to explore the system. One can learn
quite a bit about modelling with sql server without the need for a ui.
Creating a single D4 operator/procedure or query will start a user on his way.
Once you start up Dataphor you can tinker with creating tables, queries
or whatever.

My recommendation for the serious user:

Go to www.alphora.com and download Dataphor. Install the beast and start
poring over the documentation to get a feel for this thing (the documentation
is also available online). If you have any version of sql server
you can quite easily use it as your data store. You can connect to
a server database and have access to all tables in the database immediately.
For example connect to the Northwind or Pubs database. (If you need a hand
connecting, contact me via my blog). Start getting familiar with the
Dataphoria gui. You're going to be doing your development with this ui.
It's straightforward, just like query analyzer is. You can easily create
new tables in D4 that will reside on sql server. Start browsing the
relational operators in help. Try a few queries. You're on your way.

My blog
Aside from the mini-app example, the articles consist of pep talks,
explanations and examples (along with their underlying concepts).
The examples can all be copied and pasted into Dataphor and run.
They work:) Think of the examples as snapshots. They show different
slices of constructs and concepts (Dataphor comes with many samples too.
Be sure to check them when you install Dataphor). Joe Celko has said that
at some point in time when learning sql a light in your head will hopefully
appear. Same thing in D4. At some point it will all come together
if you bear with it.


>>It seems you mention an interest for the framework (dot.Net) which is an
>>anti-SQL thing (even if they added some element of SQL in version 3.0),
>>imho. Gone the sub-queries, even gone the joins, just a simple
>>select-from-where and if you need 'join' or 'subqueries', you use the
>>framework objects that refer to these 'simple' statements, and use C#, or
>>whatever, not SQL, to carry the 'joins', but without the memory explosion
>>what SQL-joins do. On the other hand, from the first impression I got, it
>>seems you add complexity to the SQL language. So, it would be like rooting
>>in the wrong soil, no? Since the framework is about simple, simple SQL
>>statements, that is (and C# developers are ready to pay the price by writing
>>complex C# procedural statements, because they know C#, but not SQL, and
>>don't want to invest in a more complex SQL like language). I may have the
>>wrong idea of what you saw, though.

Dataphor is built with C#, a typed language. Conceptually it makes a lot of
sense since D4 is itself a typed system. The concept of type is one of the
foundations of the 'relational' model. In other words it is logical that
D4, with its reliance on type, be built with a framework that is typed.
The fact that D4 is built with a .NET language in no way implies that
D4 has anything to do with the LINQ/DLINQ project, which is what you seem to
be referring to: the idea that MS can 'hide' sql from .NET developers.
Whatever games MS is playing with sql from .NET have nothing to do with
D4. But it also means that, just like the channels of communication from D4
to the storage (device), i.e. sql server, there are channels from D4 to .NET.
So users can write assemblies and interact with the .NET framework in other
ways. And like pass-thru queries, this communication with .NET does not
come with any hidden price to be paid:) It can be used to expand the
functionality of D4, for example user defined types defined with a
.NET language (note that user defined types can be defined quite nicely
from within D4). Aside from channels of communication, the .NET framework
is 'transparent' to the user. Again, MS's communication from .NET to
sql server is totally independent of D4 and its relational language.

Some parting thoughts.
I have not said anything specific about the relational (query) language.
As an expert sql developer you will find it quite different, which
is to be expected. You will also find that some query concepts in
sql will carry over. You will find there are many new constructs,
you will find many sql constructs greatly expanded in functionality
(constraints, views, procedures to name just a few) and you
will discover new ways of doing things in comparison to sql. Most
of the historical objections to sql (both conceptually and design of the
language) are corrected in D4.
Mastery of D4 will take an effort; this is not a trivial exercise.
But I think you will find it is a great picture:)
If you have any questions or just want to chat please feel free to
contact me (thru the blog is fine). People like yourself will only
make this system better.
Lastly, as I state in several articles, D4 does not necessarily
eclipse sql. Sql has its place but it is misplaced as a foundation
for application development. For that D4 is best suited.

best,
steve
Wednesday, August 29, 2007
