Beyond Sql: Basic Anatomy of Sql Server, part I

This is the first in a series of articles examining some basic
concepts in Sql Server.

What is a stored procedure?
                  
This example uses Sql Server 2005 and the Northwind sample database.

The following is a very simple stored procedure that returns some
aggregates from the Orders table for a given EmployeeID value supplied
as an input parameter:
                  
CREATE PROCEDURE dbo.GroupByShipCountry
      @Employee Integer
AS
SELECT ShipCountry,Count(*) Cnt,Min(Freight) MinFrt,Max(Freight) MaxFrt
FROM Orders
WHERE EmployeeID=@Employee
GROUP BY ShipCountry

After executing the above script to create the procedure we ask a most
simple question: what is it? What exactly is Sql Server aware of
regarding this procedure? We can take as the starting point the definition
of a generic procedure. Bol states:

'A stored procedure is a saved collection of Transact-SQL statements ...
that can take and return user-supplied parameters.'

We can simply ask is there any indication that the server is aware of
anything that rises above the level of the t-sql statements defining
the procedure. Should we expect to find such thing? We do know that
the server recognizes the object Orders in sys.objects as a USER_TABLE.
And if we execute the query:

SELECT *
FROM Northwind.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='Orders'

we know the server recognizes the columns and their datatypes for the
Orders table. We know the server recognizes that the procedure depends
on three columns from the Orders table:

SELECT object_name(referenced_major_id) Dependent,COLUMN_NAME
FROM sys.sql_dependencies A
CROSS APPLY (SELECT B.COLUMN_NAME
            FROM Northwind.INFORMATION_SCHEMA.COLUMNS B
            WHERE B.TABLE_NAME=object_name(A.referenced_major_id)
            and A.referenced_minor_id=B.ORDINAL_POSITION) C
WHERE object_id=object_id('dbo.GroupByShipCountry')

Dependent COLUMN_NAME
--------- ---------------
Orders    EmployeeID
Orders    Freight
Orders    ShipCountry

We know the procedure works if we execute it:

Execute dbo.GroupByShipCountry 8       

It retrieves, in the precise wording of bol, 'rows and columns from the
Orders table' which are collectively termed a 'result set'. (We will
return to the server Orders table and result set term in later articles).

ShipCountry     Cnt         MinFrt                MaxFrt               
--------------- ----------- --------------------- ---------------------
Argentina       3           .3300                 217.8600
Austria         5           25.2200               353.0700
Brazil          9           5.3200                110.8700
.
Venezuela       9           4.4100                141.0600

When we look for specific information about the procedure itself with
the sp_help system procedure we see that @Employee is listed as a
parameter.

The information from sys.objects confirms that the procedure is
recognized and it indeed is a definition of a stored procedure:

SELECT name,object_id,type,type_desc
FROM sys.objects
WHERE object_id = OBJECT_ID('dbo.GroupByShipCountry') AND type='P'

name                 object_id   type type_desc             
-------------------- ----------- ---- ----------------------
GroupByShipCountry   1551396646  P    SQL_STORED_PROCEDURE                   

In summary there does not exist any evidence that the server is
aware of any information not contained in the collection of t-sql
statements that define it. In fact the server does not exhibit
any information about three of the four columns (Cnt, MinFrt, MaxFrt)
that are displayed when it is executed. This leaves us in a very unsteady
state. At the very most the server is performing an operation which is
not exposed to us to interact with, at the very least it is performing an
operation that it is itself unaware of. This situation makes it impossible
to look to the server for any characterization of the procedure towards
answering the question of what this procedure really is. It is up to us
to infer what the procedure is. With the consequence that we can not
rely on the server to either characterize what the procedure really is
nor depend on the server to guarantee the integrity of the operation.
This void in the server must be filled.

The unit test

Is it fair to say that the failure of the server necessitates a unit test?
We can more clearly see the answer to this question if we can clearly
see just what is the nature of a unit test that clarifies just what this
procedure is. An insightful perspective on the unit test is offered in
the excellent article:

'Using unit and functional tests in the development process'
by Jeff Canna
www.ibm.com/developerworks/library/j-test.html

The author writes:

'Many times the development of a system is likened to the building of a
house. Unit testing is analogous to a building inspector visiting a
house's construction site. He is focused on the various internal systems
of the house, the foundation, framing, electrical, plumbing, and so on.
He ensures (tests) that the parts of the house will work correctly and
safely, that is, meet the building code.'

He goes on to say:

'Unit tests tell a developer that the code is doing things right;
functional tests tell a developer that the code is doing the right things.'

The author is clearly expressing the idea that a unit test is something
the goes toward confirming something that is of a very basic nature.
Something that is the basis on which any other functional tests are
based on. So what sort of unit can we use to confirm the 'the code is doing
things right'? And will it answer the question of what the procedure really
is?
The sort of unit test we are looking for is approached by the insight offered
in another excellent article:

'Close those Loopholes - Testing Stored Procedures'
by Alex Kuznetsov and Alex Styler
www.simple-talk.com/content/print.aspx?article=426

With reference to a procedure very similar to dbo.GroupByShipCountry the
authors write:

'Suppose you need to write a unit test for that stored procedure.
You might want to verify that your result set has correct structure
(correct column names and types)...'

And further:

'When you set up your unit test, your result set (or result sets, if your
stored procedure returns several ones) is saved into an XML file. All the
information necessary for further comparison is saved column names, column
types, and the data...When you run your unit test, the result set is
compared against the saved XML file. Everything is compared - column names,
column types, and all the data.'

The idea of verifying that the 'result set has correct structure
(correct column names and types).' What the authors refer to as
'correct structure' is what we are really searching for. A test for the
correct structure is exactly what the server lacks. And what is this
structure that is based on pairs of columns and their datatypes? For it
is the columns of this structure that are displayed by executing the
procedure:

ShipCountry Cnt MinFrt MaxFrt 
----------- --- ------ ------

which are based on the structure of column/datatype:

ShipCountry:varchar, Cnt:Integer, MinFrt:Money, MaxFrt:Money

It is these pairs that define a specific type, a specific type of table.
And this simple concept allows us to finally answer the question of
what the stored procedure really is: a table. The unit test is a test
confirming the specific type of the table returned. What the server calls
a 'result set' (and not a 'table') is based on the fact that there does
not exist the concept of a type for a table. There does not exist a way
for the server to distinguish one table from another. It is the non-existence
of this concept in the server that necessitates this most basic unit (test)
being made the responsibility of the user. This is the huge hole in sql
that must be filled by the user, a role that the database should perform
is lost and with it the integrity that it should offer.

In part II we show and describe a database system that takes the
definition of a stored procedure beyond the statements that define it.

Basic Anatomy of Sql Server, part I 
What is a stored procedure?

Basic Anatomy of Sql Server, part II 
The unit test as part of the database.

Basic Anatomy of Sql Server, part III 
What does deferred name resolution really mean?

Some related articles of interest:
All tables are typed variables 
A table as a parameter
Reusable procedures
Monday, November 05, 2007

Basic Anatomy of Sql Server, part I

No comments:

Beyond Sql

Blog Archive

Links

About Me