Dataphor SQL RAC (Relational Application Companion)

A site of hope for those looking for a true relational database system

Thursday, July 05, 2007

Dataphor - Create string intervals

This problem appeared on the forums:
'Compare two varchar arrays one character at time until different'

The original poster explains:
Some background info first. We have an ASP.NET Web Application that serves up a list of
names in a dynamic tree control. The nodes of the tree are generated on demand and loaded
into the tree dynamically. When this list of names become very big (and indeed it will
be), the delay takes too long. So to minimize the delay we will group the names into
dynamically created folders. Each folder will contain n number of names based on some
passed in limit (In my code below this is the variable @NthRow). This way we can mimimize
the number of nodes to fetch yet still support a large number of names.

The final output of our function must be a table of strings in the form of X - Y where X
is the starting name and Y is the ending name of the group.

For example, given the following names and @NthRow = 5

Baker, Ms. Jamie
Espinosa, Ms. Jean
Gonzales, Mr. Robert
Holland, Ms. Julia
Jimenez, Ms. Soni
Macdonald, Mr. Mickey
Miller, Ms. Jana
Noriega, Ms. Michelle
Owen, Ms. Alma
Ramirez, Ms. Stephanie
Salas, Ms. Jeanine
Sipes, Mr. Tyler
Tamayo, Ms. Laura
Timmer, Ms. Julie
Trevino, Ms. Yvonne
Young, Ms. Annia

Produce an output table of:

Ba-Ji represents the names Baker, Ms. Jamie thru Jimenez, Ms. Soni
Ma-Ra represents the names Macdonald, Mr. Mickey thru Ramirez, Ms. Stephanie
Sa-Tr represents the names Salas, Ms. Jeanine thru Trevino, Ms. Yvonne
Y-Y represents the name Young, Ms. Annia

This solution uses the D4 language of Dataphor and MS Sql Server 2005 as
the data repository,

Basically we want to take a table of names ordered by last name and divide it
into intervals of NthRow names. We want at least the 1st 2 characters of
of the starting and ending last names for each interval. We will allow that
if the last interval contains only 1 name we want to show a lesser number
of characters than other intervals.

Here is some sample data entered in Dataphor and stored in sql server:

create table MyNames
Name:String tags{Storage.Length = "50"},
row{1 ID,' Owen , Ms. Alma' Name},
row{2,'Ramirez, Ms. Stephanie'},
row{3,' Salas, Ms.  Jeanine'},
row{4,'  Sipes, Mr. Tyler'},
row{5,'Tamayo, Ms. Laura'},
row{6,'Timmer  , Ms. Julie'},
row{7,'  Trevino, Ms. Yvonne'},
row{8,'  Young, Ms. Annia'},
row{9,'Baker, Ms. Jamie'},
row{10,' Espinosa, Ms. Jean'},
row{11,'  Gonzales, Mr. Robert'},
row{12,'  Holland, Ms. Julia'},
row{13,'Jimenez, Ms. Soni'},
row{14,' Macdonald, Mr. Mickey'},
row{15,'   Miller, Ms. Jana'},
row{16,'Noriega, Ms. Michelle'}
into MyNames;    

Operator ArrayVarNames takes parameters of a 'table' with columns ID and Name,
NthRow and NChars which is the number of characters +1 to identity a name.
The operators returns a table with the interval information, the count of
names in the interval and the identifying characters.

create operator
result:=table of typeof(result){};
//Return a table variable with a sequence number (RowID) reflecting Lastname
//sorted in ascending order.
var T:=
       (MyNames add{Name.Split()[0].Trim() LastName}
          with {IgnoreUnsupported = 'true'}) order by {LastName})))
            {sequence+1 RowID,Name,LastName};
var MaxID:=Max(RowID from T); //Max RowID in table T.           
//NGroups is a table variable that uses NthRow and the numbers table to form
//intervals. For each interval (Index) we get the count (Cnt), the max RowID
//(MaxRowID) for the interval (we really are only interested in MaxRowID
//for the last interval).
var NGroups:=
                where num between 1 and MaxID
                 {num Index,((num-1)*NthRow)+1 From,num*NthRow To}
                    T with {IgnoreUnsupported = 'true'}
                      where RowID between From and To
                       group by {Index} add{Count() Cnt,Max(RowID) MaxRowID}
//LRow is a row variable with the max Index (MaxIndex), which is the max interval
//given data MyNames and NthRow, the count (Cnt) of the rows for the max Index
//and the max RowID for MyNames. Note that LRow uses the table (variable) NGroups.                        
var LRow:=    
           ( NGroups adorn{key{Index}} return 1 by {Index desc}
                {Index MaxIndex,MaxRowID,Cnt})[];
//We get NChars+1 characters of the Lastname for the starting (NFrom) RowID
//and ending RowID (NTo) for each interval by using RowID as a key to look
//up LastName in table T. We use some boolean logic to check if there is
//only 1 LastName in the last interval (MaxIndex). If true we only want
//the NChars character of LastName. We put the boolean logic directly in SubString.
//We also add the count (NCnt) for each interval.                       
numbers rename{num Index}
where Index between 1 and LRow.MaxIndex
 add{(NGroups adorn{key{Index}})[Index].Cnt NCnt} with {IgnoreUnsupported = 'true'}
 add{((Index-1)*NthRow)+1 NFrom,Index*NthRow NTo}
      SubString((T adorn{key{RowID}})[NFrom].LastName,0,
      (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))) CharFrom,
        SubString((T adorn{key{RowID}})[NTo].LastName,0,
      (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1))))),
      SubString((T adorn{key{RowID}})[LRow.MaxRowID].LastName,0,
      (NChars+ToInteger(not((Index=LRow.MaxIndex) and (LRow.Cnt=1)))))
        ) CharTo
     with {IgnoreUnsupported = 'true'} ;

For example given the original data:

select ArrayVarNames(MyNames,5,1);

Index NFrom NTo NCnt CharFrom CharTo
----- ----- --- ---- -------- ------
1     1     5   5    Ba       Ji   
2     6     10  5    Ma       Ra   
3     11    15  5    Sa       Tr   
4     16    20  1    Y        Y  

select ArrayVarNames(MyNames,6,2);

Index NFrom NTo NCnt CharFrom CharTo
----- ----- --- ---- -------- ------
1     1     6   6    Bak      Mac  
2     7     12  6    Mil      Sip  
3     13    18  4    Tam      You  

We can use the operator with any table by simply renaming the columns of
the table. Here we use the Orders table of the Sql Server Northwind database
to create intervals for customers by simply renaming the columns of the
Orders table to agree with those expected by ArrayVarNames:

select ArrayVarNames
 (Orders group by {CustomerID} add{Max(OrderID) ID}
            {ID,CustomerID Name},15,4);

Index NFrom NTo NCnt CharFrom CharTo
----- ----- --- ---- -------- ------
1     1     15  15   ALFKI    COMMI
2     16    30  15   CONSH    GOURL
3     31    45  15   GREAL    LILAS
4     46    60  15   LINOD    QUEEN
5     61    75  15   QUICK    THEBI
6     76    90  14   THECR    WOLZA

We can create a similar table using the Products table of the Northwind db
using ProductID and ProductName.

select ArrayVarNames
 (Products {ProductID ID,ProductName Name},10,30);
Index NFrom NTo NCnt CharFrom                      CharTo                          
----- ----- --- ---- ----------------------------- ------------------------------- 
1     1     10  10   Alice Mutton                  Chef Anton's Gumbo Mix          
2     11    20  10   Chocolade                     Grandma's Boysenberry Spread    
3     21    30  10   Gravad lax                    Jack's New England Clam Chowder 
4     31    40  10   Konbu                         Mishi Kobe Niku                 
5     41    50  10   Mozzarella di Giovanni        Queso Cabrales                  
6     51    60  10   Queso Manchego La Pastora     Scottish Longbreads             
7     61    70  10   Singaporean Hokkien Fried Mee Tofu                            
8     71    80  7    Tourtière                     Zaanse koeken                   
Hopefully examples like these will encourage you to explore Dataphor -:)

1 comment:

Anonymous said...

It's very effortless to find out any topic on web as compared to textbooks, as I found this piece of writing at this web site.

Feel free to visit my web page :: safe diets

About Me

My photo
Phoenix, Arizona, United States