This is a solution using Dataphor to the problem in the post: comp.databases.ms-sqlserver May 9, 2008 'SQL Statement Problem : Separate long varchar into word seqments 35 chars long' http://tinyurl.com/42vty5 The op states the problem as: 'I have a table of names which can be very long. These names get printed on envelopes. The problem is the envelope can only hold 35 characters per line. I have to divide into 4 lines at most. So I need to separate these long varchars into segments, no longer than 35 characters but preserving whole words.' Data is stored in MS Sql Server 2005. Here is some sample data stored in table SMerge: -- Tsql DDL statements to create the SMerge table in a sql server database. create table SMerge (ID Int primary key,InStr varchar(250)) go insert SMerge values (1,'BIOLOGICAL SURVEY FOUNDATION OF ONTARIO HAPPY IF SOMEONE CAN HELP SOLVE THIS SQL PROBLEM') insert SMerge values (2,'I SURE HOPE I CAN GET THIS THING TO WORK RIGHT') insert SMerge values (3,' What will happen with a big string full of all kinds of names and addresses including some goobly goop. Will it hold up ?') This is a fairly simple iterative solution. It illustrates various dataphor programming capabilities including set based, cursor and procedural. I've relaxed the requirement of at most 4 lines. Given a maximum line size the batch iterates over each input string creating lines not exceeding the maximum size until all words in the string are assigned to a line. There is no limit to the number of lines allowed (although there certainly could be). // Start of batch if not ObjectExists("WTable") then Execute( "create session table WTable //Temporary table used in batch. { StrID:Integer, //Input string id. Str:String, //An individual string, ie. a word, from splitting. Seq:Integer, //Position/sequence of word from split. Interval:Integer nil {default nil}, //The line number the word will eventually belongs to. StrLen:Integer, //Length of word (Str). RunningSumStrLens:Integer nil, //Running sum of word lengths within a line. LenCheck:Boolean {default false}, //Whether or not a word (Str) has been assigned to a line. key{StrID,Seq} };" ) else Execute("delete WTable;"); var MaxLineLength:=35; //Max length of line desired. var LCheck:Boolean; // Holds boolean (t/f) of adding string to line, // whether line is within max line size. var MaxSeq:Integer; //Holds max Seq for ID and line (interval). // SMerge is the sql server table. var LCursorSqlTable:=cursor(SMerge order by {ID}); // Loop over all rows in the sql table splitting each input string (InStr). while LCursorSqlTable.Next() do begin // Error check. If for any ID the length of any word (Str), via Split, exceeds the max line // size then eliminate the ID from the result. This also insures no endless cursor loops. if exists( ToTable(LCursorSqlTable.Select().InStr.Split({' '}),'Str','Seq') where Length(Str)>MaxLineLength ) then continue; //Skips below code and gets another input string (InStr). // Insert the split string into work table WTable. Split operator splits using blank as // the delimiter. A sequence number (Seq) is also generated which is an integer indicating // the position (from 0 to N) from left to right of each word (Str) in the input string (InStr). // We also compute the length each word (Str). This is stored in column StrLen. insert ( ToTable(LCursorSqlTable.Select().InStr.Split({' '}),'Str','Seq') add{Length(Str) StrLen, LCursorSqlTable.Select().ID StrID, nil as Integer RunningSumStrLens} where (StrLen>0) //Eliminate extra spaces. We could do all sorts of other //checks and manipulations here as well. ) into WTable; var LInterval:=0; //Line(Interval) within an ID (InStr). var LRunningSumStrLens:Integer; MaxSeq:=-1; //Start current ID with Seq>-1, ie. include all words (Seq starts at 0). // Iterate over WTable. Get a running sum of word lengths. Each iteration represents a new // line based on comparing the running sum of word lengths to the max line size. Mark words // in rows of WTable as either true (part of new line) or false (to be assigned in a subsequent // iteration). As long as there are words to be assigned to a line the iteration (loop) // continues. The loop stops when all words for an ID have been optimally assigned to a // line (Interval). In other words loop stops when all values of LenCheck are true because // each word has been assigned to a line. while exists(WTable where (StrID=LCursorSqlTable.Select().ID) and (Seq>MaxSeq)) do begin LInterval:=LInterval+1 ; LRunningSumStrLens:=0; //Initialize running sum for word lengths to 0 for a new line. // Use WTable eliminating prior assigned words. We don't need these rows, we need rows // representing words that still need to be assigned to a line. Seq>MaxSeq gets these rows, // it gets all rows (words) for an ID that haven't yet been assigned to a line (Interval). var LCursor:= cursor( WTable where ((StrID=LCursorSqlTable.Select().ID) and (Seq>MaxSeq)) order by {StrID,Seq} capabilities {Navigable, Searchable, Updateable}); while LCursor.Next() do // For all remaining words that aren't yet assigned to a line(Interval) for the current ID // get a running sum of word lengths and check that the run value is <= max line size. begin // If it's the start of a new line we only want the running sum to be the length of the word, // if it's after the 1st word we add the previous word lengths, which includes a 1 for a space // between words, and the length of the current word. LRunningSumStrLens:= if LRunningSumStrLens=0 then LCursor.Select().StrLen else LRunningSumStrLens + 1 + LCursor.Select().StrLen; // Check if the word, as part of the running sum, is within the maximum line size. LCheck:=LRunningSumStrLens<=MaxLineLength ; // If the current word length, when added to the running sum of lengths, is greater than the // max line size there's no more point in staying within the LCursor loop. The line(Interval) // has been determined using prior words so break out of this loop and start forming a new // line(Interval) with as yet unassigned words (where LenCheck=false) with a new running sum. if not LCheck then break; // Update LenCheck to true. The current word fits on the current line as tested by comparing // the running sum (with the current word length) to the max line size. We can now ignore // this word (row) in assigning subsequent (unassigned) words to a line. Remember that the // default value of LenCheck is false. So we don't have to update words (set LenCheck=false) // as false since this is the default. And we'll get those words (where LenCheck=false) on // the next iteration of the loop. LCursor.Update(row{LRunningSumStrLens RunningSumStrLens,LCheck LenCheck,LInterval Interval}); end; //Lcursor, looping over WTable for an ID until all words are assigned to a line. // Find max Seq for the current ID where the word has been assigned to a line (LenCheck=true). // The next definition for the cursor for WTable starts with Seq = MaxSeq+1, ie. the rows with // Seq values greater than the max Seq value of assigned words. Interval is also used here // to help make the query more efficient (a possible better alternative would be to keep // track of Seq). MaxSeq:=Max(Seq from WTable where (StrID=LCursorSqlTable.Select().ID) and (Interval=LInterval) and (LenCheck)); end; //WTable end; //LCursorSqlTable. // Here is a simple check that this batch assigned each word to a line for each ID. All // LenCheck values must be true or else there is a failure somewhere in batch. Of course we // could more thoroughly check exactly where (ID and line) failure(s) occurred in the query. // We could also put extensive error handling throughout batch. Just so you know ☺ if not All(LenCheck from WTable) then exit; // Use a query to build the output strings (ALine) by concatenating (using the Concat operator) // the words (Str) within IDs and lines (Interval) and ordering the concatenation by Seq //(the position determined by Split). We also include the line length for each // concatenated line (ALine) of words within an ID. select WTable add{' ' del} adorn {key{Seq}} group by {StrID,Interval} add{ Concat(Str,del order by {Seq}) ALine} add{Length(ALine) LineLength} order by {StrID,Interval}; // End of batch Result: StrID Interval ALine LineLength ----- -------- ----------------------------------- ---------- 1 1 BIOLOGICAL SURVEY FOUNDATION OF 31 1 2 ONTARIO HAPPY IF SOMEONE CAN HELP 33 1 3 SOLVE THIS SQL PROBLEM 22 2 1 I SURE HOPE I CAN GET THIS THING TO 35 2 2 WORK RIGHT 10 3 1 What will happen with a big string 34 3 2 full of all kinds of names and 30 3 3 addresses including some goobly 31 3 4 goop. Will it hold up? 22 Here is what WTable looks like before concatenation. Note the running sum of string lengths (RunningSumStrLens) within each line (Interval) for each ID (StrID). select WTable order by {StrID,Interval,Seq}; StrID Str Seq Interval StrLen RunningSumStrLens LenCheck ----- ---------- --- -------- ------ ----------------- -------- 1 BIOLOGICAL 0 1 10 10 True 1 SURVEY 1 1 6 17 True 1 FOUNDATION 2 1 10 28 True 1 OF 3 1 2 31 True 1 ONTARIO 4 2 7 7 True 1 HAPPY 5 2 5 13 True 1 IF 6 2 2 16 True 1 SOMEONE 7 2 7 24 True 1 CAN 8 2 3 28 True 1 HELP 9 2 4 33 True 1 SOLVE 10 3 5 5 True 1 THIS 11 3 4 10 True 1 SQL 12 3 3 14 True 1 PROBLEM 13 3 7 22 True 2 I 0 1 1 1 True . . 3 hold 23 4 4 18 True 3 up? 24 4 3 22 True How will this batch work for larger strings? Lets take the whole question of the post (cut and pasted from the op on google). Here, through dataphor, we assign the sql server table SMerge just one row of data consisting of the post data as the input string. As part of the insert we remove carrige returns and line feeds using the Replace function. SMerge:= table { row{1 ID, "I'm having a really tough time with a SQL statement and I am wondering if someone is able to help out or point me in the right direction. I have a table of names which can be very long. These names get printed on envelopes. The problem is the envelope can only hold 35 characters per line. I have to divide into 4 lines at most. So I need to separate these long varchars into segments, no longer than 35 characters but preserving whole words. So far my approach has been to take a LEFT segment, REVERSE it, find the first space with CHARINDEX and use it to calculate how many characters to take in a SUBBSTRING. Here's an example of what I have been trying. I can find the first two segments, but then it starts to get confusing. Can anyone suggest a better approach? Am I going to be able to do this in SQL? I appreciate any help." InStr} } // Replace carrige returns and line feeds with blanks in InStr. redefine {InStr:=Replace(Replace(InStr,ASCII(list(Byte){13}) ,' '),ASCII(list(Byte){10}) ,' ')} ; Executing the batch with a maximum line size of 80 (MaxLineLength:=80) we get: StrID Interval ALine LineLength ----- -------- -------------------------------------------------------------------------------- ---------- 1 1 I'm having a really tough time with a SQL statement and I am wondering if 73 1 2 someone is able to help out or point me in the right direction. I have a table 78 1 3 of names which can be very long. These names get printed on envelopes. The 74 1 4 problem is the envelope can only hold 35 characters per line. I have to divide 78 1 5 into 4 lines at most. So I need to separate these long varchars into segments, 78 1 6 no longer than 35 characters but preserving whole words. So far my approach has 79 1 7 been to take a LEFT segment, REVERSE it, find the first space with CHARINDEX and 80 1 8 use it to calculate how many characters to take in a SUBBSTRING. Here's an 74 1 9 example of what I have been trying. I can find the first two segments, but then 79 1 10 it starts to get confusing. Can anyone suggest a better approach? Am I going to 79 1 11 be able to do this in SQL? I appreciate any help. 49 Executing the batch with a maximum line size of 70 (MaxLineLength:=70) we get: StrID Interval ALine LineLength ----- -------- ---------------------------------------------------------------------- ---------- 1 1 I'm having a really tough time with a SQL statement and I am wondering 70 1 2 if someone is able to help out or point me in the right direction. I 68 1 3 have a table of names which can be very long. These names get printed 69 1 4 on envelopes. The problem is the envelope can only hold 35 characters 69 1 5 per line. I have to divide into 4 lines at most. So I need to separate 70 1 6 these long varchars into segments, no longer than 35 characters but 67 1 7 preserving whole words. So far my approach has been to take a LEFT 66 1 8 segment, REVERSE it, find the first space with CHARINDEX and use it to 70 1 9 calculate how many characters to take in a SUBBSTRING. Here's an 64 1 10 example of what I have been trying. I can find the first two segments, 70 1 11 but then it starts to get confusing. Can anyone suggest a better 64 1 12 approach? Am I going to be able to do this in SQL? I appreciate any 67 1 13 help. 5 Seems to work ok ☺ Note that with this simple framework it would be easy to add all kinds of editing niceties. Left justification, centering etc. could be easily added. It would also be easy to edit/check the individual words, possibly eliminating some given specific rules. Note that we could easily make the batch an operator (stored procedure) returning the work table or a virtual table. If you have any ideas on further features or what would be kewl for this framework please let me know ☺ You can check out dataphor at: www.dataphor.org
Dataphor SQL RAC (Relational Application Companion)
A site of hope for those looking for a true relational database system
- a one-one requirement constraint with dataphor (1)
- anatomy of sql server part I - what is a stored procedure (1)
- anatomy of sql server part II - the unit test as part of the database (1)
- anatomy of sql server part III - what does deferred name resolution really mean (1)
- censoring sql posts (1)
- creating an opposite constraint in dataphor (1)
- dataphor (2)
- Dataphor (7)
- dataphor # 13 a table as a parameter (1)
- dataphor - download and start working with it (1)
- dataphor - fixed sized word segments (1)
- dataphor # 10 sql mythology (1)
- dataphor # 11 string differences (1)
- dataphor # 12 trimming a string (1)
- dataphor # 14 sql the meaning of Update..From (1)
- dataphor # 15 views with substance (1)
- dataphor # 16 inclusive vs exclusive solutions (1)
- dataphor # 17 a visual look at ranking queries (1)
- dataphor # 18 data scrubbing using lists (1)
- dataphor # 19 create intervals over strings (1)
- dataphor # 20 browsing an sql window (1)
- dataphor # 21 an example of relational division (1)
- dataphor # 22 reusable procedures (1)
- dataphor # 23 repley to Michel (1)
- dataphor # 24 basics of the table type (1)
- dataphor # 25 extending the dense rank function (1)
- dataphor # 26 query a hierarchy with explode (1)
- dataphor # 27 combine strings with Split and Concat (1)
- dataphor # 28 constants and variables or sql and D4 (1)
- dataphor # 29 another example of relational division (1)
- dataphor #1 introduction (1)
- dataphor #2 splitting strings (1)
- dataphor #3 string concatenation (1)
- dataphor #4 comment (1)
- dataphor #5 comment (1)
- dataphor #6 formal definition (1)
- dataphor #7 sql: table this (1)
- dataphor #8 list to table (1)
- dataphor #9 table constraints (1)
- dataphor creating lists in a query (1)
- extracting numbers from a string with dataphor (1)
- jeff modens dynamic crosstabs for sql server (1)
- linq to sql the what and why (1)
- linq to sql as a window of opportunity to sql users (1)
- linq to sql should be important to sql users (1)
- linq to sql vs. older 4GL attempts (1)
- listing missing table item (1)
- Multiple cascade paths to the same table (1)
- RAC (4)
- RAC #1 comment (1)
- RAC #2 example (1)
- RAC #3 finding the Nth number in a string (1)
- RAC #4 Sql Server 2005 ranking functions vs. Rac ranking (1)
- sorting a delimited string by its numerical string parts (1)
- sql an example of extreme implicit conversions (1)
- sql can't handle complicated cascading updates (1)
- sql CTE should be a variable not a value (1)
- sql dense rank for identifying consecutive runs (1)
- sql is there really a table variable (1)
- sql ranking functions explained by relational types (1)
- sql server triggers are best set based (1)
- sql the idea of using substring to simulate lists (1)
- sql the undefined trigger in Sql Server (1)
- sql vs relational on tables (1)
- sql what the sql CTE covers up (1)
- types and procedures (1)
Wednesday, May 14, 2008
Dataphor - Splitting a string into fixed sized word segments
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment