SQLTeam.com | Weblogs | Forums

Parse Word Function


#1

Hi Guys,

I'm in the process of trying to convert a ParseWord function which is written in VB to a server side function. There is some code in my function that doesn't look like it will convert. I am hoping someone can offer some suggestions for a replacement. The basics of what I am looking for is a function that will extract X word from a phrase passed to the function. The function should look for common delimiters between words such as space, comma, semi colon and the like. As an example. ParseWord(@Phrase,2) would mean extract the second word from @Phrase. Any suggestions you can offer are appreciated


#2

Relatively hard to do in SQL if you want fairly broad delimiters - such as "Any non alphabetic character". A list which is delimited by, say, "comma" is easy (i.e. where the list and delimiter are consistent). Widen that to "Split all words" where the delimiter then starts to become "A space, hyphen, slash, ..." and "ignoring all other punctuation" is difficult. The other problem is where you have a double-space between words - you could easily take care of that in VB, but its a pain to handle in SQL - without careful handling a double-space (I'll use "." to illustrate the spaces) such as WORD1..WORD2 will mean that the 2nd word is "blank" and the 3rd word is "WORD2"

Sure, you can do it like VB walking through the string character-by-character, but that's a really bad way to build a SQL function which is then applied to multiple rows

There are a number of ways to tackle this, I'll be interested in what others have to say, but I would use a string splitter function (which will split the @Phrase into a pseudo-table, one row per word, and then just add WHERE SplitItemNo = 2

A String Splitter Function is very useful in SQL (there isn't one built in until SQL 2016, and that one has severe limitations), and as you are likely to reuse it generically its worth installing one that is efficient such as Jeff Moden's function called DelimitedSplit8K

Where we want a free-word search on a text field - e.g. a Product Name where we want Customers to be able to find products by any word in the Product Name - we split the product name into individual words, put them in a Keyword table, and then search the Keyword table (which is very fast) for the "Product Find" function.

I wrote a detailed description of that for someone the other day, so if it turns out that is the type of problem you are trying to solve let me know and I'll get a link.


#3

Thanks for the reply. My requirement is very basic actually but I would like it to be robust enough that it can be reused. I'm search the words that make up a card name. If the first word is A An or the ignore it and go to the next word. Treat hyphenated words as one word handle commas and such. Once I find a word that isn't A An or The return the word.


#4

Shane,

The simple splitter that Kristen speaks of is located at the following URL:
http://www.sqlservercentral.com/articles/Tally+Table/72993/

Since you've mentioned "noise words", that DelimitedSplit8K function may not be nearly easy enough. You may want to look into the FULL TEXT SEARCH (FTS) features of SQL Server for that type of thing. It's not as robust as some of the expensive FTS systems out there but it may get you by.


#5

Jeff,

Thank you for providing the link and for your advice.