SQLTeam.com | Weblogs | Forums

Creating A Pattern search query


#1

Mates,
I have an requirement . Initially i thouht it be easy and can be done with a simple like query but i now under stand its not as simple as i thought.

I have a Lookup table containing around 12000 records. (2 fields ,code, description both strings, codes are unique)
I need a query to find the records that match my search text. . I wrote it with a Like search with search text.It doesnt works.

For eg : when i search for a term "steel " i get 900 records.
But when i search for this term "900 MM DIA STEEL ROUND BAR IS:1971" no records is retrieved.
I was expecting same 900 records atleast for the reason word STEEL is there. my result could have also more records ie if any of the substring matches it will be there in the result...but what i get now is the exact search result. Can any help to write better query?


#2

so in the case above do you want the query to find any one of those strings?
so bring records that have the words 900 or MM or DIA or STEEL etc?


#3

It would help if you posted your query and "WHY" it doesn't work (Error? Wrong data? If so an example of the expected results and why the results you got were wrong, would help) and then folk here can advise.


#4

You can try something like the following, where both the descriptions and the search phrase as split into individual words and then the individual words are matched... The higher the word match count, the better the match...

-- create a small test table...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL 
DROP TABLE #TestData;

CREATE TABLE #TestData (
	Code INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	Description VARCHAR(50) NOT NULL  
	);
INSERT #TestData(Description) VALUES
	('Selection of Tool Steels'),
	('Heat Treatment of Tool Steels'),
	('CPM - The Crucible Particle Metallurgy Process'),
	('CPM High Speed Steels'),
	('CPM High Speed Comparagraph'),
	('CPM High Speed Grindability Comparator'),
	('CPM Cold Work Tool Steels'),
	('CPM Tool Steel Comparagraph'),
	('CPM Welding Guidelines'),
	('Grinding Recommendations for Tool Steel'),
	('Plastics Mold and Tool Steel Comparagraph');

-- SELECT * FROM #TestData td;

Query the data...

--==================================================

DECLARE @SearchString VARCHAR(50) = 'CPM High Speed Comparagraph';

IF OBJECT_ID('tempdb..#SplitSearchString', 'U') IS NOT NULL 
DROP TABLE #SplitSearchString;

SELECT 
	sc.ItemNumber, sc.Item
	INTO #SplitSearchString
FROM 
	dbo.SplitCSVToTable8K(@SearchString, ' ') sc;

----------

SELECT 
	td.Code, 
	td.Description,
	WordsMatched = COUNT(*)
FROM 
	#TestData td
	CROSS APPLY dbo.SplitCSVToTable8K(td.Description, ' ') sc
	JOIN #SplitSearchString sss
		ON sc.Item = sss.Item
GROUP BY 
	td.Code,
	td.Description
ORDER BY 
	COUNT(*) DESC,
	td.code ASC;

--==================================================

The results...

Code        Description                                        WordsMatched
----------- -------------------------------------------------- ------------
5           CPM High Speed Comparagraph                        4
4           CPM High Speed Steels                              3
6           CPM High Speed Grindability Comparator             3
8           CPM Tool Steel Comparagraph                        2
3           CPM - The Crucible Particle Metallurgy Process     1
7           CPM Cold Work Tool Steels                          1
9           CPM Welding Guidelines                             1
11          Plastics Mold and Tool Steel Comparagraph          1

#5

We have a Cache table that does that. Any tables' column where we want that effect we use a trigger to split the data into individual words, and store them (as separate rows) in a Keyword Caching table (along with an indicator of the Source Table/Column). We can then efficiently search the Keyword table for specific words.

Words stored into the Keyword table are replaced according to an Keyword Alias table - that deals with word stemming and the like. So "Cars" is stored as "Car" (and maybe we decide to alias "Van" and "Vans" as "Car" too ...)

When the User needs to search for a phrase we split their phrase into individual words, pass those through the Keyword Alias table so that they too are stemmed/standardised, and then match those against the Keyword table - giving weight to the number of matches. We also use Soundex / StartsWith and stuff like that

We have Clients who put all their competitor product codes (and name-keywords of hot-selling items) into the Keyword Alias table, with their own equivalent word/product code, so that searches for those words also succeed.

Bit of a fag to set up in the first place ... we did quite a lot of work on a publicly available dictionary to create all the word stems, and then found that the predictive search we initially wrote gave suggestions based on words in the Dictionary ... rather than words that were actually used in the Keywords table ... with the result that we were offering rude words, and all sorts, as suggestions ... Oops!

Dunno if SQL Freetext search would be any good for this job? Most of what we do is specific words for people names / addresses, and also Product Names, and the bog standard tools seemed not to work well for us. Particularly where we wanted noise-words to be handled in a specific manner.

All that may not be of any relevance to the O/P though ...


#6

If I were doing this in in our database, I'd definitely store and index the split values rather than splitting them on the fly every time very time a search was made.The code I offered was just to illustrate the concept.
You do bring up a good point though... You can go WAY down the rabbit hole in terms of additional massaging of the split values to facilitate "fuzzy" matches.
That said, the longer I do this stuff, the less inclined I am to spend time idiot proofing solutions. Seems like, every time I add an idiot proofing measure, they bring in a better idiot.


#7

That's my happy-place :slight_smile:


#8

Shhhh!, Be vewy vewy quite.
Elmer Fud

http://1.bp.blogspot.com/_OdCx7pACI9M/TTVLIUxhuVI/AAAAAAAAAKI/ANeGBTD8S4Q/s1600/Elmer+Fudd+pictures+looney+tunes-+Elmer+Fudd+Color+Sticker.jpg


#9

Hi Kristen,

Thanks alot for your reply.
I was telling the query was not working based on the count of matching results. query executed properly.

when i search for a term "steel " i get 900 records.
But when i search for this term "900 MM DIA STEEL ROUND BAR IS:1971" no records is retrieved.
I was expecting same 900 records atleast for the reason word STEEL is there. my result could have also more records ie if any of the substring matches it will be there in the result...


#10

Thanks alot for detailed reply


#11

Thank you friend.
My doubt is this where should i specify this OR condition which is best for performance.

either i can write in programming language syntax and form this query.

or in my database query i can split and form this needed type of query.
i was looking is there any other way than multiple OR ing of like conditions


#12

Yes, I read that, but without seeing the SQL for your query I have no idea how it works, and therefore why it doesn't work. Folk here would just be guessing ... Maybe you are using wildcards, maybe something else, could be anything in fact.