SQLTeam.com | Weblogs | Forums

Number of matching words between two columns


#1

Hi there

I've got two two columns each with short text descriptions, up to five or six words. I'd like to separate the words and count how many words match. The purpose is to rank relevancy match between the two. The higher the word match count, the higher the relevancy.

Column A
High grade diesel truck
Pick up truck
Four door sedan

Column B
Low grade diesel truck
Long haul truck
Motorcycle


#2

This article and its references are worth checking out http://www.sqlservercentral.com/articles/Tally+Table/142316/

Also:


#3

Are you looking for something along these lines?

IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL 
DROP TABLE #TestData;

CREATE TABLE #TestData (
	ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
	Desc_A VARCHAR(30) NOT NULL,
	Desc_B VARCHAR(30) NOT NULL
	);

INSERT #TestData (Desc_A, Desc_B) VALUES
	('High grade diesel truck', 'Low grade diesel truck'),
	('Pick up truck', 'Long haul truck'),
	('Four door sedan', 'Motorcycle');

--=======================================================

SELECT 
	td.ID, 
	td.Desc_A, 
	WordCount_A = COUNT(DISTINCT sca.ItemNumber),
	td.Desc_B,			
	WordCount_B = COUNT(DISTINCT scb.ItemNumber),
	MatchCount = COUNT(CASE WHEN sca.Item = scb.Item THEN 1 END)
FROM
	#TestData td
	CROSS APPLY dbo.SplitCSVToTable8K(td.Desc_A, ' ') sca
	CROSS APPLY dbo.SplitCSVToTable8K(td.Desc_B, ' ') scb
GROUP BY 
	td.ID, 
	td.Desc_A, 
	td.Desc_B
ORDER BY
	td.ID;

Results...

ID          Desc_A                         WordCount_A Desc_B                         WordCount_B MatchCount
----------- ------------------------------ ----------- ------------------------------ ----------- -----------
1           High grade diesel truck        4           Low grade diesel truck         4           3
2           Pick up truck                  3           Long haul truck                3           1
3           Four door sedan                3           Motorcycle                     1           0