Hi there
I've got two two columns each with short text descriptions, up to five or six words. I'd like to separate the words and count how many words match. The purpose is to rank relevancy match between the two. The higher the word match count, the higher the relevancy.
Column A
High grade diesel truck
Pick up truck
Four door sedan
Column B
Low grade diesel truck
Long haul truck
Motorcycle
This article and its references are worth checking out http://www.sqlservercentral.com/articles/Tally+Table/142316/
Also:
Are you looking for something along these lines?
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Desc_A VARCHAR(30) NOT NULL,
Desc_B VARCHAR(30) NOT NULL
);
INSERT #TestData (Desc_A, Desc_B) VALUES
('High grade diesel truck', 'Low grade diesel truck'),
('Pick up truck', 'Long haul truck'),
('Four door sedan', 'Motorcycle');
--=======================================================
SELECT
td.ID,
td.Desc_A,
WordCount_A = COUNT(DISTINCT sca.ItemNumber),
td.Desc_B,
WordCount_B = COUNT(DISTINCT scb.ItemNumber),
MatchCount = COUNT(CASE WHEN sca.Item = scb.Item THEN 1 END)
FROM
#TestData td
CROSS APPLY dbo.SplitCSVToTable8K(td.Desc_A, ' ') sca
CROSS APPLY dbo.SplitCSVToTable8K(td.Desc_B, ' ') scb
GROUP BY
td.ID,
td.Desc_A,
td.Desc_B
ORDER BY
td.ID;
Results...
ID Desc_A WordCount_A Desc_B WordCount_B MatchCount
----------- ------------------------------ ----------- ------------------------------ ----------- -----------
1 High grade diesel truck 4 Low grade diesel truck 4 3
2 Pick up truck 3 Long haul truck 3 1
3 Four door sedan 3 Motorcycle 1 0
1 Like