I have 2 tables (ID, Name / both Key Columns) of 2 independent sistems that carry more or less the same information but using another nomenclature for the naming and different Identificators ID.
I need to get the best guess for what ID corresponds to the ID of the master system.
example:
Table1: ID = 100, Name = 'Red Green Purple'
Table2: ID = 200, Name = 'Green Purple Black'
I want that for ID = 100 to be returned ID, 66% as 2 out of the 3 words match that master ID (in case that there isn't a better match)
My approach is:
- Split the Name into its components (cross apply String_Split(name,' ')) for Table1
- Link the result to Table2: left outer join Table2 b on b.NAME like '%' + a.value + '%')
This gives me basically all the records I need to consider. The rest should be only countig and comparing in order to get the best guess for the correlation between the 2 ID's. Unfortunately I don't see how to do this step. It's a mapping process with not identical names.
Here is some sample data:
IF OBJECT_ID('tempdb..#tbl1') IS NOT NULL
DROP TABLE #tbl1
CREATE TABLE #tbl1 (id int, name nvarchar(30))
INSERT INTO #tbl1 (id, name) Values (100, 'red green purple')
INSERT INTO #tbl1 (id, name) Values (101, 'blue red')
INSERT INTO #tbl1 (id, name) Values (102, 'brown black')
INSERT INTO #tbl1 (id, name) Values (103, 'orange green')
IF OBJECT_ID('tempdb..#tbl2') IS NOT NULL
DROP TABLE #tbl2
CREATE TABLE #tbl2 (id int, name nvarchar(30))
INSERT INTO #tbl2 (id, name) Values (200, 'black green')
INSERT INTO #tbl2 (id, name) Values (201, 'red blue')
INSERT INTO #tbl2 (id, name) Values (202, 'red black')
INSERT INTO #tbl2 (id, name) Values (203, 'orange brown red green')
;with data as (
select id, name, Value
from #tbl1
cross apply String_Split(name,' ')),
match as (
select a.id, a.name, a.Value, b.name as master_name, b.id as master_ID
from data a
left outer join #tbl2 b on b.NAME like '%' + a.value + '%')
select * from match
The result I want for this sample wuld be this: (in case multiple records have the same match quality, I want all of them to be displayed)
ID Name Name_Master ID_Master Match
------------------------------------------------------------------
100 'Red Green Purple' 'Orange Brown Red Green' 203 66%
101 'Blue Red' 'Red Blue' 201 100%
102 'Brown Black' 'Orange Brown Red Green' 203 33%
102 'Brown Black' 'Black Green' 200 33%
102 'Brown Black' 'Red Black' 202 33%
103 'Orange Green' 'Orange Brown Red Green' 203 100%