I have 2 tables with names and addresses in and I would like to try to run a match between the tables to try and get unique and correct values. What is the best way to go about doing this ?
Matching names can be tricky as you may have something like Dave versus David or Bob versus Robert. Addresses also my very between PO Box and P.O. Box, Apartment versus Apt
However in general use a select with a join will work joining on name and address if that is what you want.
Thanks for that.
Can you post some example code for this please
The join would work, of course, but it might be more performant to use the INTERSECT operator:[code]select name, address1, address2, city, state, postalcode
select name, address1, address2, city, state, postalcode
from MyTable2[/code]Hey! I got to use one of those funky operators that I never think of using...Woo-hoo!!
We create a table with the [pair of] IDs of the matches, and a "score" for how good the match is.
INSERT INTO MyMatchTable SELECT T1.ID, T2.ID, [MatchMethod] = 1 -- Exact match, all columns FROM MyTable1 AS T1 JOIN MyTable2 AS T2 ON T1.name = T2.name AND T1.address1 = T2.address2 ... etc. INSERT INTO MyMatchTable SELECT T1.ID, T2.ID, [MatchMethod] = 2 -- Address variation FROM MyTable1 AS T1 JOIN MyTable2 AS T2 ON T1.name = T2.name AND T1.address1 <> T2.address2 AND REPLACE(T1.address1,'Apt', 'Apartment') = REPLACE(T2.address1,'Apt', 'Apartment') ... etc.
Some MatchMethod values (e.g. Exact Match) the code can act on - e.g. synchronise the two IDs as-one, others a human has to "approve" via the APP.
Telephone number is often a good one to match addresses on - e.g. if all-but-one address lines match, but phone number is the same. However, we then found that there was an agency used by lots of people, so they all, legitimately, had different addresses but the same (agency) phone number ...
Synchronising addresses is a big, messy, inexact job ... if you've got a lot to do, or its a frequently repeated operation, it would probably be better to buy-in a tool designed for the job. We spent a LOT of time fine tuning out SELECT statements. The first 80% was easy ... the last 5% was impossible, without purpose built software.