SQLTeam.com | Weblogs | Forums

Matching name/address data


#1

I have 2 tables with names and addresses in and I would like to try to run a match between the tables to try and get unique and correct values. What is the best way to go about doing this ?


#2

Matching names can be tricky as you may have something like Dave versus David or Bob versus Robert. Addresses also my very between PO Box and P.O. Box, Apartment versus Apt

However in general use a select with a join will work joining on name and address if that is what you want.


#3

Thanks for that.
Can you post some example code for this please


#4

The join would work, of course, but it might be more performant to use the INTERSECT operator:[code]select name, address1, address2, city, state, postalcode
from MyTable1

intersect

select name, address1, address2, city, state, postalcode
from MyTable2[/code]Hey! I got to use one of those funky operators that I never think of using...Woo-hoo!!


#5

@stephen_hendricks, I forgot about that one. :slightly_smiling:


#6

We create a table with the [pair of] IDs of the matches, and a "score" for how good the match is.

INSERT INTO MyMatchTable
SELECT T1.ID, T2.ID, [MatchMethod] = 1 -- Exact match, all columns
FROM MyTable1 AS T1
     JOIN MyTable2 AS T2
         ON T1.name = T2.name
        AND T1.address1 = T2.address2
        ... etc.

INSERT INTO MyMatchTable
SELECT T1.ID, T2.ID, [MatchMethod] = 2 -- Address variation
FROM MyTable1 AS T1
     JOIN MyTable2 AS T2
         ON T1.name = T2.name
        AND T1.address1 <> T2.address2
        AND REPLACE(T1.address1,'Apt', 'Apartment')   = REPLACE(T2.address1,'Apt', 'Apartment')
        ... etc.

Some MatchMethod values (e.g. Exact Match) the code can act on - e.g. synchronise the two IDs as-one, others a human has to "approve" via the APP.

Telephone number is often a good one to match addresses on - e.g. if all-but-one address lines match, but phone number is the same. However, we then found that there was an agency used by lots of people, so they all, legitimately, had different addresses but the same (agency) phone number ...

Synchronising addresses is a big, messy, inexact job ... if you've got a lot to do, or its a frequently repeated operation, it would probably be better to buy-in a tool designed for the job. We spent a LOT of time fine tuning out SELECT statements. The first 80% was easy ... the last 5% was impossible, without purpose built software.