Not sure this is the best place for this question as it's not entirely SQL based...YET. I'm hoping to find a better place to start in finding the solution.
I have an issue where I'm supposed to find address data from an excel list in a given database.
Problem is that addresses don't match those in the database in every instance. Using LIKE only returns some lines as abbreviations, misspells, etc occur on both sides of the data.
I'm currently thinking the best course of action would be to strip both sets of data of all spaces and special characters, concatenating the street address and zip, and creating a primary key off of that.
So 123 East Wages Road #5 would just become 123EastWagesRoad5.
I understand I would still run into issues where the address was entered as 123 E. Wages Rd. Suite #5 or something like that but I'm hoping someone has a more comprehensive approach in finding the best solution.
Thank you!!
If you had no more than say 50K addresses, and you get about 50% hit rate on the first pass, I'd say it's not a bad start. I wouldn't suggest it as on ongoing process though, only something you'd spend less than a day on.
If you have >100K addresses and you need to match/validate/normalize them, buy an address validation service like MelissaData. You'll save a lot of time and it will comprehensively fix all kinds of problems you probably haven't even uncovered yet.
Rolling your own address validation is frustrating, painful, and usually a dead end. I wish I wasn't speaking from experience.
There are services that clean up addresses; Google has one, I know other companies have them too.
For a limited number of rows, some of them are free. Others aren't all that expensive. Personally I think it would be more efficient to do that than spend hundreds of your own hours trying to program your own corrections for addresses.
Thanks for the direction.