Hi,
I have downloaded the Australian G-NAF Dataset and was hoping that someone here was familiar with it and how to validate my customer file against G-NAF File, I thought I could build a matchkey based on address and postcode information but unfortunately this only gave me a 10% match rate when the G-NAF file is almost all of Australia 14.5 million records. I have created a table from the 30 odd tables which make up the G-NAF DATsets condensing down to the list below under G-NAF File listing
the fields I need to match against are as follows,
G-NAF File
number_first
street_name
street_type_code
locality_name
state_abbreviation
postcode
client file
address_line_1
address_line_2
address_line_3
state
postcode
I just need to run my client file against the G-NAF and append the matched information but if we could incorporate the wildcard to allow for certain mis-spelling but making sure the postcode filed is not wildcarded, so how do I run a query against each of the fields from my client file to the G-NAF file without merging fields?
hopefully someone has experience of this!!
many thanks in advance
Sean
Please post a good ser of sample data from both source if possible? Unless its confidential or maybe obfuscated data
please please explain what this means?
`
your note
so how do I run a query against each of the fields from my client file to the G-NAF file without merging fields?
`
this may help
- fuzzy-matching-using-t-sql
or this may help
- [Matching records from two tables - the JOIN ... inner join left join right join etc
hope it helps
Hi,
please see samples below,
client file
Address Line 1 |
Suburb |
State |
Postcode |
55 Park Avenue |
ROSEVILLE |
NSW |
2069 |
47 KALIMNA STREET |
CARRUM |
VIC |
3197 |
29 Walker St |
REDFERN |
NSW |
2016 |
11 Balfour Pl |
NOBLE PARK NORTH |
VIC |
3174 |
41 Cooper St. |
ALEXANDRA |
VIC |
3714 |
3/10 Crown Avenue |
MORDIALLOC |
VIC |
3195 |
18 Southern Rd |
HEIDELBERG HEIGHTS |
VIC |
3081 |
6 Corsican St |
FRANKSTON NORTH |
VIC |
3200 |
1 Huntingdale Cl |
CRANBOURNE |
VIC |
3977 |
8 Carrington Cresent |
LEUMEAH |
NSW |
2560 |
4a Oyster Ponit Road |
BANORA POINT |
NSW |
2486 |
PO Box 13 |
OLINDA |
VIC |
3095 |
2 Acton Street |
HURLSTONE PARK |
NSW |
2193 |
17 Graham Dr |
LANDSBOROUGH |
QLD |
4550 |
17 Wyndella St |
LAUNCESTON |
TAS |
7250 |
7 Greenwood Street |
NEWCOMB |
VIC |
3219 |
PO Box 146 |
ERSKINEVILLE |
NSW |
2043 |
1 Rope St |
DUNDAS VALLEY |
NSW |
2117 |
PO Box 139 |
BUDERIM |
QLD |
4556 |
31 Moorgate St |
MACGREGOR |
QLD |
4109 |
25 Hallows St |
GLEN WAVERLEY |
VIC |
3150 |
25 Hayward Rd |
LAKE MACDONALD |
QLD |
4563 |
3 Narrabri Crt |
GAVEN |
QLD |
4211 |
64 Shaftesbury St |
TARRAGINDI |
QLD |
4121 |
PO Box 196 |
CAMMERAY |
NSW |
2062 |
8/2 Lindman Street |
DROUIN |
VIC |
3818 |
PO Box 18 |
WAROONA |
WA |
6215 |
G_NAF File
NUMBER_FIRST |
STREET_NAME |
STREET_TYPE_CODE |
LOCALITY_NAME |
STATE_ABBREVIATION |
POSTCODE |
NULL |
DEVONCOURT |
ROAD |
DRILLHAM |
QLD |
4424 |
2 |
KENTIA |
STREET |
CARSELDINE |
QLD |
4034 |
42 |
PLEASANT |
DRIVE |
REDBANK PLAINS |
QLD |
4301 |
56 |
PLEASANT |
DRIVE |
REDBANK PLAINS |
QLD |
4301 |
NULL |
KALLIWA BEND |
ROAD |
MINGO |
QLD |
4625 |
3 |
RAILWAY |
AVENUE |
THE GEMFIELDS |
QLD |
4702 |
13 |
LAKE WEYBA |
DRIVE |
NOOSAVILLE |
QLD |
4566 |
23 |
CONDAMINE |
STREET |
SIPPY DOWNS |
QLD |
4556 |
28 |
CONDAMINE |
STREET |
SIPPY DOWNS |
QLD |
4556 |
31 |
TEEMANGUM |
STREET |
CURRUMBIN |
QLD |
4223 |
31 |
TEEMANGUM |
STREET |
CURRUMBIN |
QLD |
4223 |
159 |
STANLEY |
STREET |
TOWNSVILLE CITY |
QLD |
4810 |
3277 |
BEAUDESERT NERANG |
ROAD |
BIDDADDABA |
QLD |
4275 |
2 |
TANGO |
STREET |
CABOOLTURE |
QLD |
4510 |
102 |
QUEEN |
STREET |
SOUTHPORT |
QLD |
4215 |
1521 |
IPSWICH BOONAH |
ROAD |
PEAK CROSSING |
QLD |
4306 |
1531 |
IPSWICH BOONAH |
ROAD |
PEAK CROSSING |
QLD |
4306 |
NULL |
WELLS STATION |
ROAD |
BIARRA |
QLD |
4313 |
27 |
NARACOORTE |
PLACE |
PARKINSON |
QLD |
4115 |
27 |
WAVELL |
DRIVE |
TINAROO |
QLD |
4872 |
NULL |
BUNGEWORGORAI |
LANE |
DARGAL ROAD |
QLD |
4455 |
56 |
CHARLES |
STREET |
AITKENVALE |
QLD |
4814 |
406 |
MOUNTAIN VIEW |
ROAD |
MALENY |
QLD |
4552 |
13 |
LAKE WEYBA |
DRIVE |
NOOSAVILLE |
QLD |
4566 |
15 |
WAKEHAM |
STREET |
KALLANGUR |
QLD |
4503 |
3 |
WAKEHAM |
STREET |
KALLANGUR |
QLD |
4503 |
NULL |
FLINDERS |
HIGHWAY |
HUGHENDEN |
QLD |
4821 |
NULL |
KIRK RIVER |
ROAD |
RAVENSWOOD |
QLD |
4816 |
NULL |
KIRK RIVER |
ROAD |
RAVENSWOOD |
QLD |
4816 |
14 |
BUTTERNUT |
CIRCUIT |
THORNLANDS |
QLD |
4164 |
13 |
SUTTOR |
STREET |
NEBO |
QLD |
4742 |
8 |
MOONAH |
STREET |
BOULIA |
QLD |
4829 |
appreciate any help with this.. you will notice on the client file some descriptions for Road or street have be abbreviated so need either soundex function or fuzzy match
thanks
Sean
I merged the fields into a new table from a view I created to make a matchkey but it didn't work, so need to run each element of the client address against each field in the G-NAF file using either soundex or fuzzy logic, please see new post!
thanks
sean
Sorry forgot to paste postcode for G-NAF file but is present
thanks
sean
Will have a look and thanks one quick question I'm merging the G-NAF file number_first, street_name & street_type_code into a varchar but getting error message
"Error converting data type varchar to numeric." why would that be as Varchar is best for alphanumeric and my data has no decimal places?
cheer
sean
If any of 3 columns is numeric
Then throws that error
Solution
Cast ( columnnumeric as varchar)
+
Column123varchar
YES STREET_NUMER IS NUMERIC,
SO THE CODE GOES, Cast ([STREET_NAME] as varchar) what am I missing?
sorry been away from SQL over a year now so a little rusty, appreciate your help!!
Cast street number as varchar
Not
Street name
Hope I am right
thank but getting error message
"Msg 156, Level 15, State 1, Line 15
Incorrect syntax near the keyword 'as'."
Cast street_number as varchar
Cast ( streetnumber as varchar)
Looks like you are a lot rusty
,
tell me about it, start new job next week which is all SQL, hopefully will come flooding back once I get going!!
still not working, field is number_first, going made
/****** Script for SelectTopNRows command from SSMS ******/
SELECT TOP (1000) [NUMBER_FIRST]
,[STREET_NAME]
,[STREET_TYPE_CODE]
,[LOCALITY_NAME]
,[STATE_ABBREVIATION]
,[POSTCODE]
,[ADDRESS_LINE_1]
,[JAMM_NUMBER]
FROM [AUS].[dbo].[AUS_MERGED_ADDRESS_FIELDS]
Cast (number_first as varchar)
Msg 102, Level 15, State 1, Line 12
Incorrect syntax near 'number_first'.
Select cast( column as varchar) ,
Column1 ,
Column2 ,
Column3
From table
Make sure the commas are there
Can we chat somewhere live
It will be very quick to help you
I am not sure I understand you
ok help us help you by presenting the data not as embedded table but the following DDL and DML way
create table #client(AddressLine1 varchar(50), suburb varchar(50),
State varchar(50), Postcode int)
insert into #client
select '55 Park Avenue', 'ROSEVILLE', 'NSW', 2069 union
--etc
create table #gnaf(NUMBER_FIRST int , STREET_NAME varchar(50),
STREET_TYPE_CODE varchar(50), LOCALITY_NAME varchar(50),
STATE_ABBREVIATIONvarchar(50))
insert into #gnaf
--same here as same as client
this way we do not need to do the above for all of your data set.
What version of SQL Server will you be dealing with. 2017 has this NEAR function.
Also look into the Levenshtein
function that some have used in the past.
https://www.sqlteam.com/forums/topic.asp?TOPIC_ID=66781
Also maybe TSQL is not your solution? Could you leverage other programming languages to do this for you? Python? c#