Australian G-NAF Dataset HELP!

Hi,

I have downloaded the Australian G-NAF Dataset and was hoping that someone here was familiar with it and how to validate my customer file against G-NAF File, I thought I could build a matchkey based on address and postcode information but unfortunately this only gave me a 10% match rate when the G-NAF file is almost all of Australia 14.5 million records. I have created a table from the 30 odd tables which make up the G-NAF DATsets condensing down to the list below under G-NAF File listing

the fields I need to match against are as follows,

G-NAF File
number_first
street_name
street_type_code
locality_name
state_abbreviation
postcode

client file
address_line_1
address_line_2
address_line_3
state
postcode

I just need to run my client file against the G-NAF and append the matched information but if we could incorporate the wildcard to allow for certain mis-spelling but making sure the postcode filed is not wildcarded, so how do I run a query against each of the fields from my client file to the G-NAF file without merging fields?

hopefully someone has experience of this!!

many thanks in advance
Sean

Please post a good ser of sample data from both source if possible? Unless its confidential or maybe obfuscated data

please please explain what this means?
`

your note

so how do I run a query against each of the fields from my client file to the G-NAF file without merging fields?

`

this may help

  • fuzzy-matching-using-t-sql

or this may help

  • [Matching records from two tables - the JOIN ... inner join left join right join etc

hope it helps
:slight_smile: :slight_smile:

Hi,

please see samples below,

client file

Address Line 1 Suburb State Postcode
55 Park Avenue ROSEVILLE NSW 2069
47 KALIMNA STREET CARRUM VIC 3197
29 Walker St REDFERN NSW 2016
11 Balfour Pl NOBLE PARK NORTH VIC 3174
41 Cooper St. ALEXANDRA VIC 3714
3/10 Crown Avenue MORDIALLOC VIC 3195
18 Southern Rd HEIDELBERG HEIGHTS VIC 3081
6 Corsican St FRANKSTON NORTH VIC 3200
1 Huntingdale Cl CRANBOURNE VIC 3977
8 Carrington Cresent LEUMEAH NSW 2560
4a Oyster Ponit Road BANORA POINT NSW 2486
PO Box 13 OLINDA VIC 3095
2 Acton Street HURLSTONE PARK NSW 2193
17 Graham Dr LANDSBOROUGH QLD 4550
17 Wyndella St LAUNCESTON TAS 7250
7 Greenwood Street NEWCOMB VIC 3219
PO Box 146 ERSKINEVILLE NSW 2043
1 Rope St DUNDAS VALLEY NSW 2117
PO Box 139 BUDERIM QLD 4556
31 Moorgate St MACGREGOR QLD 4109
25 Hallows St GLEN WAVERLEY VIC 3150
25 Hayward Rd LAKE MACDONALD QLD 4563
3 Narrabri Crt GAVEN QLD 4211
64 Shaftesbury St TARRAGINDI QLD 4121
PO Box 196 CAMMERAY NSW 2062
8/2 Lindman Street DROUIN VIC 3818
PO Box 18 WAROONA WA 6215

G_NAF File

NUMBER_FIRST STREET_NAME STREET_TYPE_CODE LOCALITY_NAME STATE_ABBREVIATION POSTCODE
NULL DEVONCOURT ROAD DRILLHAM QLD 4424
2 KENTIA STREET CARSELDINE QLD 4034
42 PLEASANT DRIVE REDBANK PLAINS QLD 4301
56 PLEASANT DRIVE REDBANK PLAINS QLD 4301
NULL KALLIWA BEND ROAD MINGO QLD 4625
3 RAILWAY AVENUE THE GEMFIELDS QLD 4702
13 LAKE WEYBA DRIVE NOOSAVILLE QLD 4566
23 CONDAMINE STREET SIPPY DOWNS QLD 4556
28 CONDAMINE STREET SIPPY DOWNS QLD 4556
31 TEEMANGUM STREET CURRUMBIN QLD 4223
31 TEEMANGUM STREET CURRUMBIN QLD 4223
159 STANLEY STREET TOWNSVILLE CITY QLD 4810
3277 BEAUDESERT NERANG ROAD BIDDADDABA QLD 4275
2 TANGO STREET CABOOLTURE QLD 4510
102 QUEEN STREET SOUTHPORT QLD 4215
1521 IPSWICH BOONAH ROAD PEAK CROSSING QLD 4306
1531 IPSWICH BOONAH ROAD PEAK CROSSING QLD 4306
NULL WELLS STATION ROAD BIARRA QLD 4313
27 NARACOORTE PLACE PARKINSON QLD 4115
27 WAVELL DRIVE TINAROO QLD 4872
NULL BUNGEWORGORAI LANE DARGAL ROAD QLD 4455
56 CHARLES STREET AITKENVALE QLD 4814
406 MOUNTAIN VIEW ROAD MALENY QLD 4552
13 LAKE WEYBA DRIVE NOOSAVILLE QLD 4566
15 WAKEHAM STREET KALLANGUR QLD 4503
3 WAKEHAM STREET KALLANGUR QLD 4503
NULL FLINDERS HIGHWAY HUGHENDEN QLD 4821
NULL KIRK RIVER ROAD RAVENSWOOD QLD 4816
NULL KIRK RIVER ROAD RAVENSWOOD QLD 4816
14 BUTTERNUT CIRCUIT THORNLANDS QLD 4164
13 SUTTOR STREET NEBO QLD 4742
8 MOONAH STREET BOULIA QLD 4829

appreciate any help with this.. you will notice on the client file some descriptions for Road or street have be abbreviated so need either soundex function or fuzzy match

thanks
Sean

I merged the fields into a new table from a view I created to make a matchkey but it didn't work, so need to run each element of the client address against each field in the G-NAF file using either soundex or fuzzy logic, please see new post!

thanks
sean

Sorry forgot to paste postcode for G-NAF file but is present

thanks
sean

One idea For Matching..

Checksum ..

Hope it helps

Will have a look and thanks one quick question I'm merging the G-NAF file number_first, street_name & street_type_code into a varchar but getting error message
"Error converting data type varchar to numeric." why would that be as Varchar is best for alphanumeric and my data has no decimal places?

cheer
sean

If any of 3 columns is numeric
Then throws that error

Solution
Cast ( columnnumeric as varchar)
+
Column123varchar

YES STREET_NUMER IS NUMERIC,

SO THE CODE GOES, Cast ([STREET_NAME] as varchar) what am I missing?

sorry been away from SQL over a year now so a little rusty, appreciate your help!!

Cast street number as varchar

Not
Street name

Hope I am right

thank but getting error message

"Msg 156, Level 15, State 1, Line 15
Incorrect syntax near the keyword 'as'."

Cast street_number as varchar

Cast ( streetnumber as varchar)

Looks like you are a lot rusty
,:+1::grin:

tell me about it, start new job next week which is all SQL, hopefully will come flooding back once I get going!! :grinning:

still not working, field is number_first, going made

/****** Script for SelectTopNRows command from SSMS ******/
SELECT TOP (1000) [NUMBER_FIRST]
,[STREET_NAME]
,[STREET_TYPE_CODE]
,[LOCALITY_NAME]
,[STATE_ABBREVIATION]
,[POSTCODE]
,[ADDRESS_LINE_1]
,[JAMM_NUMBER]
FROM [AUS].[dbo].[AUS_MERGED_ADDRESS_FIELDS]

Cast (number_first as varchar) 

Msg 102, Level 15, State 1, Line 12
Incorrect syntax near 'number_first'.

Select cast( column as varchar) ,
Column1 ,
Column2 ,
Column3
From table

Make sure the commas are there

Can we chat somewhere live
It will be very quick to help you

I am not sure I understand you

ok help us help you by presenting the data not as embedded table but the following DDL and DML way

create table #client(AddressLine1 varchar(50), suburb varchar(50), 
State varchar(50), Postcode int)

insert into #client
select '55 Park Avenue', 'ROSEVILLE', 'NSW',  2069 union 
--etc
create table #gnaf(NUMBER_FIRST int , STREET_NAME varchar(50), 
STREET_TYPE_CODE varchar(50), LOCALITY_NAME  varchar(50), 
STATE_ABBREVIATIONvarchar(50))

insert into #gnaf
--same here as same as client

this way we do not need to do the above for all of your data set. :slight_smile:

What version of SQL Server will you be dealing with. 2017 has this NEAR function.
Also look into the Levenshtein function that some have used in the past.

https://www.sqlteam.com/forums/topic.asp?TOPIC_ID=66781

Also maybe TSQL is not your solution? Could you leverage other programming languages to do this for you? Python? c#