First word of address field

Hi All,

I have a simple select query which draws in data to be used for a reconciliation with another group of data which I will be matching based on First Name, Last Name, Telephone Number etc.

An additional field I could use to match records is ADDRESS. Obviously, the formatting and styling of something as cumbersome as an address is rarely likely to match so I am just looking to draw in the very first word of the address. It's not much but it's something.

So both '16, Beaufort Street' and '16 Beaufort St' should both yield '16'.

How do I code this in SQL?

Pseudocode:

SELECT FIRST, LAST, TEL, xxxxx AS FirstWordOfAddress FROM CONTACTS

In the pseudocode 'xxxxx' is a formula to grab the first part of an address before a space or comma.

In Excel for example I use the formula:

=LEFT(D2,MIN(IFERROR(SEARCH(" ",D2),200),IFERROR(SEARCH(",",D2),200))-1)
(where D2 is the address)

The formula basically says:

  • look for the position of the first space in the address (and if there isn't one, be 200)
  • look for the position of the first comma in the address (and if there isn't one, be 200)
  • take the minimum of these two values
  • take the left of the address one short of that value

Is this at all doable with SQL?

Thanks!

you have cut the string from the start to where the first word ends ..

how do you identify where it ends = does it end in space comma or other characters

example

DECLARE @test NVARCHAR(255)
SET @test = 'First Second'

SELECT SUBSTRING(@test,1,(CHARINDEX(' ',@test + ' ')-1))
1 Like

Thanks for your reply.

The end of the first word can either be COMMA or SPACE - whichever comes first.

The other problem I have is that I am using a frontend to run these queries which does not allow usage of DECLARE, SET, UPDATE etc. You can really only run fairly simple select queries.

So far I am making some progress along the lines of (please excuse formatting):

select

iif(charindex(',',addressline1)=0 OR charindex(',',addressline1) IS NULL, NULL, left(addressline1,charindex(',',addressline1)-1)),

iif(charindex(' ',addressline1)=0 OR charindex(' ',addressline1) IS NULL, NULL, left(addressline1,charindex(' ',addressline1)-1))

from contact

I guess the next step would be another IIF to see which string is shortest (and not null).

It's not elegant though!

Probably also worth mentioning that some addresses may be null or may not include a space or comma (unlikely but possible).

When I read your question I would suggest to say "No".

As a data professional you should say to the business: sorry, I cannot do this as I don't have any matching field I can use to give reliable data. What you are trying to archive is a long road whitout satisfying result. This is why master data is important and a customer should have a master key every program should be able to report.

However, you can use SELECT CASE WHEN ... THEN
WHEN ... THEN
ELSE
...
END to archive what you want.

2 Likes

you can do it somewhere else and call the results from front end


SELECT 
    test_data.address,
    LEFT(address, address_end) AS first_word_of_address
FROM ( /*dbo.your_table_name_here*/
    SELECT CAST('16, Beaufort Street' AS varchar(500)) AS address
    UNION ALL
    SELECT '16 Beaufort St' 
) AS test_data
CROSS APPLY (
    SELECT REPLACE(address, ',', ' ') AS address_prep1
) AS ca1
CROSS APPLY (
    SELECT CASE WHEN CHARINDEX(' ', address_prep1) = 0 THEN 200 
        ELSE CHARINDEX(' ', address_prep1) - 1 END AS address_end
) AS ca2

so this type of parsing strings within SQL is fraught with dangers. You have shown us here only one variation of "bad data". Which leads me to believe the application side of things allows people to enter all kinds of bad data.

  1. Fix the application that allows this kind of entry.

if #1 is not viable...

  1. If it allows 16, then pretty sure it allows endless variety of bad data. 16#, 16%,16;, 16: etc. You get my drift. Since it can allow the above, whatever kind of clean up query you write could possibly not catch all of them. The effort might feel heroic and cool to parse strings with ninja like logic but be careful.

There are commercially available products for address cleansing that I would be much comfortable with

I finally solved it as follows (just to close down this thread)

select  
addressline1, 
IIF(iif(charindex(',',addressline1)=0 OR charindex(',',addressline1) IS NULL, NULL, left(addressline1,charindex(',',addressline1)-1)) IS NULL, iif(charindex(' ',addressline1)=0 OR charindex(' ',addressline1) IS NULL, NULL, left(addressline1,charindex(' ',addressline1)-1)), 
(IIF(iif(charindex(' ',addressline1)=0 OR charindex(' ',addressline1) IS NULL, NULL, left(addressline1,charindex(' ',addressline1)-1)) IS NULL, iif(charindex(',',addressline1)=0 OR charindex(',',addressline1) IS NULL, NULL, left(addressline1,charindex(',',addressline1)-1)), 
(IIF(LEN(iif(charindex(',',addressline1)=0 OR charindex(',',addressline1) IS NULL, NULL, left(addressline1,charindex(',',addressline1)-1)))<LEN(iif(charindex(' ',addressline1)=0 OR charindex(' ',addressline1) IS NULL, NULL, left(addressline1,charindex(' ',addressline1)-1))), iif(charindex(',',addressline1)=0 OR charindex(',',addressline1) IS NULL, NULL, left(addressline1,charindex(',',addressline1)-1)) , iif(charindex(' ',addressline1)=0 OR charindex(' ',addressline1) IS NULL, NULL, left(addressline1,charindex(' ',addressline1)-1)))))))
FROM CONTACT

In fact I just came up with a more elegant solution.

Pseudocde:

  • Replace commas with spaces and add a trailing space (so there's always a space)
  • Take the left of that before the first space

#OutsideTheBox :grinning:

You mean exactly as in my code above !?

No. The front end doesn't allow cross apply.

@Tombo - I agree with the others... this form of "matching" is fraught with issues but I also understand that ya gotta to something sometimes.

With that in mind, here's another possibility IF your front-end allows PATINDEX(). I make no performance claims on this one simply because I've not tested it for performance, but it may get you out of the proverbial woods.

--===== Building a test table here.  This is NOT a part of the solution.
   DROP TABLE IF EXISTS #TestTable;
GO
 SELECT *
   INTO #TestTable
   FROM (SELECT 1,CAST('16, Beaufort Street' AS varchar(500))
         UNION ALL
         SELECT 2,'16 Beaufort St'
         UNION ALL
         SELECT 3,'JustOneWord'
         UNION ALL
         SELECT 4,NULL
         UNION ALL
         SELECT 5,''      --Empty string
         UNION ALL
         SELECT 6,'   '   --Only spaces
        )d(RowNum,Address)
;
--===== Possible solution
 SELECT  *
        ,FirstWord = ISNULL(LEFT(Address,PATINDEX('%[, ]%',ISNULL(Address+' ',' '))-1),'')
   FROM #TestTable
;

Results:
image