Extracting Data From a String

Hello.

Here is the string I'm working with:
1 08/22/2024 9 875 300 20 423471282341 21 08/21/2024 22 S 16TH ST SAFEWAY PHOENIX AZ 23 548667 64 20:45:46 0

Using this, I am able to extract the first date in the string (08/22/2024)
Case When ah.TokenString Like '%[01][0-9]/[0123][0-9]/2[0-9][0-9]%' Then SUBSTRING(ah.TokenString, PATINDEX('%[01][0-9]/[0123][0-9]/2[0-9][0-9]%',ah.TokenString),10) Else '' End

How would I extract the date between the 21 and the 22? Although the date is always between those two anchors, the location isn't always consistent.

/*
If the string always follows the same pattern 
and your dates are always in the same ordinal position split by ' '
you can use this method to return the fields you need based on "field number".
*/
SELECT *
FROM 
(
  SELECT t.*
       , ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY (SELECT NULL)) AS field
       , v.*
  FROM test t
  CROSS APPLY STRING_SPLIT(t.string, ' ') v
) splt
WHERE splt.field IN (2, 9);

Right now SPLIT_STRING() only returns the column value except on Azure SQL where it also returns ordinal I think. This column may come to on-prem SQL Server later, in which case you'd no longer need the ROW_NUMBER() to assign a field number.

There's an alternative method, it's a bit clunky but works with the example you provided:

;WITH cte as(
SELECT t.id, convert(xml,'<x><z>' + replace(t.string,' ','</z><z>') + '</z></x>') x
FROM test t)
SELECT c.id, try_convert(date,z.z.value('(./text())[1]','varchar(200)')) FROM cte c
cross apply c.x.nodes('x/z') z(z)
where try_convert(date,z.z.value('(./text())[1]','varchar(200)')) is not null
and z.z.value('(./text())[1]','varchar(200)') LIKE '%/%/%'

The general idea is to add XML elements to the string so that they can be extracted via XML methods in SQL Server. This generally avoids ambiguity in extracting parts of a string, and allows for easy access to multiples of the same type of value.

SQL Server XQuery support has some limitations, but can also do a lot:

There might be a more elegant/efficient way to make this work with your data.

1 Like

This uses T-SQL with SUBSTRING, no string split or XML, and based on the markers (' 21 ' and ' 22 ') that you specified:


DROP TABLE IF EXISTS #data;
CREATE TABLE #data ( TokenString varchar(8000) NULL );
INSERT INTO #data VALUES
    ('1 08/22/2024 9 875 300 20 423471282341 21 08/21/2024 22 S 16TH ST SAFEWAY PHOENIX AZ 23 548667 64 20:45:46 0');

SELECT ah.TokenString, ca1.first_date, ca2.second_date
FROM #data ah
CROSS APPLY (
    SELECT Case When ah.TokenString Like '%[01][0-9]/[0123][0-9]/2[0-9][0-9]%' 
        Then SUBSTRING(ah.TokenString, PATINDEX('%[01][0-9]/[0123][0-9]/2[0-9][0-9]%',ah.TokenString),10) Else '' End AS first_date,
        CHARINDEX(' 21 ', TokenString) AS second_data_prefix, 
        CHARINDEX(' 22 ', TokenString) AS second_data_suffix
) AS ca1
CROSS APPLY (
    SELECT CASE WHEN second_data_prefix > 0 AND second_data_suffix > 0 
        THEN SUBSTRING(TokenString, second_data_prefix + 4, 
            second_data_suffix - second_data_prefix - 4)
        ELSE '' END AS second_date
) AS ca2

Thank you all for responding. I used ScottPletcher's as it was the most efficient.

CASE
WHEN ah.TokenString LIKE '%21 [01][0-9]/[0123][0-9]/2[0-9][0-9]% 22%'
THEN SUBSTRING(
ah.TokenString,
PATINDEX('%21 [01][0-9]/[0123][0-9]/2[0-9][0-9]% 22%', ah.TokenString) + 3,
10
)
ELSE ''
END

tinyurl .com/ ynyuwnps