Extract part of a string with inconsistent length

Hi All,

I want to extract the size dimension from the SKU fields which comes in different formats. I tried SPLIT_PART and CHARINDEX/Substr but not able to get it to work for all different formats.

Example SKU formats -

1.C11544-BLK-2XL|AC|PQRS|Baseball Cap
2.C11544-BLK-S-M|AC|PQRS|Baseball Cap
3. K45602-NA-1|XZ|XYSUA|Pant

Expected Output

  1. 2XL
  2. S-M
  3. 1

Note- The letters for colour is not consistently 3 letter. Same for Style. Only the format is static - SKU|XX|XXXX and so on.

Database - Snowflake

Quick response is greatly appreciated!

As this is a SQL Server site, I doubt many know snowflake.
This is just basic string handling. With t-sql:

/* *** Test Data *** */
CREATE TABLE #t
(
	YourCol varchar(500) NOT NULL
	PRIMARY KEY
);
INSERT INTO #t
VALUES ('C11544-BLK-2XL|AC|PQRS|Baseball Cap')
	,('C11544-BLK-S-M|AC|PQRS|Baseball Cap')
	,('K45602-NA-1|XZ|XYSUA|Pant');
/* *** End Test Data *** */

SELECT SUBSTRING(X2.SecondBit, CHARINDEX('-', X2.SecondBit) + 1, 255) As Result
FROM #t T
	CROSS APPLY (VALUES(LEFT(T.YourCol, CHARINDEX('|', T.YourCol) - 1))) X1 (FirstBit)
	CROSS APPLY (VALUES(SUBSTRING(X1.FirstBit, CHARINDEX('-', X1.FirstBit) + 1, 255))) X2 (SecondBit);
2 Likes

hi

hope this helps

Please try it on your "Data Set all sorts of variations" to see if this solution works

SELECT   
    SKU,  
    SUBSTRING(PreSKU, Dash2 + 1, LEN(PreSKU) - Dash2) AS SizeDimension  
FROM   
(  
    SELECT   
        SKU,  
        -- Extract the part before the first pipe  
        SUBSTRING(SKU, 1, CHARINDEX('|', SKU) - 1) AS PreSKU,  
        -- Find the first dash in the pre-pipe string  
        CHARINDEX('-', SUBSTRING(SKU, 1, CHARINDEX('|', SKU) - 1)) AS Dash1,  
        -- Find the second dash using the first dash as starting point  
        CHARINDEX('-', SUBSTRING(SKU, 1, CHARINDEX('|', SKU) - 1), CHARINDEX('-', SUBSTRING(SKU, 1, CHARINDEX('|', SKU) - 1)) + 1) AS Dash2  
    FROM #SKUs  
) AS Derived;