Is PARSENAME safe for general parsing and splitting?

Kristen · April 15, 2016, 9:11am

I see PARSENAME recommend for splitting strings from time to time, and I've always avoided it worrying about edge-conditions. What are your thoughts on using it?

The sort of thing I see is where a user wants to split "ABC-123-XYZ-456" and is asking for help trying to use CHARINDEX(), RIGHT(), LEFT(), SUBSTRING(), RTrim(), LTrim() and probably STUFF() as well!. Solutions using those functions are often very complex.

So instead PARSENAME can be made to do the job by change "-" into ".":

SELECT	PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 4),	-- "ABC"
	PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 3),	-- "123"
	PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 2),	-- "XYZ"
	PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 1)	-- "456"

which works a treat.

The DOCs say that PARSENAME takes first parameter of type SYSNAME.So that's anything up to 128 characters (i.e. nvarchar(128) ), which is clearly fine for most jobs (but other String handling functions such as CHARINDEX() will handle larger strings so, in the event that the definition of the column changes, massively, in the future other string functions would cope but anything depending on SYSNAME wouldn't. Barring a column that was very wide this seems an unlikely thing to need to worry about, but there is just that doubt in my mind that I might trip over this.

Then the DOCs say that the return type is NCHAR - so I suppose there is a potential issue with trailing spaces, but I can't see that I've ever seen a problem with that either ... but the result might need an RTrim()

Then there is the return type of NULL:

"PARSENAME returns NULL if one of the following conditions is true:
Either object_name or object_piece is NULL.
A syntax error occurs."

If object_name is NULL PARSENAME returns NULL, similarly if the requests object_piece is missing - that sounds fine to me too.

The DOCs also say that "A syntax error occurs when the requested object part has a length of 0 and is not a valid Microsoft SQL Server identifier. A zero-length object name renders the complete qualified name as not valid."

I'm not sure what to make of this as most tests I have done to check for edge-conditions that match this do NOT fail. However, a blank object_part (rather than a missing object_part) can muck things up.

If there are 5, or more, parts then the result is NULL. Someone who just wanted the "right-most part" might trip over that.

SELECT	PARSENAME('555.444.333.222.111', 1)	-- NULL

If the right-most part is blank the result is NULL. This looks like a hazard to me. If the rightmost part is blank selecting ANY part will return NULL. This is ONLY the case for the rightmost part. If the rightmost part is a space that is OK and is returned as a space.

SELECT	PARSENAME('444.333.222.', 4),	-- NULL
	PARSENAME('444.333.222.', 3),	-- NULL
	PARSENAME('444.333.222.', 2),	-- NULL
	PARSENAME('444.333.222.', 1)	-- NULL

The bit that bothers me the most is conformance to "is not a valid Microsoft SQL Server identifier". That is defined (if I read it right??) as

The first character must be a letter or "_", "@", "#"
Subsequent characters are Letters, Decimal numbers, "_", "@", "$"
The identifier must not be a Transact-SQL reserved word
Embedded spaces or special characters are not allowed.
Supplementary characters are not allowed.

All my tests on PARSENAME suggest that most of the normal SQL Identifier rules do not apply ... but SOME DO.

PARSENAME('444.333.222.111') works just fine, even though the object_parts start a non-letter.
PARSENAME('5.4.3.2.1') returns NULL (too many parts)
PARSENAME('4.3.2.') returns NULL because last part is blank - regardless of whether any of the other parts are blank, or missing, or not.
PARSENAME('...1') is fine. It returns "1" for the first part, and NULL for the others. If any of the other parts are present they return values, regardless of whether any of the other parts are blank - except the last past ("1") which must be present

If ANY of the object_parts start with "[" or "]" (but are not, also, a valid Identifier) then ALL object_parts return NULL. Any object_part which is delimited with [xxx] is returned WITHOUT the square-brackets (so they are "silently lost" in the process)

It is OK to have ANY object_part as a blank string, delimited by "[]" - so
PARSENAME('[].[].[].[]') is fine - all parts are returned as NULL and any that have actual values are returned correctly, including PARSENAME('[4].[3].[2].[]')

Doubled-up, embedded, close square brackets are required (to escape them) but NOT required for embedded open square brackets. Pretty sure this is "normal behaviour" as QUOTENAME('x[y') gives "[x[y]" whereas QUOTENAME('x]y') gives "[x]]y]".

There are probably other "reserve characters" that upset things - and I see that as a hazzard as such characters might creep into the user-data being parsed. However I can't see that reserve words pose any problem (but they would not be consider valid as SQL Identifiers)

PARSENAME('SELECT.FROM.WHERE.ORDER BY') splits into the 4 words, as expected.

JamesK · April 15, 2016, 2:14pm

I NEVER use PARSENAME to split strings, for the reasons you mentioned. The documentation makes it clear that the intended purpose is to extract the pieces from the four-part naming convention.

Splitting strings is so easy, you just have to wait for it to be 2016.
Oh! Wait!! This is 2016, but I am still on 2008R2

Kristen · April 15, 2016, 3:54pm

Thanks, not seen that STRING_SPLIT(string, separator) was imminent ... all those man-years of optimisation of Splitter Functions thrown away in an instant ...

But for this job that sort of splitter will split to Rows when Columns is (I think?) more desirable. Heck .. for the job of splitting to columns I have Code that will mechanically generate the CHARINDEX "stuff" to split a delimited, multi-column, value. I used mechanically-generated because the resulting code is horrid! for example for a 4-part split on "-" delimiter I get:

SELECT
  [Part_1] = substring(MyObjectName + '-', 0 + 1, charindex('-', MyObjectName + '-', 0 + 1) - 0 - 1 )
, [Part_2] = substring(MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1, charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) -  charindex('-', MyObjectName + '-') - 1 )
, [Part_3] = substring(MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) + 1, charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) + 1) -  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) - 1 )
, [Part_4] = substring(MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) + 1) + 1, charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) + 1) + 1) -  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-',  charindex('-', MyObjectName + '-') + 1) + 1) - 1 )
FROM
(
	SELECT	[MyObjectName] = '444-333-222-111'
) AS X

JamesK · April 15, 2016, 4:27pm

My mechanical when I don't want to use a string splitter function is like this. As a bonus, it correctly handles the case where there are fewer than expected number of tokens.

DECLARE @x VARCHAR(64) = '444-333-222-111'

SELECT
	[Part_1] = LEFT(a,CHARINDEX('-',a+'-')-1),
	[Part_2] = LEFT(b,CHARINDEX('-',b+'-')-1),
	[Part_3] = LEFT(c,CHARINDEX('-',c+'-')-1),
	[Part_4] = LEFT(d,CHARINDEX('-',d+'-')-1)
	
FROM
	( VALUES (@x) ) A(a)
	CROSS APPLY (VALUES (STUFF(a,1,CHARINDEX('-',a+'-'),''))) B(b)
	CROSS APPLY (VALUES (STUFF(b,1,CHARINDEX('-',b+'-'),''))) C(c)
	CROSS APPLY (VALUES (STUFF(c,1,CHARINDEX('-',c+'-'),''))) D(d);

Kristen · April 15, 2016, 6:20pm

You are cheating!! You are using CROSS APPLY which wasn't around when I wrote my mechanical-splitter-generator-code!!

Yours is a far better solution ... thanks for that, I'll throw my old one away

I do have a stone-age UPDATE version, which is slimmer than my SELECT, but I suppose that too would benefit from a CROSS APPLY in these modern times.

DECLARE	@I1	int,
	@I2	int,
	@I3	int,
	@I4	int
----------
UPDATE U
SET
	  @I1 = CHARINDEX('-', MySourceColumn + '-')
	, MyDestinationColumn1 = NullIf(RTrim(LEFT(MySourceColumn, @I1-1)), '')
	, @I2 =  NullIf(CHARINDEX('-', MySourceColumn + '-', @I1+1), 0)
	, MyDestinationColumn2 = NullIf(RTrim(SUBSTRING(MySourceColumn, @I1+1, @I2-@I1-1)), '')
	, @I3 =  NullIf(CHARINDEX('-', MySourceColumn + '-', @I2+1), 0)
	, MyDestinationColumn3 = NullIf(RTrim(SUBSTRING(MySourceColumn, @I2+1, @I3-@I2-1)), '')
	, @I4 =  NullIf(CHARINDEX('-', MySourceColumn + '-', @I3+1), 0)
	, MyDestinationColumn4 = NullIf(RTrim(SUBSTRING(MySourceColumn, @I3+1, @I4-@I3-1)), '')
FROM	dbo.MyTable AS U