I see PARSENAME recommend for splitting strings from time to time, and I've always avoided it worrying about edge-conditions. What are your thoughts on using it?
The sort of thing I see is where a user wants to split "ABC-123-XYZ-456" and is asking for help trying to use CHARINDEX(), RIGHT(), LEFT(), SUBSTRING(), RTrim(), LTrim() and probably STUFF() as well!. Solutions using those functions are often very complex.
So instead PARSENAME can be made to do the job by change "-" into ".":
SELECT PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 4), -- "ABC"
PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 3), -- "123"
PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 2), -- "XYZ"
PARSENAME(REPLACE('ABC-123-XYZ-456', '-', '.'), 1) -- "456"
which works a treat.
The DOCs say that PARSENAME takes first parameter of type SYSNAME.So that's anything up to 128 characters (i.e. nvarchar(128) ), which is clearly fine for most jobs (but other String handling functions such as CHARINDEX() will handle larger strings so, in the event that the definition of the column changes, massively, in the future other string functions would cope but anything depending on SYSNAME wouldn't. Barring a column that was very wide this seems an unlikely thing to need to worry about, but there is just that doubt in my mind that I might trip over this.
Then the DOCs say that the return type is NCHAR - so I suppose there is a potential issue with trailing spaces, but I can't see that I've ever seen a problem with that either ... but the result might need an RTrim()
Then there is the return type of NULL:
"PARSENAME returns NULL if one of the following conditions is true:
Either object_name or object_piece is NULL.
A syntax error occurs."
If object_name is NULL PARSENAME returns NULL, similarly if the requests object_piece is missing - that sounds fine to me too.
The DOCs also say that "A syntax error occurs when the requested object part has a length of 0 and is not a valid Microsoft SQL Server identifier. A zero-length object name renders the complete qualified name as not valid."
I'm not sure what to make of this as most tests I have done to check for edge-conditions that match this do NOT fail. However, a blank object_part (rather than a missing object_part) can muck things up.
If there are 5, or more, parts then the result is NULL. Someone who just wanted the "right-most part" might trip over that.
SELECT PARSENAME('555.444.333.222.111', 1) -- NULL
If the right-most part is blank the result is NULL. This looks like a hazard to me. If the rightmost part is blank selecting ANY part will return NULL. This is ONLY the case for the rightmost part. If the rightmost part is a space that is OK and is returned as a space.
SELECT PARSENAME('444.333.222.', 4), -- NULL
PARSENAME('444.333.222.', 3), -- NULL
PARSENAME('444.333.222.', 2), -- NULL
PARSENAME('444.333.222.', 1) -- NULL
The bit that bothers me the most is conformance to "is not a valid Microsoft SQL Server identifier". That is defined (if I read it right??) as
- The first character must be a letter or "_", "@", "#"
- Subsequent characters are Letters, Decimal numbers, "_", "@", "$"
- The identifier must not be a Transact-SQL reserved word
- Embedded spaces or special characters are not allowed.
- Supplementary characters are not allowed.
All my tests on PARSENAME suggest that most of the normal SQL Identifier rules do not apply ... but SOME DO.
PARSENAME('444.333.222.111') works just fine, even though the object_parts start a non-letter.
PARSENAME('5.4.3.2.1') returns NULL (too many parts)
PARSENAME('4.3.2.') returns NULL because last part is blank - regardless of whether any of the other parts are blank, or missing, or not.
PARSENAME('...1') is fine. It returns "1" for the first part, and NULL for the others. If any of the other parts are present they return values, regardless of whether any of the other parts are blank - except the last past ("1") which must be present
If ANY of the object_parts start with "[" or "]" (but are not, also, a valid Identifier) then ALL object_parts return NULL. Any object_part which is delimited with [xxx] is returned WITHOUT the square-brackets (so they are "silently lost" in the process)
It is OK to have ANY object_part as a blank string, delimited by "[]" - so
PARSENAME('[].[].[].[]') is fine - all parts are returned as NULL and any that have actual values are returned correctly, including PARSENAME('[4].[3].[2].[]')
Doubled-up, embedded, close square brackets are required (to escape them) but NOT required for embedded open square brackets. Pretty sure this is "normal behaviour" as QUOTENAME('x[y') gives "[x[y]" whereas QUOTENAME('x]y') gives "[x]]y]".
There are probably other "reserve characters" that upset things - and I see that as a hazzard as such characters might creep into the user-data being parsed. However I can't see that reserve words pose any problem (but they would not be consider valid as SQL Identifiers)
PARSENAME('SELECT.FROM.WHERE.ORDER BY') splits into the 4 words, as expected.