Best Practice generating a delimited / concatenated list using FOR XML PATH

I have read that there is a length issue using FOR XML PATH but I have not been able to reproduce it (and maybe it was related to the Column Display Length Limit in SSMS ...)

I typically see this posted as

SELECT	P.Col1, P.Col2, ...
	, STUFF(
		(
			SELECT	',' + MyValue
			FROM	MyChildTable AS C
			WHERE	C.MyKey = P.MyKey
			ORDER BY MyValue
			FOR XML PATH('')
		)
	        , 1, 1, '') AS MyValueList
FROM	MyParentTable AS P

however that will "escape" XML charracters ">", "<", "&". The solution to that is to change

			FOR XML PATH('')
		)
to:
		FOR XML PATH(''), TYPE
	).value('.', 'varchar(max)')	-- NOTE: "'value" is case sensitive!!

I've tried all sorts of tests and I cannot get this to fail on length - I concatenated 1,000,000 rows of 7-character data and both methods worked fine

Other comments:

If using a space in the delimiter, i.e. ", " instead of just ",", then change:

	        , 1, 1, '') AS MyValueList
to:
	        , 1, 2, '') AS MyValueList

FOR XML PATH will omit any value which is NULL. To preserve NULL values change them to a blank string, or "(blank)" or similar. Change:

			SELECT	',' + MyValue
to:
			SELECT	',' + COALESCE(MyValue, '(blank)')

blanks are included though, and if they are not wanted (and NULL is also not wanted) then change

			SELECT	',' + MyValue
to:
			SELECT	',' + NullIf(MyValue, '')

or filter them out in the WHERE clause

I did wonder if it might be possible to use an XML directive to chop-off the leading comma (instead of using STUFF), any ideas?

Any other thoughts anyone has please?

P.S. In some cases it might be more useful written as a CROSS APPLY

SELECT	P.Col1, P.Col2, ...
	, C.MyValueList
FROM	MyParentTable AS P
	CROSS APPLY
	(
		SELECT	STUFF(
			(
				SELECT	',' + MyValue
				FROM	MyChildTable AS C
				WHERE	C.MyKey = P.MyKey
				ORDER BY MyValue
				FOR XML PATH(''), TYPE
--			NOTE: "'value" is case sensitive!!
			).value('.', 'varchar(max)')
		        , 1, 1, '') AS MyValueList
	)
1 Like

One thing that I usually do when trying to generate multiple concatenated strings based on another column is to use a DISTINCT on the controlling column. I have not measured it, but in my mind that would avoid SQL Server having to do some extra work.

CREATE TABLE #tmp(id INT, col1 VARCHAR(32));
INSERT INTO #tmp VALUES
	(1,'a'),(1,'b'),(1,'c'),
	(2,'x'),(2,'y'),(2,'z');
	
SELECT
	a.id,
	b.cols
FROM
	(SELECT DISTINCT id FROM #tmp) a ---<- THIS
	CROSS APPLY
	(
		SELECT ',' + b.col1
		FROM #tmp b
		WHERE b.id = a.id
		FOR XML PATH('')
	) b(cols);
	
DROP TABLE #tmp;

I've seen that come up a few times. Personally I can't imagine (that's bound to bite me in the behind!!) querying the table, itself, for the delimited list, but rather I would expect to be querying a Child Table - and in that situation the need would not arise as the outer query would be on a Parent Table which would provide a unique Parent ID/Key

Of course it is possible that the Parent Table might have dups, but "on average" I expect not.

I'm also interested in the syntax (which I also see, quite commonly on CTEs) of naming the columns in the Alias

CROSS APPLY
	(
		...
	) b(cols)

i.e. "b(cols)" in this case. As written I am not sure that there is anywhere where an alias results-column name can be put? but in practice the FOR XML will need one or both of ").value('.', 'varchar(max)')" and "STUFF" at which point it could be written as

SELECT
	a.id,
	b.cols
FROM	(SELECT DISTINCT id FROM #tmp) a
	CROSS APPLY
	(
		SELECT STUFF(
		(
			SELECT	',' + b.col1
			FROM	#tmp b
			WHERE	b.id = a.id
			FOR XML PATH(''), TYPE
		).value('.', 'varchar(max)')	-- NOTE: "'value" is case sensitive!!
		, 1, 1, '') AS [cols]
	) b

Anyway, my point was that I think the risk of the subsequent column name list being mis-aligned with the actual SELECT column list is unacceptable - disassociating one-from-the-other and putting some space between them is a cause of bugs during maintenance - much the same as using ORDER BY 8 rather than ORDER BY [MyResultColumnName] ... or using a CURSOR where the FETCH is repeated twice, and the Column List variable assignment is remote from the SELECT Column List :frowning:

I'd be interested in other people's views; mine usually starts from Defensive Programming as I am sure you are sick of hearing!