I have a pretty basic query that looks like this. It splits a cell based on carriage returns and collects the first 6 characters of a row if those characters are integers.
WITH one_note AS (
SELECT CAST( [Note] AS NVARCHAR(MAX)) AS [Note]
FROM [DB].[dbo].[bvc_Notes] bn
WHERE 1=1
AND [id] = 280
)
SELECT LEFT( [Item], 6 ) AS [AdjustmentID]
FROM [DB].[dbo].[DelimitedSplit8K](
REPLACE(REPLACE((SELECT TOP 1 [Note] FROM one_note), CHAR(13), ''), CHAR(10), ',')
,',') asl
WHERE 1=1
AND TRY_CAST( LEFT( [Item], 6 ) AS INT ) IS NOT NULL
It relies on the DelimitedSplit8K function from here: https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function
The data that it is splitting in the one field looks like this: https://postimg.cc/vDzCwg1h
The problem is that, at just 500 rows, it takes ~3 minutes to receive the query result. I get that this is unstructured data and that it can never be as efficient as structured data, but can it be made faster than it currently is? I'm on SQL Server 2012 and do not have access to the newer STRING_SPLIT function.
Without sample data and expected results, the best we can do is guess. What is the data type of the Note column in the bvc_Notes table?
Here is a sample:
Declare @testTable Table (Note nvarchar(max));
Insert Into @testTable (Note)
Select concat('123456 This is first paragraph', char(13)+char(10), 'Second Paragraph', char(13)+char(10), '222222 Third');
Select *
From @testTable tt
Cross Apply dbo.DelimitedSplit8K(tt.Note, char(10)) dsk
Where dsk.Item Like '[0-9][0-9][0-9][0-9][0-9][0-9]%';
Using that sample, we should be able to modify yours to this:
SELECT AdjustmentID = left(asl.Item, 6)
FROM db.dbo.bvc_notes bn
CROSS APPLY (VALUES (CAST(bn.Note AS nvarchar(max)))) As n(Note)
CROSS APPLY dbo.DelimitedSplit8K(n.Note, char(10)) As asl
WHERE bn.id = 280
AND asl.Item Like '[0-9][0-9][0-9][0-9][0-9][0-9]%';
This assumes you have the standard CRLF in the data, which can be split on the LF. If that doesn't work because you have CR only, then modify the VALUES cross apply to include the REPLACE functions. I would not use a comma here, as that is likely part of the data. I would use a ^, ~, or |, depending on what could be included in the note.
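For example, a sketch of the CR-only case, assuming | never appears in the note text (any of ^, ~, or | would work the same way):

```sql
-- Sketch only: normalize CRLF and lone CR to a '|' delimiter inside the
-- VALUES cross apply, then split on '|'. Replacing CRLF first prevents a
-- CRLF pair from producing two delimiters.
SELECT AdjustmentID = LEFT(asl.Item, 6)
FROM db.dbo.bvc_notes bn
CROSS APPLY (VALUES (
    REPLACE(REPLACE(CAST(bn.Note AS varchar(8000)),
            CHAR(13) + CHAR(10), '|'),   -- CRLF -> |
            CHAR(13), '|')               -- lone CR -> |
)) AS n(Note)
CROSS APPLY dbo.DelimitedSplit8K(n.Note, '|') AS asl
WHERE bn.id = 280
  AND asl.Item LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]%';
```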
This may not improve performance though - it is going to depend on indexes that are available on the bvc_Notes table and how many rows are returned for id = 280.
I should also note - you are converting to nvarchar(max), but by default DelimitedSplit8K works only on varchar(8000).
Unless you have modified the function to work with nvarchar - which would only be nvarchar(4000) - the function will convert the nvarchar column to varchar and cut it off.
If you have modified the parameters to use nvarchar(max) - that is also going to cause performance issues.
/* NOTE: I changed to varchar(8000) because DelimitedSplit*8K* is
*specifically* designed to work on varchar(8000) or less */
;WITH one_note AS (
SELECT CAST( [Note] AS VARCHAR(8000) ) AS [Note]
FROM [DB].[dbo].[bvc_Notes] bn
WHERE 1=1
AND [id] = 280
)
SELECT LEFT( [Item], 6 ) AS [AdjustmentID]
FROM one_note
CROSS APPLY [DB].[dbo].[DelimitedSplit8K](
REPLACE(REPLACE([Note], CHAR(13), ''), CHAR(10), ',')
, ',') asl
WHERE 1=1
AND TRY_CAST( LEFT( [Item], 6 ) AS INT ) IS NOT NULL
But since he's pulling simple digits from the string, there's no need to retain nvarchar, even if the original data is nvarchar.
Correct - but limiting the split to either 4000 or 8000 characters may truncate valid data that is expected to be returned. And - if he just changed the input parameter to the function to accept varchar(max) - we all know that performs a lot less efficiently.
It would be ideal if the OP provided sample data. If this data can exceed 4000/8000 characters, and there is data past those limits that needs to be parsed, then using DelimitedSplitNN would not be my recommendation. Using a CLR splitter is probably the 'best' option here, but more information is needed.
Yes, min/max/avg len of data and count of values to be parsed here would be extremely helpful.
I personally avoid CLR unless I absolutely have to have it (its memory usage is one reason -- can you say "memory hog"!).
So, if most data were ~24K bytes or less, personally I'd substring it out into 8K bytes and use a standard splitter before I went to CLR.
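A rough sketch of that substring-then-split idea, assuming notes top out around 24K characters. The fixed-width chunking shown here can cut a line in half at a chunk boundary, so in practice you would walk each boundary back to the last CHAR(10) before it:

```sql
-- Sketch only: break Note into three 8000-character chunks and feed each
-- chunk through the standard 8K splitter. Boundary alignment is omitted
-- for brevity (see the caveat above).
SELECT AdjustmentID = LEFT(asl.Item, 6)
FROM db.dbo.bvc_notes bn
CROSS APPLY (VALUES (1), (8001), (16001)) AS c(pos)   -- covers ~24K characters
CROSS APPLY (VALUES (
    SUBSTRING(CAST(bn.Note AS varchar(max)), c.pos, 8000)
)) AS n(Chunk)
CROSS APPLY dbo.DelimitedSplit8K(
    REPLACE(n.Chunk, CHAR(13), ''), CHAR(10)) AS asl
WHERE bn.id = 280
  AND asl.Item LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]%';
```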
Notice, too, that in the original code he's doing a full SELECT within the call to dbo.DelimitedSplit8K. That isn't needed and will harm performance, so I removed it.