I have a pretty basic query that looks like this. It splits a cell based on carriage returns and collects the first 6 characters of a row if those characters are integers.
WITH one_note AS (
SELECT CAST( [Note] AS NVARCHAR(MAX)) AS [Note]
FROM [DB].[dbo].[bvc_Notes] bn
WHERE 1=1
AND [id] = 280
)
SELECT LEFT( [Item], 6 ) AS [AdjustmentID]
FROM [DB].[dbo].[DelimitedSplit8K](
REPLACE(REPLACE((SELECT TOP 1 [Note] FROM one_note), CHAR(13), ''), CHAR(10), ',')
,',') asl
WHERE 1=1
AND TRY_CAST( LEFT( [Item], 6 ) AS INT ) IS NOT NULL
It relies on the DelimitedSplit8K function from here: https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function
The data that it is splitting in the one field looks like this: https://postimg.cc/vDzCwg1h
The problem is that, at just 500 rows, it takes ~3 minutes to receive the query result. I get that this is unstructured data and that it can never be as efficient as structured data, but can it be made faster than it currently is? I'm on SQL Server 2012 and do not have access to the newer STRING_SPLIT function.
Without sample data and expected results, the best we can do is guess. What is the data type of the Note column in the bvc_Notes table?
Here is a sample:
Declare @testTable Table (Note nvarchar(max));
Insert Into @testTable (Note)
Select concat('123456 This is first paragraph', char(13)+char(10), 'Second Paragraph', char(13)+char(10), '222222 Third');
Select *
From @testTable tt
Cross Apply dbo.DelimitedSplit8K(tt.Note, char(10)) dsk
Where dsk.Item Like '[0-9][0-9][0-9][0-9][0-9][0-9]%';
Using that sample, we should be able to modify yours to this:
SELECT AdjustmentID = left(asl.Item, 6)
FROM db.dbo.bvc_notes bn
CROSS APPLY (VALUES (CAST(bn.Note AS nvarchar(max)))) As n(Note)
CROSS APPLY dbo.DelimitedSplit8K(n.Note, char(10)) As asl
WHERE bn.id = 280
AND asl.Item Like '[0-9][0-9][0-9][0-9][0-9][0-9]%';
This assumes you have the standard CRLF in the data, which can be split on the LF. If that doesn't work because you have CR only, then modify the VALUES cross apply to include the REPLACE functions. I would not use a comma here, as that is likely part of the data. I would use a ^, ~, or |, depending on what could be included in the note.
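For example, a sketch of the CR-only case, assuming | never appears in the note text (any of ^, ~, or | would work the same way):

```sql
-- Sketch only: normalize CRLF and lone CR to a '|' delimiter inside the
-- VALUES cross apply, then split on '|'. Replacing CRLF first prevents a
-- CRLF pair from producing two delimiters.
SELECT AdjustmentID = LEFT(asl.Item, 6)
FROM db.dbo.bvc_notes bn
CROSS APPLY (VALUES (
    REPLACE(REPLACE(CAST(bn.Note AS varchar(8000)),
            CHAR(13) + CHAR(10), '|'),   -- CRLF -> |
            CHAR(13), '|')               -- lone CR -> |
)) AS n(Note)
CROSS APPLY dbo.DelimitedSplit8K(n.Note, '|') AS asl
WHERE bn.id = 280
  AND asl.Item LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]%';
```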
This may not improve performance though - it is going to depend on indexes that are available on the bvc_Notes table and how many rows are returned for id = 280.
I should also note - you are converting to nvarchar(max), but by default DelimitedSplit8K works only on varchar(8000).
Unless you have modified the function to work with nvarchar - which would only be nvarchar(4000) - the function will convert the nvarchar column to varchar and cut it off.
If you have modified the parameters to use nvarchar(max) - that is also going to cause performance issues.
/* NOTE: I changed to varchar(8000) because DelimitedSplit*8K* is
*specifically* designed to work on varchar(8000) or less */
;WITH one_note AS (
SELECT CAST( [Note] AS VARCHAR(8000) ) AS [Note]
FROM [DB].[dbo].[bvc_Notes] bn
WHERE 1=1
AND [id] = 280
)
SELECT LEFT( [Item], 6 ) AS [AdjustmentID]
FROM one_note
CROSS APPLY [DB].[dbo].[DelimitedSplit8K](
REPLACE(REPLACE([Note], CHAR(13), ''), CHAR(10), ',')
, ',') asl
WHERE 1=1
AND TRY_CAST( LEFT( [Item], 6 ) AS INT ) IS NOT NULL
But since he's pulling simple digits from the string, there's no need to retain nvarchar, even if the original data is nvarchar.
Correct - but limiting the split to either 4000 or 8000 characters may truncate valid data that is expected to be returned. And - if he just changed the input parameter to the function to accept varchar(max) - we all know that performs a lot less efficiently.
It would be ideal if the OP provided sample data. If this data can exceed 4000/8000 characters, and there is data past those limits that needs to be parsed, then using DelimitedSplitNN would not be my recommendation. Using a CLR splitter is probably the 'best' option here, but more information is needed.
Yes, min/max/avg len of data and count of values to be parsed here would be extremely helpful.
I personally avoid CLR unless I absolutely have to have it (its memory usage is one reason -- can you say "memory hog"!).
So, if most data were ~24K bytes or less, personally I'd substring it out into 8K bytes and use a standard splitter before I went to CLR.
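A rough sketch of that substring-then-split idea, assuming notes top out around 24K characters. The fixed-width chunking shown here can cut a line in half at a chunk boundary, so in practice you would walk each boundary back to the last CHAR(10) before it:

```sql
-- Sketch only: break Note into three 8000-character chunks and feed each
-- chunk through the standard 8K splitter. Boundary alignment is omitted
-- for brevity (see the caveat above).
SELECT AdjustmentID = LEFT(asl.Item, 6)
FROM db.dbo.bvc_notes bn
CROSS APPLY (VALUES (1), (8001), (16001)) AS c(pos)   -- covers ~24K characters
CROSS APPLY (VALUES (
    SUBSTRING(CAST(bn.Note AS varchar(max)), c.pos, 8000)
)) AS n(Chunk)
CROSS APPLY dbo.DelimitedSplit8K(
    REPLACE(n.Chunk, CHAR(13), ''), CHAR(10)) AS asl
WHERE bn.id = 280
  AND asl.Item LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]%';
```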
Notice, too, that in the original code he's doing a full SELECT within the call to dbo.DelimitedSplit8K. That isn't needed and will harm performance, so I removed it.