Replace code

djj55 · March 17, 2016, 1:31pm

Hello, Looking at a procedure that was blocking I found a join with on (isnull(target.MatterNum,'') + isnull(cast(target.RefNum as varchar),'') = -- note joining on a isnull(source.MatterNum,'') + isnull(cast(source.RefNum as varchar),'')) -- pseudo composite key
My question is can this code be replaced with (are these equal)?ON (target.MatterNum = source.MatterNum OR (target.MatterNum IS NULL AND source.MatterNum IS NULL)) AND (target.RefNum = source.RefNum OR (target.RefNum IS NULL AND source.RefNum IS NULL))

Note that MatterNum is VARCHAR(10) and RefNum is INT. The join is within a MERGE.

I assume the first is non-sargable but the second might be. I also noticed the varchar without size.

Thank you,
djj

[edit] to fix typo[/edit]

Kristen · March 17, 2016, 2:02pm

Only risk, as I see it, is that MatterNum and RefNum, on two separate rows, are

123   45
 12  345

currently these would match. I expect whoever set up the code originally did not consider that, and if it happens in practice it will be a bug - in which case a very good reason for never coding a join as a Concatenation like that - apart from the dreadful performance.

Also RefNum is VARCHAR, so hopefully never has leading digits, in which case the problem will not exist. I suppose if you are concerned about it you could check if there is any data that satisfies that edge condition.

All that said, I'm note sure with your OR and AND that you will wind up with anything SARGable ... here's hoping though!

I have had conditions where I had an OR that I used a UNION instead of an OR in the JOIN which improved performance. But with your pair of ORs and an AND I don't think there is anything obvious like that (and my guess is that 4x UNIONs is unlikely to be faster).

Might be worth looking at the data to see if any of the combination NEVER exist (and obviously satisfy yourself that they could never exist).

djj55 · March 17, 2016, 2:28pm

Thank you for the reply.

The way the table works, there should never be a record with both MatterNum and RefNum.
Over my objection (I cave way too easily) they made RefNum an INT and made it start at one billion.

The sargable question is why I stated it as "might be". But with the current there is no index used.

Thanks again

Kristen · March 17, 2016, 2:51pm

That makes for an accident-waiting-to-happen ... lucky that MatterNum is defined so short.

Kristen · March 17, 2016, 2:53pm

In that case would this be enough?

target.MatterNum = source.MatterNum OR target.RefNum = source.RefNum

Might be worth looking at having Filtered Indexes which exclude the NULLs (e.g. index on MatterNum filtering WHERE MatterNum IS NOT NULL)

You might have to duplicate the Index Filter Condition in the JOIN (not sure if the Optimiser will decide to use the index otherwise)

Kristen · March 17, 2016, 2:55pm

Sorry, another idea. Computed column on COALESCE(MatterNum, RefNum) (you'll need some type conversion) and then index (and JOIN) on that?

Some risk that a MatterNum in one table will match a RefNum in the other table. You could prefix a one-char "M" / "R" in the computed column to avoid that perhaps?

Kristen · March 17, 2016, 2:57pm

If there aren't any exceptions, currently, maybe put a Constraint / Trigger / Somesuch in place to prevent any in future - so you can safely JOIN in the knowledge that there won't be any)

ScottPletcher · March 17, 2016, 4:22pm

If you need NULLs to match, then your original code looks correct to me. It should indeed be much better able to use index(es). The simple rule is never use ISNULL() in a WHERE or JOIN clause (there are some very obscure cases where it might be useful, but they are so rare that you can ignore them).

djj55 · March 17, 2016, 4:37pm

Thank you both.

I realize the ISNULL is not a good thing, thus the question as I am trying to get rid of them.

Kristen · March 17, 2016, 4:58pm

I'd be curious to know what sort of performance difference you get. e.g. using this with your Old / New syntax (and before / after adding any indexes etc.)

SET STATISTICS IO ON; SET STATISTICS TIME ON

... your query here ...

SET STATISTICS IO OFF; SET STATISTICS TIME OFF;
GO

Only things of interest (I think?) would be the table(s) and their Scan Count & Logical Reads listed immediately after the query resultset, and the Execution Time immediately after that (and run for a decent size workload)

djj55 · March 17, 2016, 5:33pm

@Kristen, here are the results. No indexes changed or added. Tested on actual data (copied to test machine).

Test 1: ISNULL
Table '#tmpCurrentContactInfo'. Scan count 3, logical reads 27442, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DMG_Changes'. Scan count 3, logical reads 154784, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2391 ms,  elapsed time = 16361 ms.

(49230 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Test 2: No ISNULL
Table '#tmpCurrentContactInfo'. Scan count 3, logical reads 27442, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DMG_Changes'. Scan count 3, logical reads 154784, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 2109 ms,  elapsed time = 1558 ms.

(49230 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

Test 3: no null to null check - ON (target.MatterNum = source.MatterNum OR target.RefNum = source.RefNum)
I declare a failure! Taking much too long so I do not know if it actually works.

ScottPletcher · March 17, 2016, 5:38pm

The single most critical factor for performance is the best clustering index on every table. So you'd need to review the indexes on DMG_Changes and on #tmpCurrentContactInfo -- yes, temp table should be properly clustered as well, and, for efficiency, the clus index should be defined before the table is loaded.

Kristen · March 17, 2016, 6:36pm

From the Scan Count and Logical Reads it looks like it hasn't caused any change in Query Plan. Timing is a bit quicker, which may just be the saving of not having to do the Functions & Concatenation.

Gonna need some suitable indexes to improve things (although even if the indexes had been there before they probably wouldn't have been used as the functions made it non-SARGable of course).

I've been know to do:

SELECT Col1, Col2, ...
INTO #temp 
FROM MyTable
WHERE 1=0

to get the table created, a) without having to write a CREATE TABLE statement and b) being sure that the datatypes / sizes of columns etc. would be honoured (even if they change in the future). Nullability and IDENTITY can be a hazard with that approach though ...

ScottPletcher · March 17, 2016, 7:19pm

Nullability could be an issue, but not IDENTITY. You can add/remove an identity column as needed.

Kristen · March 18, 2016, 7:35am

Never knew that (nor thought to try!!), thanks. I have always resorted to

SELECT CONVERT(int, MyIdentity) as [MyIdentity],
       Col1, Col2, ...
INTO #temp 
FROM MyTable
WHERE 1=0

Done that too for columns that I want to allow NULLs

ScottPletcher · March 18, 2016, 2:29pm

SELECT TOP (0) *
INTO #temp
FROM MyTable
UNION
SELECT TOP (0) *
FROM MyTable

The UNION kills the identity.