SQLTeam.com | Weblogs | Forums

Query on join conditions


#1

Hello experts, I have some questions on query optimisation, assuming the scenario below:

create table #employee (
emp_id int,
emp_name varchar(100)
)

create table #worker (
emp_id int,
emp_code varchar(2),
emp_address varchar(200)
)

create table #manager (
emp_id int,
emp_code varchar(2),
emp_address varchar(200)
)

create table #employment_designation(
emp_dsgn_id int,
emp_dsgn_code varchar(2),
emp_dsgn_value varchar(20)
)

All employees will have a record in the #employee table. Depending on their designation, they will also have a record in either the #manager or the #worker table.

My situation now is that I need to obtain the employee designation descriptions for all employees in the employee table.

I intend to perform the joins as below:

SELECT e.emp_name, ed.emp_dsgn_value
FROM #employee e
LEFT JOIN #manager m
ON e.emp_id = m.emp_id
LEFT JOIN #worker w
ON e.emp_id = w.emp_id
LEFT JOIN #employment_designation ed
ON ISNULL(m.emp_code, w.emp_code) = ed.emp_dsgn_code

for simplicity's sake, all emp_id and emp_dsgn_id columns are set as primary keys.

My question is: will the ISNULL function cause a poorer query plan compared to joining the #employment_designation table twice, once to get the worker designation and once to get the manager designation?

Thank you for your time and apologies if I sound incoherent (:wink:


#2

Probably.

You could do:

LEFT JOIN #employment_designation AS ed
ON m.emp_code = ed.emp_dsgn_code
OR w.emp_code = ed.emp_dsgn_code

provided that only one of them was ever valid. If not, you could simulate the ISNULL with:

LEFT JOIN #employment_designation AS ed
ON m.emp_code = ed.emp_dsgn_code
OR (w.emp_code = ed.emp_dsgn_code AND m.emp_code IS NULL)

and perhaps better still (performance-wise) would be:

LEFT JOIN #employment_designation AS ed1
ON m.emp_code = ed1.emp_dsgn_code
LEFT JOIN #employment_designation AS ed2
ON w.emp_code = ed2.emp_dsgn_code -- Optionally add: AND m.emp_code IS NULL

and then

SELECT e.emp_name,
       [emp_dsgn_value] = ISNULL(ed1.emp_dsgn_value, ed2.emp_dsgn_value)
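Assembled into one runnable statement (a sketch matching the fragments above; ISNULL is kept only in the SELECT list, where it runs on result rows rather than inside a join predicate):

```sql
SELECT e.emp_name,
       -- ISNULL here is harmless: it is applied to the result rows,
       -- not inside a join condition
       [emp_dsgn_value] = ISNULL(ed1.emp_dsgn_value, ed2.emp_dsgn_value)
FROM #employee e
LEFT JOIN #manager m ON e.emp_id = m.emp_id
LEFT JOIN #worker  w ON e.emp_id = w.emp_id
-- each lookup compares a bare column, so an index on
-- emp_dsgn_code can be used for a seek
LEFT JOIN #employment_designation ed1 ON m.emp_code = ed1.emp_dsgn_code
LEFT JOIN #employment_designation ed2 ON w.emp_code = ed2.emp_dsgn_code
```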

#3

Joining twice would be the solution I would go for.
I have seen a drastic performance boost using exactly this solution on other engines (DB2 on iSeries), but couldn't recreate the situation on MSSQL. I guess the optimizer is better on MSSQL :slight_smile:


#4

Thanks @Kristen, @bitsmed. I believe this is the concept of making the query SARGable?

Can I say that the ISNULL function causes optimizer to:

  1. check the first column and determine whether it is null
  2. and then use the first column if it is null and the second otherwise

and therefore the optimal execution plan cannot be determined, so each time this query runs the compiler will have to build a new plan instead of reusing the previously cached plan?


#5

I have got better performance on several occasions by using a UNION ALL to avoid an "OR" in a JOIN ...
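For example, the OR-in-JOIN variant from post #2 could be rewritten as two plain equality joins combined with UNION ALL (a sketch against the temp tables above, assuming each employee appears in only one of #manager/#worker; note that the inner joins drop employees with no designation, unlike the LEFT JOIN versions):

```sql
-- managers' designations: plain equality join, index-seek friendly
SELECT e.emp_name, ed.emp_dsgn_value
FROM #employee e
JOIN #manager m ON e.emp_id = m.emp_id
JOIN #employment_designation ed ON m.emp_code = ed.emp_dsgn_code

UNION ALL

-- workers' designations: again no OR, so each branch gets its own plan
SELECT e.emp_name, ed.emp_dsgn_value
FROM #employee e
JOIN #worker w ON e.emp_id = w.emp_id
JOIN #employment_designation ed ON w.emp_code = ed.emp_dsgn_code
```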


#6

A small correction to the above: use the first column if it is NOT null and the second otherwise.

Instead of ISNULL you can use COALESCE, which allows an unlimited number of parameters - the first non-NULL is selected. The data-type rules are different between the two functions, and personally I find that COALESCE() usually suits me better than ISNULL()
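A quick sketch of the data-type difference (ISNULL returns the type of its first argument; COALESCE returns the highest-precedence type among all its arguments):

```sql
DECLARE @code varchar(2) = NULL;

-- ISNULL: result is typed varchar(2), so the fallback is truncated
SELECT ISNULL(@code, 'manager');   -- 'ma'

-- COALESCE: result type accommodates every argument
SELECT COALESCE(@code, 'manager'); -- 'manager'
```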

I doubt that will be the case - most likely the Query Plan will be cached. The problem for non-SARGable queries is that the use of, in this case, a Function means that a simple Index Lookup is not possible, so the Optimiser doesn't make a Plan that includes an index - a Table Scan is used instead and all the rows in the table are tested to see if they match the Function. This is inherently slower than if an index can be used

Of course, if there is no suitable index on the relevant column(s), then it probably makes no difference - although SQL still has to process the function on every row to check whether the row matches, so avoiding a function can be beneficial for that reason too.

I suspect that the main reason this is important, and catches people out, is that it is not scalable. If you test the APP with 1,000 rows, or 10,000 ... or probably even 100,000 ... the performance will be fine, with no noticeable difference between the SARGable version, with perfect indexes, and the non-SARGable one. But once the database grows larger and/or there are a significant number of concurrently connected users, everything slows down.
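The same principle applies in a WHERE clause; a minimal sketch against the tables above (the 'M%' filter is purely illustrative, assuming an index on emp_dsgn_code):

```sql
-- Non-SARGable: the function wraps the column, so the index
-- cannot be seeked and every row must be evaluated
SELECT emp_dsgn_value
FROM #employment_designation
WHERE SUBSTRING(emp_dsgn_code, 1, 1) = 'M';

-- SARGable rewrite: the bare column is compared to a prefix range,
-- so an index seek is possible
SELECT emp_dsgn_value
FROM #employment_designation
WHERE emp_dsgn_code LIKE 'M%';
```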


#7

apologies that the hand is not typing what the brain is thinking

Thanks for all the help! I'll be doing double joins to get the description instead of using the ISNULL since that is the recommended method :smile:


#8

:toast:

You might want to try both and see what the difference is? Without a lot of data it is hard to use a Timer effectively, but you can compare the Scan Count and Logical Reads for each variation using this:

SET STATISTICS IO ON; SET STATISTICS TIME ON

--  ... put query here ...

SET STATISTICS IO OFF; SET STATISTICS TIME OFF
GO

Generally if you make a change which reduces the scan count and the logical I/O, then that is "better". Also worth checking the Query Plan to see which indexes are being used - particularly if you expect an index to be used and actually find that SQL is doing a Table Scan (e.g. using the Clustered Index)

SET SHOWPLAN_TEXT ON
GO

-- ... put query here ...

SET SHOWPLAN_TEXT OFF
GO

#9

Yes, especially when a seek would have been used otherwise. The ISNULL will force a table scan of ed no matter how few rows are being looked up.

The simple rule: never use ISNULL() or any other function (CONVERT, SUBSTRING, etc.) on a column in a WHERE or JOIN clause.

SELECT e.emp_name,
       emp_dsgn_value = ISNULL(ed_mgr.emp_dsgn_value, ed_wrk.emp_dsgn_value)
FROM #employee e
LEFT JOIN #manager m ON e.emp_id = m.emp_id
LEFT JOIN #worker w ON e.emp_id = w.emp_id
LEFT JOIN #employment_designation ed_mgr ON ed_mgr.emp_dsgn_code = m.emp_code
LEFT JOIN #employment_designation ed_wrk ON ed_mgr.emp_dsgn_code IS NULL AND
    ed_wrk.emp_dsgn_code = w.emp_code

#10

Thank you so much!

No wonder something felt off when I came up with that condition...

Another lifelong lesson learnt :slight_smile: