I have a query that I'm doing some tuning on and I'm getting some odd execution plan/speed results:
Same exact query except they have a slightly different join qualifier.
CASE A:
--Using a date field (D.Date1) as the data filter
SELECT D.ID, D.Identifier1, D.Date1, D.GroupID, D.SubgroupID
FROM
Table1 D INNER JOIN
Table2 S ON D.N = S.N and D.SubgroupID = S.SubgroupID and D.Date1 BETWEEN S.Date1 and S.Date2 AND D.Date1 >= '20200101' LEFT OUTER JOIN
Table3 H ON H.SubgroupID = D.SubgroupID and H.RuleName = 'Ignore'
WHERE H.ID IS NULL
CASE B:
--Using the primary key of Table1 (D.ID) as the data filter
SELECT D.ID, D.Identifier1, D.Date1, D.GroupID, D.SubgroupID
FROM
Table1 D INNER JOIN
Table2 S ON D.N = S.N and D.SubgroupID = S.SubgroupID and D.Date1 BETWEEN S.Date1 and S.Date2 AND D.ID >= 123456 LEFT OUTER JOIN
Table3 H ON H.SubgroupID = D.SubgroupID and H.RuleName = 'Ignore'
WHERE H.ID IS NULL
When looking at the estimated execution plans, CASE A has a missing index that has a 78% impact, and it has a Clustered Index Scan on Table 1 with an 83% cost. CASE B has no missing indexes, and the Table 1 has changed to a Clustered Index Seek, and the bulk of the query cost has shifted to the same index seek on Table2 that is in both queries.
So I would assume that CASE B would be faster, but it takes 2+ hours to run. CASE A returns 10k rows in a second.
I know it's impossible to tell without detailed review of data and indexes and statistics and all sorts of other things, but my query A with the green suggested index and index scan is performing a thousand times better than query B using the primary key field and an index seek.
Seems the difference between the 2 queries is D.Date1 >= '20200101' vs D.ID >= 123456. But as you mentioned, without anything else, it's hard to tell. Can you provide the execution plans?
Yup, I was messing around with the filtering criteria to limit the results of the query, so I added Date1 and ID as two potential filters. It just seems odd to me that an index scan is faster than an index seek (in this case)...
We have a lot of proprietary names in the database so it would be a bunch of work to obfuscate tables and index names.
But I just noticed that Query A (the one missing the index) has a bunch of Parallelism modules, maybe Query A is using all the CPU's while B is just sticking with 1...
Case A can filter rows based on Table2 first, since date1 is a condition for both tables. Case B must determine which rows to process using Table1 only. Look at the query plans to verify that the first query is reading Table2 first and the second query is reading Table1 first.
Any more details would depend on having the DDL for both tables and the estimated query plan.
I'm virtually certain that Table1 should permanently be clustered first on date1 (*). Perhaps the same for table2, although it's also very possible that the subgroupid should be the first key there.
(*) The "rule" about "always clustering on identity" is the single worst performance myth in db design. It's 100% untrue. The clustering index is the single most important factor for best overall performance of the table, and thus should be very carefully chosen and never just defaulted to. Hint: for the big majority of tables, clustering on identity is NOT best for performance.
Yeah we have a lot of cleanup to do, Table1 has 21 non-clustered indexes, and a single clustered index on the identity column. Pretty sure that for the last few years the query tuning plan has been to just create the suggested index from the missing index plan and roll.
How do I know which table is being read first, is it right to left following the arrows? In that case, query A is reading both 1 & 2 at the same time and then a parallelism into an inner join, while query B is starting with an Index seek on Table 2 and then joining into Table 1 using the primary key index (which isn't great as you mentioned and makes sense as I think about it).
Generally right-to-left, but there are still complexities there. I'd have to see the execution plan.
I don't believe this site allows anything other than pictures to be posted, so it might be hard to post the plan. Maybe you can post the xml version of the plan as plain text into the window?!