Performance problems - separation of queries?

So, I have this big query with serious performance issues. It takes around 15-30 minutes for the results to come back - obviously not good.

I am struggling to figure out how best to optimise it, and also how to break the query into manageable chunks (I guess that is something to look into later).

I would love any advice on improving its performance. I really don't know which direction to go in.

These are the client statistics in SSMS:
Client stats

Exec plan

I am running this on Azure SQL Server.
Now, even if I run it locally, the query takes around 7-10 minutes to come back. That's still too long, so at this point it doesn't matter where the DB is hosted - the query itself is problematic.

Did you add the missing index yet?

/*
Missing Index Details from TireMasterViewQuery Exec Plan.sqlplan
The Query Processor estimates that implementing the following index could improve the query cost by 15.6491%.
*/

/*
USE [TreadStatDev]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Tread] ([CasingId])
INCLUDE ([Id],[TireSpecId],[TireSpecForReviewId],[OriginalTreadDepthInMicrons])
GO
*/
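
For reference, here is what that suggestion looks like with a concrete name filled in. The name is purely hypothetical and the key/INCLUDE columns are copied straight from the SSMS output above, so they would still need to be sanity-checked against how the query actually filters and selects from dbo.Tread:

-- Hypothetical index name; columns copied from the missing-index suggestion above.
CREATE NONCLUSTERED INDEX IX_Tread_CasingId
ON [dbo].[Tread] ([CasingId])
INCLUDE ([Id],[TireSpecId],[TireSpecForReviewId],[OriginalTreadDepthInMicrons]);
GO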

Please follow basic Netiquette and post the DDL we need to answer this. Follow industry and ANSI/ISO standards in your data. You should follow ISO-11179 rules for naming data elements. You should follow ISO-8601 rules for displaying temporal data (https://xkcd.com/1179/). We need to know the data types, keys and constraints on the table. Avoid dialect in favor of ANSI/ISO Standard SQL.

Based on a few *decades* of SQL, however, I see some bad code smells. I think the poor performance is due to poor schema design. :confused:

your "invalid" and "active" look like assembly language flags,
"Current_status" is wrong; it has to be "current_something_in_particular_status" Marriage? Employment?
We do not put meta data like "..In_Microns" on a data element name
"vehicle_id" and "vehicle_number" is usually a VIN, not two conflicting names.
What is the "site_hierarchy" table like?
Current_status is not an entity and should not have a separate table.
"tire damage" is not an entity; the "tire_damage_code" is an attribute of a tire, isn't it? I would use CHECK (tire_damage_code IN (..)) constraint instead.
Etc.

You also have more OUTER JOINs in *one query* than I have in *entire applications*. If you have a good schema, then DRI means that PK-FK constraints are in place. But a bad schema lacks data integrity. It looks like you have confused attributes, values and entities in this thing.

Post a skeleton of the schema, some spec and perhaps we can help you.

@jcelko - thanks for your reply, I appreciate it. I didn't design the schema, so naming conventions and such are out of my scope since I am only tasked with looking at this particular performance issue. I'll try to post the schema shortly, but I can tell you that the integrity of the data is fine (from the source).

@joenak - indeed I did; however, it doesn't improve the performance much - maybe ever so slightly, but nothing substantial.

Welcome to the world of IT! Unfortunately, 85-95% of the problems in a schema are in the DDL, not the DML. There are no kludges that can overcome a bad schema. The bad news is, you will be blamed if you cannot do the impossible. :slight_smile: The good news is that an outside consultant (me) will come in, scream and yell, and tell your boss everything you have told him, but because I wear a suit and come from more than 1000 miles away, they will listen to me. I will try to redo the data model, but the company will go bankrupt before I can finish :frowning:

@ahmeda86,

I know this looks to be a daunting, if not impossible, task... but it's not. It's the old story of "How do you eat an elephant?" The answer is, "One bite at a time". Like Granny used to say, "Peel just one potato at a time".

I took a look at the execution plan and the first thing I noticed is that there appear to be no Clustered Indexes and no usable Non-Clustered Indexes. Don't get me wrong: the way someone wrote the query is pretty bad, but there's just about no chance of it being good without any indexes coming into play. EVERYTHING is a TABLE SCAN in this query.

Now, before you start trying to add any indexes, remember the "Peel one potato at a time" rule. The query needs to be rewritten and indexes need to be added. Pick a section of the query and start rewriting it. Once you get a small section done, check for indexes and decide 1) what would make the best/most logical Clustered Index (that will probably be on the "ID" column in most of the tables, but not always... "Must look eye!") and 2) which NCIs can truly make a difference for that section of code.
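
Just to illustrate the kind of thing I mean, and purely as a sketch: if a table like dbo.Tread (from the missing-index suggestion earlier) really is a heap and its ID column really is unique, the usual first candidate would be something like the following. Verify uniqueness and the actual access patterns before doing anything like this for real.

-- Hypothetical sketch only: clustered index on the ID column, assuming it is unique.
CREATE UNIQUE CLUSTERED INDEX CIX_Tread_Id
ON dbo.Tread (Id);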

Once you have a section done and it's fast and it uses few resources, move onto the next logical section and do the same. When you're done with sections, either materialize them in a Temp Table so that you don't have to keep so many balls in the air all at once or join the sections.

So, what could a section be? Look at all the bloody CTEs in this thing and see what's common. Can any of them be combined more simply than what's currently been done?

For example (starting at the top), the first two CTE's (FirstHoursWheelPosition and LastHoursWheelPosition) are nearly identical. Judging by the names of those CTEs and by looking at the queries they contain, my first guess would be that they're later going to try to figure out the difference between the "starting hours" and the "ending hours" to come up with some sort of a duration. If you look further down in the code, they do just exactly that. That's your first potato. I'm thinking that could be more easily solved by subtracting the MIN(HourMeterReading) from the MAX(HourMeterReading) in a single query that uses a GROUP BY on the activity.ID, activity.TreadID, and ActivityWheelPositionId.
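
In code, the idea is something like the following. Treat it as a sketch: the column names are the ones mentioned above, and the GROUP BY columns have to match whatever the two original CTEs actually key on.

-- Replaces FirstHoursWheelPosition and LastHoursWheelPosition with one pass over Activity.
-- Grouping columns are assumed; match them to the original CTE definitions.
WITH HoursWheelPosition AS
(
    SELECT  TreadId
           ,ActivityWheelPositionId
           ,FirstHours    = MIN(HourMeterReading)
           ,LastHours     = MAX(HourMeterReading)
           ,DurationHours = MAX(HourMeterReading) - MIN(HourMeterReading)
      FROM dbo.Activity
     GROUP BY TreadId, ActivityWheelPositionId
)
SELECT * FROM HoursWheelPosition;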

Once you have that query down pat, don't index just yet. Look at the other queries. Are there any others that read from the Activity table that could benefit from the aggregation you just formed? If so, add those to the query you just wrote, test the hell out of it, and then work on indexes for the Activity table and any other tables you may have had to bring into play (perhaps some lookup/reference tables). Make sure the criteria for the query are SARGable (SARG = Search ARGument, meaning everything could use index SEEKs {possibly followed by a high-performance range scan, not shown on the execution plan but visible in the properties of the SEEK} if one were available).
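
To show what I mean by SARGable, here's a made-up illustration (the ActivityDate column and the dates are just assumptions; the pattern is the point):

-- Non-SARGable: the function wrapped around the column blocks an index SEEK.
SELECT a.Id
  FROM dbo.Activity a
 WHERE YEAR(a.ActivityDate) = 2017;

-- SARGable: the column is left bare, so an index on ActivityDate can be SEEKed.
SELECT a.Id
  FROM dbo.Activity a
 WHERE a.ActivityDate >= '20170101'
   AND a.ActivityDate <  '20180101';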

Once you have that bad boy all sewn up, set it aside in a safe place and start working on the next logical section. Wash, rinse, repeat. Concentrate on one potato at a time.

When you're all done, take all the pieces you made and decide whether they should be joined together as code or if it might be more efficient to materialize the output from each section in a Temp Table and join to that/those instead.
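
As a rough sketch of what "materialize a section" could look like here (all of the names are assumptions based on the tables mentioned in this thread):

-- Materialize one tuned section once...
IF OBJECT_ID('tempdb..#WheelPositionHours') IS NOT NULL
    DROP TABLE #WheelPositionHours;

SELECT  TreadId
       ,ActivityWheelPositionId
       ,DurationHours = MAX(HourMeterReading) - MIN(HourMeterReading)
  INTO #WheelPositionHours
  FROM dbo.Activity
 GROUP BY TreadId, ActivityWheelPositionId;

-- ...then join the rest of the query to the small materialized result
-- instead of repeating the heavy CTEs.
SELECT  t.Id
       ,t.CasingId
       ,wph.DurationHours
  FROM dbo.Tread t
  JOIN #WheelPositionHours wph
    ON wph.TreadId = t.Id;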

I've had great luck with this "Divide'n'Conquer" technique. For example, I took a query that would take 45 minutes to run and blow TempDB out by more than 50GB because of all the internal rows it had to create in the form of hash joins. It would also cripple the server; nothing else could run at the same time. By moving one logical tidbit to a Temp Table and then joining to that, I knocked the query down to just 3 seconds. It might not be quite that easy with this query, but I wanted you to know it's certainly possible.

Remember, "Divide'n'Conquer". Peel one potato at a time and make sure each potato is done the best it can be before picking up the next one. Don't be afraid to materialize interim results in a Temp Table. And, if you do it right, the Temp Tables you do create will ONLY have what you need in them, which also means you won't have to worry about indexing them at all.

I'll also say it would be nice to be able to correct all the sins in the DDL but, sometimes, you just have to play the cards you're dealt. Save that for "Rev 2" unless you happen across some really low-hanging fruit that's high in ROI.

Good luck on this and post back if you need to bounce some other ideas off someone.

Excellent response. Thank you! I really appreciate it and this helps me get motivated. No doubt I will be back with some questions, but thanks again - this really is excellent!

So, here is something interesting.
Whilst I know I am using a low plan on SQL Azure, as I said, the query takes 15-30 mins to execute there.
I just did an export to a local instance (running SQL Server 2014 Enterprise), and running the same query locally does NOT prompt me for missing indexes and executes in under 18 seconds.

Such a big difference. Not sure exactly what's going on (OK, even cranking up the DTUs in Azure, it still takes a while to return the results).

Time to figure out how best to dissect this query....

"It Depends". How many rows does the query return. You may be running into "The Great Equalizer", otherwise known as "the pipe" and how much time it takes data to transmit and return to the screen.

Sure. It seems it's just down to the processing power Azure applies to the account, which is fair enough.
So I am ignoring that and, as you said, now trying to take bits of the query and see how they can be improved. One long query to go through!

@ahmeda86 as @JeffModen said (and I love his replies!), the query will eat a lot of memory; when your Azure database is on a cheap plan, you don't have a lot of memory, hence your query will be slow. I fully agree with the "peeling one potato at a time" advice. When building the query on your own (based on the problem you're trying to solve) you'll end up with a better understanding of the data than before and learn lessons for the coming years :slight_smile: When on a cheap Azure plan, using a temp table is probably the way to go. It makes the code easier to read too. What helps me is to put the query in human language: what are you trying to achieve?

@jcelko nice to see you are still active! I threw most of my books away, but yours is still shining on the shelf behind me.

Haven't been active in years, but nice to see the community is still alive!

We are having a debate about this at work currently. The general opinion is that using temp tables is a sign of bad code. I have seen some plans generated that are horrible with tons of joins. The optimizer just gives up once the complexity reaches a certain level and you get stuck with a plan that smells like something my ex-wife cooked for dinner.

I have seen great performance improvements using the divide-and-conquer rule (making use of temp tables and simpler SQL), but it all depends on what is happening. Plus, breaking things up makes it more likely that you'll get consistently good statistics - in my opinion, from what I have seen.

I used to be very strict in my opposition to things like temp tables and cursors but they do have a place. Of course the latter is very infrequent but it does happen. The key is testing with each piece and then ensuring that your statistics are good consistently over multiple business cycles for obvious reasons.

Always start with a set-based solution in mind and then be open to other options as needed. Absolutes are absolutely wrong all of the time.

It's incredibly ironic that people would say that, because it's absolutely true... just not in the way that they think. I can only imagine that the supposed "best practice" of avoiding Temp Tables gained notoriety from the myth of portability. Use the best tools available in whatever RDBMS you're using and deal with portability issues IF they come up. It's just not that bloody difficult to modify code, especially if it lives in stored procedures, which provide enough abstraction that you don't have to change the front-end code. Not using TempDB is akin to not using the LOG() key on your scientific calculator because some people might only have a 4-function calculator.

For the kinds of monster queries that we've all grown to hate and that desperately need the "Divide'n'Conquer" methods, it actually IS the bad code of the monster query that ends up using TempDB in a fashion far worse than anyone who knows how to use Temp Tables explicitly and properly. Using them in a directed manner isn't the bad thing. The long-winded code I previously spoke of had caused TempDB to grow by 50GB in just the first 15 minutes and was still growing when I stopped it. By using a Temp Table with a lot of rows but still weighing in at less than 10MB, the code ran almost instantly and caused no growth, nor anything that comes close to qualifying as "pressure on TempDB".

Heh... I've run into supposed DBAs that forbid the use of Temp Tables. I told them that they need to shut down the server immediately because all the crap code created by not allowing their explicit use has resulted in more use of TempDB than they could have imagined.

And, just as a reminder... the term "Set Based" patently does NOT mean "All in one query".

I've always avoided massive GROUP BYs, like this one, on the assumption that they must cause significant work behind the scenes. I haven't looked closely at the query, but my thought would be to do a first (or "inner") query with the minimum number of CTEs and JOINs needed to satisfy the WHERE clause, add some GROUP BYs at that stage if they are needed(**), and then either store the PKeys in a #TEMP table or use that as an inner query to JOIN back to the other tables which are needed for the display columns in the SELECT clause.
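
In other words, something roughly like this - purely a sketch with assumed table/column names borrowed from earlier in the thread; it's the shape that matters:

-- 1) Inner query: only the tables needed to satisfy the WHERE clause, keeping just the PKeys.
SELECT DISTINCT t.Id
  INTO #MatchingTread
  FROM dbo.Tread t
  JOIN dbo.Activity a ON a.TreadId = t.Id
 WHERE a.HourMeterReading > 0;   -- stand-in for the real filtering logic

-- 2) Join back only for the display columns the SELECT list actually needs.
SELECT  t.Id, t.CasingId, t.OriginalTreadDepthInMicrons
  FROM #MatchingTread m
  JOIN dbo.Tread t ON t.Id = m.Id;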

There is a trade-off, in my mind, between just getting a PKey from a table and then revisiting that table to get more display columns (maybe DO store those display-only columns in #TEMP), and JOINing to a different table to get yet more columns for display purposes only, which are not required for the WHERE/ORDER/GROUP clauses.

But my question is... do you old hands happily find it efficient to use GROUP BY on huge numbers of columns, or do you avoid that type of code for the reasons/thoughts I've outlined? If you lot have mega-GROUP BY clauses I'd better review how I code!!

(**) I'm not seeing any aggregates in the main SELECT, so my guess is that the GROUP BY is being used as a DISTINCT, in which case I would program it out rather than have SQL select multiple redundant records only to throw the dups away.
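
To make (**) concrete, this is the sort of rewrite I mean (table and column names are only illustrative, borrowed from earlier in the thread):

-- GROUP BY doing duty as DISTINCT: duplicate rows are produced and then collapsed.
SELECT  t.Id, t.CasingId, t.OriginalTreadDepthInMicrons
  FROM dbo.Tread t
  JOIN dbo.Activity a ON a.TreadId = t.Id
 GROUP BY t.Id, t.CasingId, t.OriginalTreadDepthInMicrons;

-- "Programmed out": the duplicates are never generated in the first place.
SELECT  t.Id, t.CasingId, t.OriginalTreadDepthInMicrons
  FROM dbo.Tread t
 WHERE EXISTS (SELECT 1 FROM dbo.Activity a WHERE a.TreadId = t.Id);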

@Kristen,

I'm with you. I'll typically avoid a huge number of "grouped by" columns and try to find a more efficient way to do things. It does depend, though. A lot of times, I'll use pre-aggregation of smaller sets and then join those smaller sets. I've converted a whole lot of 45-minute runs to 3-second runs that way. It works incredibly well for CROSS TABs as well, and will usually run about twice as fast as an equivalent PIVOT.
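
For anyone unfamiliar with the term, a "CROSS TAB" here just means conditional aggregation, along these lines (the wheel-position values are made up for the example):

-- Pre-aggregate and pivot with CASE expressions instead of the PIVOT operator.
SELECT  a.TreadId
       ,Position1Hours = SUM(CASE WHEN a.ActivityWheelPositionId = 1 THEN a.HourMeterReading ELSE 0 END)
       ,Position2Hours = SUM(CASE WHEN a.ActivityWheelPositionId = 2 THEN a.HourMeterReading ELSE 0 END)
  FROM dbo.Activity a
 GROUP BY a.TreadId;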
