SQLTeam.com | Weblogs | Forums

Optimizing a query with two distinct counts and a group by

Hi all,

I've been trying to optimize one of my queries to improve performance.

I have made 3 different 'optimization' iterations relative to some suggestions I found on the internet. For each query I have included it's processing CPU time and elapsed time. My original query seems to be the fastest...

I'm not sure the best way to write this query... please can someone help me improve my query for best execution time?

Original query: CPU time = 3060 ms, elapsed time = 1180 ms

SELECT 
	CASE 
		WHEN a.Barcode IS NULL THEN b.Barcode
		ELSE a.Barcode
	END Location
	,a.PalletsCurrent
	,b.PalletsDue
FROM
	(
		SELECT
			l.Barcode
			,COUNT(DISTINCT a.PalletId) PalletsCurrent
		FROM
			Warehouse.WarehouseNotes n
			INNER JOIN Warehouse.WarehouseActions a ON (n.Id = a.WarehouseNoteId)
			INNER JOIN Warehouse.WarehouseLocations l ON (a.RequestedLocationId = l.Id)
		WHERE
			(
				l.Barcode LIKE '%BU%'
				OR
				l.Barcode LIKE '%LB%'
			)
			AND
			a.PalletId IS NOT NULL
			AND
			(
				(
					n.NoteType = 6
					AND
					(
						n.NoteStatus = 1
						OR
						n.NoteStatus = 2
					)
				) --CCN: Note [Waiting] OR [In Progress]
				OR
				(
					n.NoteType = 1
					AND
					(
						n.NoteStatus = 2
						AND
						a.ActionStatus = 3
					)
				) --Picking Stock: Note [In Progress] AND Pallets [Done]
			)
		GROUP BY
			l.Barcode
	) a 
	FULL OUTER JOIN	(
		SELECT
			l.Barcode
			,COUNT(DISTINCT a.PalletId) PalletsDue
		FROM
			Warehouse.WarehouseNotes n
			INNER JOIN Warehouse.WarehouseActions a ON (n.Id = a.WarehouseNoteId)
			INNER JOIN Warehouse.WarehouseLocations l ON (a.RequestedLocationId = l.Id)
		WHERE
			(
				l.Barcode LIKE '%BU%'
				OR
				l.Barcode LIKE '%LB%'
			)
			AND
			a.PalletId IS NOT NULL
			AND
			(
				n.NoteType = 1
				AND
				(
					a.ActionStatus = 1
					OR
					a.ActionStatus = 2
				) --Picking Stock: Pallets [Waiting] OR [In Progress]
			)
		GROUP BY
			l.Barcode
	) b ON (a.Barcode = b.Barcode)
ORDER BY
	1
Location PalletsCurrent PalletsDue
1-BU-1-001 81 28
1-LB-1-001 5 42
2-LB-1-001 91 36
3-LB-1-001 21 20
4-LB-1-001 59 107

1st Iteration: CPU time = 3297 ms, elapsed time = 3335 ms.

SELECT 
	CASE 
		WHEN a.Barcode IS NULL THEN b.Barcode
		ELSE a.Barcode
	END Location,
	a.PalletsCurrent,
	b.PalletsDue
FROM
	(
		SELECT
			l.Barcode,
			COUNT(DISTINCT a.PalletId) PalletsCurrent
		FROM
			(SELECT * FROM Warehouse.WarehouseNotes WHERE NoteType = 1 OR NoteType = 6) n
			INNER JOIN (SELECT * FROM Warehouse.WarehouseActions WHERE PalletId IS NOT null) a ON (n.Id = a.WarehouseNoteId)
			INNER JOIN (SELECT * FROM Warehouse.WarehouseLocations WHERE Type = 1) l ON (a.RequestedLocationId = l.Id)
		WHERE
			(
				n.NoteType = 6
				AND
				(
					n.NoteStatus = 1
					OR
					n.NoteStatus = 2
				)
			) --CCN: Note [Waiting] OR [In Progress]
			OR
			(
				n.NoteType = 1
				AND
				(
					n.NoteStatus = 2
					AND
					a.ActionStatus = 3
				)
			) --Picking Stock: Note [In Progress] AND Pallets [Done]
		GROUP BY
			l.Barcode
	) a 
	FULL OUTER JOIN	(
		SELECT
			l.Barcode,
			COUNT(DISTINCT a.PalletId) PalletsDue
		FROM
			(SELECT * FROM Warehouse.WarehouseNotes WHERE NoteType = 1) n
			INNER JOIN (SELECT * FROM Warehouse.WarehouseActions WHERE PalletId IS NOT null AND (ActionStatus = 1 OR ActionStatus = 2)) a ON (n.Id = a.WarehouseNoteId)
			INNER JOIN (SELECT * FROM Warehouse.WarehouseLocations WHERE Type = 1) l ON (a.RequestedLocationId = l.Id)
		GROUP BY
			l.Barcode
	) b ON (a.Barcode = b.Barcode)
ORDER BY
	1
Location PalletsCurrent PalletsDue
1-BU-1-001 81 28
1-LB-1-001 5 42
2-LB-1-001 91 36
3-LB-1-001 21 20
4-LB-1-001 59 107

2nd Iteration: CPU time = 3109 ms, elapsed time = 3147 ms.

WITH
n AS (SELECT Id, NoteType, NoteStatus FROM Warehouse.WarehouseNotes WHERE NoteType = 1 OR NoteType = 6),
a AS (SELECT WarehouseNoteId, ActionStatus, RequestedLocationId, PalletId FROM Warehouse.WarehouseActions WHERE PalletId IS NOT null),
l AS (SELECT Id, Barcode FROM Warehouse.WarehouseLocations WHERE Type = 1)

SELECT 
	l.Barcode Location,
	c.PalletsCurrent,
	d.PalletsDue
FROM
	(
		SELECT
			a.RequestedLocationId,
			COUNT(DISTINCT a.PalletId) PalletsCurrent
		FROM
			n
			INNER JOIN a ON (n.Id = a.WarehouseNoteId)
		WHERE
			(
				n.NoteType = 6 --ccn note
				AND
				(
					n.NoteStatus = 1 --waiting
					OR
					n.NoteStatus = 2 --in progress
				)
			)
			OR
			(
				n.NoteType = 1 --pick note
				AND
				n.NoteStatus = 2 --note in progress
				AND
				a.ActionStatus = 3 --action done
			)
		GROUP BY
			a.RequestedLocationId
	) c
	FULL OUTER JOIN	(
		SELECT
			a.RequestedLocationId,
			COUNT(DISTINCT a.PalletId) PalletsDue
		FROM
			n
			INNER JOIN a ON (n.Id = a.WarehouseNoteId)
		WHERE
			n.NoteType = 1 --pick note
			AND
			(
				a.ActionStatus = 1 --waiting
				OR
				a.ActionStatus = 2 --in progress
			)
		GROUP BY
			a.RequestedLocationId
	) d ON (c.RequestedLocationId = d.RequestedLocationId)
	INNER JOIN l ON (c.RequestedLocationId = l.Id)
ORDER BY
	1
Location PalletsCurrent PalletsDue
1-BU-1-001 81 28
1-LB-1-001 5 42
2-LB-1-001 91 36
3-LB-1-001 21 20
4-LB-1-001 59 107

3rd Iteration: CPU time = 3937 ms, elapsed time = 4036 ms.

WITH
n AS (SELECT Id, NoteType, NoteStatus FROM Warehouse.WarehouseNotes),
a AS (SELECT WarehouseNoteId, ActionStatus, RequestedLocationId, PalletId FROM Warehouse.WarehouseActions),
l AS (SELECT Id, Barcode FROM Warehouse.WarehouseLocations WHERE Type = 1)

SELECT
	l.Barcode,
	COUNT(DISTINCT (CASE WHEN (n.NoteType = 6 AND (n.NoteStatus = 1	OR n.NoteStatus = 2)) OR (n.NoteType = 1 AND n.NoteStatus = 2 AND a.ActionStatus = 3) THEN a.PalletId END)) AS PalletsCurrent,
	COUNT(DISTINCT (CASE WHEN n.NoteType = 1 AND (a.ActionStatus = 1 OR	a.ActionStatus = 2) THEN a.PalletId END)) AS PalletsDue
FROM
	n
	INNER JOIN a ON (n.Id = a.WarehouseNoteId)
	INNER JOIN l ON (a.RequestedLocationId = l.Id)
GROUP BY
	l.Barcode
Barcode PalletsCurrent PalletsDue
1-BU-1-001 81 28
1-LB-1-001 5 42
1-P1-1-001 0 0
1-P2-1-001 0 0
2-LB-1-001 91 36
3-LB-1-001 21 20
4-LB-1-001 59 107
5-LB-1-001 0 0
6-BU-1-001 0 0

SELECT 
	CASE 
		WHEN a.Barcode IS NULL THEN b.Barcode
		ELSE a.Barcode
	END Location
	,a.PalletsCurrent
	,a.PalletsDue
FROM
	(
		SELECT
			l.Barcode
			,COUNT(DISTINCT CASE WHEN n.NoteType = 6 OR a.ActionStatus = 3 THEN a.PalletId ELSE NULL END) PalletsCurrent
            ,COUNT(DISTINCT CASE WHEN n.NoteType = 1 AND a.ActionStatus < 3 THEN a.PalletId ELSE NULL END) PalletsDue
		FROM
			Warehouse.WarehouseNotes n
			INNER JOIN Warehouse.WarehouseActions a ON (n.Id = a.WarehouseNoteId)
			INNER JOIN Warehouse.WarehouseLocations l ON (a.RequestedLocationId = l.Id)
		WHERE
			(
				l.Barcode LIKE '%BU%'
				OR
				l.Barcode LIKE '%LB%'
			)
			AND
			a.PalletId IS NOT NULL
			AND
			(
				(
					n.NoteType = 6
					AND
					(
						n.NoteStatus = 1
						OR
						n.NoteStatus = 2
					)
				) --CCN: Note [Waiting] OR [In Progress]
				OR
				(
					n.NoteType = 1
					AND
					((
						n.NoteStatus = 2
						AND
						a.ActionStatus = 3
					) OR
                    (
				        a.ActionStatus = 1
					    OR
					    a.ActionStatus = 2
                    ))
				) 
			)
		GROUP BY
			l.Barcode
	) a 
ORDER BY
	1

Also, the DDL for the table, including all indexes, would also be helpful for tuning the query.

Thanks Scott,
I like how you can just use a.ActionStatus < 3 instead of OR!
I've implemented your version of the query and I get: CPU time = 2187 ms, elapsed time = 2278 ms. Which is better on the CPU side but still about twice as long to execute than my original query :confused:

Regarding the DDL for the tables - unfortunately I only have read rights to the SQL server, I don't have the privileges to use the SQL action plan feature nor optimize the tables themselves for the query.

Elapsed time is not necessarily a good indicator of performance, since a lot of things -- such as blocking -- can effect elapsed time.

I have trouble seeing how reading from the table twice and having to do a FULL OUTER JOIN would ever be more efficient ... then again, SQL Server often surprises me with what works best when I think it will be something else.

Typically tuning involves reducing logical I/O. Try running them all after:

SET STATISTICS IO, TIME ON;

That will give you I/O stats to compare.