I have a CTE that generates a list of dates from a Numbers table, and a second CTE that counts responses per date.
DECLARE @JurisdictionName AS CHAR(30) = 'Some Agency'
DECLARE @StartDate AS DATE = '3/1/2018'
DECLARE @EndDate AS DATE = '3/31/2018'
;WITH all_dates AS
(SELECT DATEADD(dd, Numbers.n - 1, @StartDate) AS Dates
FROM Numbers
WHERE Numbers.n <= DATEDIFF(dd, @StartDate, @EndDate) + 1)
,response_dates AS
(SELECT
DISTINCT(CAST(Response_Master_Incident.Response_Date AS DATE)) AS Response_Date
,COUNT(CAST(Response_Master_Incident.Response_Date AS DATE)) AS Response_Count
FROM
Response_Master_Incident
WHERE Response_Master_Incident.Time_First_Unit_Assigned IS NOT NULL
AND CAST(Response_Master_Incident.Response_Date AS DATE) >= @StartDate
AND CAST(Response_Master_Incident.Response_Date AS DATE) <= @EndDate
AND EXISTS (
SELECT TOP 1 * FROM Activity_Log
JOIN Unit_Names
ON Unit_Names.Alias = Activity_Log.Radio_Name
JOIN Vehicle
ON Vehicle.UnitName_ID = Unit_Names.ID
JOIN Division
ON Vehicle.HomeDivisionID = Division.ID
JOIN Jurisdiction
ON Division.JurisdictionID = Jurisdiction.ID
WHERE Jurisdiction.Name = @JurisdictionName
AND Activity_Log.Master_Incident_ID = Response_Master_Incident.ID
)
GROUP BY CAST(Response_Master_Incident.Response_Date AS DATE)
)
Then I take the results, and I join and display them.
SELECT
a.Dates
,ISNULL(Response_Count, 0) AS Response_Count
FROM all_dates a
LEFT JOIN response_dates r
ON a.Dates = r.Response_Date
This does work, but it takes several minutes. Not sure why it's so slow, but if there's a quicker way to handle this, I'd love to try it.
In response_dates you have both DISTINCT and GROUP BY - remove the DISTINCT, as it isn't necessary; the GROUP BY already returns one row per date.
Review the data type of Response_Master_Incident.Response_Date - is it a datetime or a date? If it is a datetime - set your variables to datetime, if it is a date - remove the CAST.
Remove the CAST from the count and use COUNT(*) or COUNT(rmi.Response_Date), as either will count the number of rows within each group.
Use aliases for your tables and schema qualify them - this will simplify the query and make it easier to read. For example:
FROM dbo.Response_Master_Incident AS rmi
FROM dbo.Activity_Log AS al
JOIN dbo.Unit_Names AS un ON un.Alias = al.Radio_Name
You can also remove the TOP 1 from the EXISTS - it isn't needed and may be causing an additional sort operation.
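Putting those suggestions together, the response_dates CTE might look something like this (a sketch only - it assumes Response_Date is a datetime, the tables are in the dbo schema, and uses a half-open date range so the predicates stay sargable):

,response_dates AS
(
    SELECT CAST(rmi.Response_Date AS DATE) AS Response_Date
          ,COUNT(*) AS Response_Count
    FROM dbo.Response_Master_Incident AS rmi
    WHERE rmi.Time_First_Unit_Assigned IS NOT NULL
      AND rmi.Response_Date >= @StartDate
      AND rmi.Response_Date < DATEADD(dd, 1, @EndDate)   -- half-open range, no CAST needed
      AND EXISTS (
          SELECT *
          FROM dbo.Activity_Log AS al
          JOIN dbo.Unit_Names AS un ON un.Alias = al.Radio_Name
          JOIN dbo.Vehicle AS v ON v.UnitName_ID = un.ID
          JOIN dbo.Division AS d ON v.HomeDivisionID = d.ID
          JOIN dbo.Jurisdiction AS j ON d.JurisdictionID = j.ID
          WHERE j.Name = @JurisdictionName
            AND al.Master_Incident_ID = rmi.ID
      )
    GROUP BY CAST(rmi.Response_Date AS DATE)
)

The half-open range (>= @StartDate, < day after @EndDate) lets the optimizer seek on Response_Date directly instead of evaluating a CAST per row.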
You could try changing this a bit...
Instead of using an EXISTS, include the joined tables directly so you can return the JurisdictionName. Group on the JurisdictionName and the Response_Date, then COUNT the number of rows.
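A sketch of that CTE (JurisdictionName is an alias I'm introducing; note COUNT(DISTINCT rmi.ID) rather than COUNT(*), because joining to Activity_Log can return multiple rows per incident and would otherwise inflate the count):

,response_dates AS
(
    SELECT j.Name AS JurisdictionName
          ,CAST(rmi.Response_Date AS DATE) AS Response_Date
          ,COUNT(DISTINCT rmi.ID) AS Response_Count
    FROM dbo.Response_Master_Incident AS rmi
    JOIN dbo.Activity_Log AS al ON al.Master_Incident_ID = rmi.ID
    JOIN dbo.Unit_Names AS un ON un.Alias = al.Radio_Name
    JOIN dbo.Vehicle AS v ON v.UnitName_ID = un.ID
    JOIN dbo.Division AS d ON v.HomeDivisionID = d.ID
    JOIN dbo.Jurisdiction AS j ON d.JurisdictionID = j.ID
    WHERE rmi.Time_First_Unit_Assigned IS NOT NULL
      AND rmi.Response_Date >= @StartDate
      AND rmi.Response_Date < DATEADD(dd, 1, @EndDate)
    GROUP BY j.Name, CAST(rmi.Response_Date AS DATE)
)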
In the outer query:
SELECT ad.Dates, ISNULL(rd.Response_Count, 0) AS Response_Count
FROM all_dates AS ad
LEFT JOIN response_dates AS rd ON rd.Response_Date = ad.Dates AND rd.JurisdictionName = @JurisdictionName
Compare the execution plans to see which is better - and identify any needed indexes to improve performance.
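To compare the two versions, you can turn on timing and I/O statistics before running each one, then check the actual plans in SSMS:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run version 1, then version 2, and compare logical reads / CPU time
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;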
Fixed it myself. The query went from 3 minutes and 15 seconds down to below 1 second.
The Activity_Log table is gigantic. I added one extra WHERE clause in the EXISTS portion of the query to whittle down the records, and everything is a lot faster now.
Cluster the table on Response_Date and you'll get better response from all queries if that's a condition you (almost) always specify when reading the table. Don't believe the myth that an identity column is "always" the best clustering key, or even that it should be the "default clustering key". There should never be a "default" clustering key, only one carefully chosen based on the usage of that specific table.
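For example (index name hypothetical; if the table already has a clustered primary key on an identity column, you'd first have to drop it and recreate it as nonclustered):

CREATE CLUSTERED INDEX IX_Response_Master_Incident_Response_Date
    ON dbo.Response_Master_Incident (Response_Date);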