Ok... here we go.
First, I had to change a couple of things in the original code that you provided to build the DateDimension table. For example , your ReportDate column was set to NVARCHAR(6), which isn't big enough to handle some countries month abbreviations. I changed it to NARCHAR(30).
Also, the index your created was in the right vein but it only supported a SEEK followed by a range SCAN of all rows in a given language. So, in reality, it was always SCANing almost 11,000 rows every query. Now, that's not a huge problem but it is a wasteful thing. You also had nothing to guarantee that each language only had 1 and only 1 copy of any given date in it. Further, there was nothing to handle a contingency if someone went search for another column that didn't have an NCI on it.
So, I added a Clustered PK that would solve all 3 of those problems.
I didn't, however, take the time to fix the rather random use of CASING in all of the column names. I left them all as they were. In real life, I'd have standardized the casing.
Here's the copy of your code with the slight modifications I added above.
WARNING!!! WARNING!!! WARNING!!! WARNING!!! WARNING!!! WARNING!!! WARNING!!!
THIS CODE UNCONDITIONALLY DROPS A TABLE NAMED DBO.DATEDIMENSION!!!
WARNING!!! WARNING!!! WARNING!!! WARNING!!! WARNING!!! WARNING!!! WARNING!!!
DECLARE @StartDate DATE = '20000101', @NumberOfYears INT = 30;
-- prevent set or regional settings from interfering with
-- interpretation of dates / literals
SET DATEFIRST 7;
SET DATEFORMAT mdy;
SET LANGUAGE us_english;
DECLARE @CutoffDate DATE = DATEADD(YEAR, @NumberOfYears, @StartDate);
--===== If the new table already exists, drop it(Changed this)
IF OBJECT_ID('tempdb..#Dim','U') IS NOT NULL
DROP TABLE #Dim
;
--===== If the new table already exists, drop it(Changed this)
IF OBJECT_ID('dbo.DateDimension','U') IS NOT NULL
DROP TABLE dbo.DateDimension
;
CREATE TABLE dbo.DateDimension
(
--DateKey INT NOT NULL PRIMARY KEY,
[Date] DATE NOT NULL,
[Day] TINYINT NOT NULL,
--DaySuffix CHAR(2) NOT NULL,
[Weekday] TINYINT NOT NULL,
WeekDayName VARCHAR(10) NOT NULL,
IsWeekend BIT NOT NULL,
IsHoliday BIT NOT NULL,
HolidayText VARCHAR(64) SPARSE,
DOWInMonth TINYINT NOT NULL,
[DayOfYear] SMALLINT NOT NULL,
WeekOfMonth TINYINT NOT NULL,
WeekOfYear TINYINT NOT NULL,
ISOWeekOfYear TINYINT NOT NULL,
[Month] TINYINT NOT NULL,
[MonthName] VARCHAR(10) NOT NULL,
[Quarter] TINYINT NOT NULL,
--QuarterName VARCHAR(6) NOT NULL,
[Year] INT NOT NULL,
MMYYYY CHAR(6) NOT NULL,
MonthYear CHAR(7) NOT NULL,
FirstDayOfMonth DATE NOT NULL,
LastDayOfMonth DATE NOT NULL,
FirstDayOfQuarter DATE NOT NULL,
LastDayOfQuarter DATE NOT NULL,
FirstDayOfYear DATE NOT NULL,
LastDayOfYear DATE NOT NULL,
FirstDayOfNextMonth DATE NOT NULL,
FirstDayOfNextYear DATE NOT NULL,
LanguageCode NVARCHAR(30) NOT NULL,
ReportDate nvarchar(30) not null --changed this from 6 to 30
)
;
--===== Removed this index because, although it did a SEEK, it was
-- also doing a SCAN of the more than 10K Rows for each language.
--CREATE NONCLUSTERED INDEX [NCI_DateDimension] ON [dbo].DateDimension
--(
-- LanguageCode ASC
--) include ([date],ReportDate)
--;
--===== Replaced the above index with a real Clustered PK that will handle
-- the odd lookups without having to build a special Non-Clustered Index
-- to handle them. This also guarantees only 1 date per language.
ALTER TABLE [dbo].DateDimension
ADD CONSTRAINT PK_DateDimension PRIMARY KEY CLUSTERED (LanguageCode,[date])
;
--this is just a holding table for intermediate calculations:
CREATE TABLE #dim
(
[date] DATE PRIMARY KEY CLUSTERED,
[day] AS DATEPART(DAY, [date]),
[month] AS DATEPART(MONTH, [date]),
FirstOfMonth AS CONVERT(DATE, DATEADD(MONTH, DATEDIFF(MONTH, 0, [date]), 0)),
[MonthName] AS DATENAME(MONTH, [date]),
[week] AS DATEPART(WEEK, [date]),
[ISOweek] AS DATEPART(ISO_WEEK, [date]),
[DayOfWeek] AS DATEPART(WEEKDAY, [date]),
[quarter] AS DATEPART(QUARTER, [date]),
[year] AS DATEPART(YEAR, [date]),
FirstOfYear AS CONVERT(DATE, DATEADD(YEAR, DATEDIFF(YEAR, 0, [date]), 0)),
Style112 AS CONVERT(CHAR(8), [date], 112),
Style101 AS CONVERT(CHAR(10), [date], 101)
)
;
-- use the catalog views to generate as many rows as we need
INSERT #dim([date])
SELECT d
FROM
(
SELECT d = DATEADD(DAY, rn - 1, @StartDate)
FROM
(
SELECT TOP (DATEDIFF(DAY, @StartDate, @CutoffDate))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
-- on my system this would support > 5 million days
ORDER BY s1.[object_id]
) AS x
) AS y;
INSERT dbo.DateDimension WITH (TABLOCKX)
SELECT
--DateKey = CONVERT(INT, Style112),
[Date] = [date],
[Day] = CONVERT(TINYINT, [day]),
--DaySuffix = CONVERT(CHAR(2), CASE WHEN [day] / 10 = 1 THEN 'th' ELSE
-- CASE RIGHT([day], 1) WHEN '1' THEN 'st' WHEN '2' THEN 'nd'
-- WHEN '3' THEN 'rd' ELSE 'th' END END),
[Weekday] = CONVERT(TINYINT, [DayOfWeek]),
[WeekDayName] = CONVERT(VARCHAR(10), DATENAME(WEEKDAY, [date])),
[IsWeekend] = CONVERT(BIT, CASE WHEN [DayOfWeek] IN (1,7) THEN 1 ELSE 0 END),
[IsHoliday] = CONVERT(BIT, 0),
HolidayText = CONVERT(VARCHAR(64), NULL),
[DOWInMonth] = CONVERT(TINYINT, ROW_NUMBER() OVER
(PARTITION BY FirstOfMonth, [DayOfWeek] ORDER BY [date])),
[DayOfYear] = CONVERT(SMALLINT, DATEPART(DAYOFYEAR, [date])),
WeekOfMonth = CONVERT(TINYINT, DENSE_RANK() OVER
(PARTITION BY [year], [month] ORDER BY [week])),
WeekOfYear = CONVERT(TINYINT, [week]),
ISOWeekOfYear = CONVERT(TINYINT, ISOWeek),
[Month] = CONVERT(TINYINT, [month]),
[MonthName] = CONVERT(VARCHAR(10), [MonthName]),
[Quarter] = CONVERT(TINYINT, [quarter]),
--QuarterName = CONVERT(VARCHAR(6), CASE [quarter] WHEN 1 THEN 'First'
-- WHEN 2 THEN 'Second' WHEN 3 THEN 'Third' WHEN 4 THEN 'Fourth' END),
[Year] = [year],
MMYYYY = CONVERT(CHAR(6), LEFT(Style101, 2) + LEFT(Style112, 4)),
MonthYear = CONVERT(CHAR(7), LEFT([MonthName], 3) + LEFT(Style112, 4)),
FirstDayOfMonth = FirstOfMonth,
LastDayOfMonth = MAX([date]) OVER (PARTITION BY [year], [month]),
FirstDayOfQuarter = MIN([date]) OVER (PARTITION BY [year], [quarter]),
LastDayOfQuarter = MAX([date]) OVER (PARTITION BY [year], [quarter]),
FirstDayOfYear = FirstOfYear,
LastDayOfYear = MAX([date]) OVER (PARTITION BY [year]),
FirstDayOfNextMonth = DATEADD(MONTH, 1, FirstOfMonth),
FirstDayOfNextYear = DATEADD(YEAR, 1, FirstOfYear),
'en-US' as LanguageCode,
ReportDate = FORMAT([date], 'MMM-yy')
FROM #dim
OPTION (MAXDOP 1);
--Jul-05
SET LANGUAGE Italian;
INSERT dbo.DateDimension WITH (TABLOCKX)
SELECT
--DateKey = CONVERT(INT, Style112),
[Date] = [date],
[Day] = CONVERT(TINYINT, [day]),
--DaySuffix = CONVERT(CHAR(2), CASE WHEN [day] / 10 = 1 THEN 'th' ELSE
-- CASE RIGHT([day], 1) WHEN '1' THEN 'st' WHEN '2' THEN 'nd'
-- WHEN '3' THEN 'rd' ELSE 'th' END END),
[Weekday] = CONVERT(TINYINT, [DayOfWeek]),
[WeekDayName] = CONVERT(VARCHAR(10), DATENAME(WEEKDAY, [date])),
[IsWeekend] = CONVERT(BIT, CASE WHEN [DayOfWeek] IN (1,7) THEN 1 ELSE 0 END),
[IsHoliday] = CONVERT(BIT, 0),
HolidayText = CONVERT(VARCHAR(64), NULL),
[DOWInMonth] = CONVERT(TINYINT, ROW_NUMBER() OVER
(PARTITION BY FirstOfMonth, [DayOfWeek] ORDER BY [date])),
[DayOfYear] = CONVERT(SMALLINT, DATEPART(DAYOFYEAR, [date])),
WeekOfMonth = CONVERT(TINYINT, DENSE_RANK() OVER
(PARTITION BY [year], [month] ORDER BY [week])),
WeekOfYear = CONVERT(TINYINT, [week]),
ISOWeekOfYear = CONVERT(TINYINT, ISOWeek),
[Month] = CONVERT(TINYINT, [month]),
[MonthName] = CONVERT(VARCHAR(10), [MonthName]),
[Quarter] = CONVERT(TINYINT, [quarter]),
--QuarterName = CONVERT(VARCHAR(6), CASE [quarter] WHEN 1 THEN 'First'
-- WHEN 2 THEN 'Second' WHEN 3 THEN 'Third' WHEN 4 THEN 'Fourth' END),
[Year] = [year],
MMYYYY = CONVERT(CHAR(6), LEFT(Style101, 2) + LEFT(Style112, 4)),
MonthYear = CONVERT(CHAR(7), LEFT([MonthName], 3) + LEFT(Style112, 4)),
FirstDayOfMonth = FirstOfMonth,
LastDayOfMonth = MAX([date]) OVER (PARTITION BY [year], [month]),
FirstDayOfQuarter = MIN([date]) OVER (PARTITION BY [year], [quarter]),
LastDayOfQuarter = MAX([date]) OVER (PARTITION BY [year], [quarter]),
FirstDayOfYear = FirstOfYear,
LastDayOfYear = MAX([date]) OVER (PARTITION BY [year]),
FirstDayOfNextMonth = DATEADD(MONTH, 1, FirstOfMonth),
FirstDayOfNextYear = DATEADD(YEAR, 1, FirstOfYear),
'it-IT' as LanguageCode,
ReportDate = FORMAT([date], 'MMM-yy')
FROM #dim
OPTION (MAXDOP 1);
--===== Houskeeping
-- These settings are session sensitive so you don't
-- normally need to do this.
SET LANGUAGE us_english; --Added this
DROP TABLE #Dim --Added this
;
You tested with a little over 10,000 rows in the dbo.DateDimension table and that also didn't require a JOIN anywhere. With that in mind, the following builds a 1 column table containing 10 Million random dates (a small table by today's standards). Since it only has a single non-unique column, I only added a Non Clustered Index to it.
Here's the code...
--===== Presets
SET LANGUAGE us_english;
IF OBJECT_ID('tempdb..#Dates','U') IS NOT NULL
DROP TABLE #Dates
;
--===== Create a large but narrow test table containing random DATETIME/s over the 10 year period of
-- Jan 2015 up to and not including Jan 2025.
SELECT TOP 10000000 --Yeah... 10 Million. That's small in today's terms.
SomeDate = ISNULL(RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'Jan 2015','Jan 2025')+CONVERT(DATETIME,'Jan 2015'),0)
INTO #Dates
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
--===== Index the dae column like you would on a real table.
-- There are no columns to INCLUDE on this particular table.
-- we're keeping it real simple.
CREATE INDEX IX_#Dates_Date ON #Dates (SomeDate)
;
Now for some testing. I did "english-US" testing first, just to set a baseline. I've got several tests and they include the code we've corrected together for the strictly inline calculations. I also added one extra test to show just how damned expensive it is to add a dash if you don't know about COLLATION in conjunction with the use of REPLACE.
Here's the code for the "English_US" version. Each section of the code clears proc and buffer cache and then runs each section 5 times. This allows to see if there are any anomalies and determine if there's a significant penalty on the first uncached run of each section.
What we're doing is looking up one full year (2019) in the #Dates table and then coming up with the "Reporting Dates". One full year contains approximately 1/10th of the ten year range, so we're working with approximately 1 Million rows out of the 10 (kind of like real life) In the case of my runs below, the count was actually 999,511 rows.
--@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
GO
SET LANGUAGE us_english; --====================================================
--SET LANGUAGE Croatian; --======================================================
GO
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== FORMAT ================================================================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = FORMAT(tt.SomeDate, 'MMM-yy', 'en-US')
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== CALENDAR TABLE ========================================================
SET STATISTICS TIME,IO off
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = ReportDate
FROM dbo.DateDimension dd
JOIN #Dates tt ON dd.[Date] = CONVERT(DATE,tt.SomeDate) --Required to match
WHERE dd.[Date] >= '20190101' --This is required to avoid a large 10K row scan
AND dd.[Date] < '20200101' --This is required to avoid a large 10K row scan
AND dd.LanguageCode = 'en-US'
AND tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== INLINE FORMULA WITHOUT BINARY COLLATION ===============================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = REPLACE(SUBSTRING(CONVERT(NVARCHAR(30),tt.SomeDate,6),4,30),N' ',N'-')
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== INLINE FORMULA WITH BINARY COLLATION ==================================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = REPLACE(SUBSTRING(CONVERT(NVARCHAR(30),tt.SomeDate,6) COLLATE Latin1_General_BIN ,4,30),N' ',N'-')
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== INLINE FORMULA WITHOUT THE DASH =======================================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = SUBSTRING(CONVERT(NVARCHAR(30),tt.SomeDate,6),4,30)
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
GO
Since the Execution Plan (even the supposedly "Actual" Execution Plan") has a shedload of estimates in it, I never use it as the final determination as to what the better code is. So, here's the Profiler Run I had running against the SPID I was using. There's a lot of good information to be derived from this but, most painfully obvious, is that FORMAT is bad for performance. I won't use it even on small rowsets because, if I did, it would become a piece of the "Death by a thousand cuts" that I've been trying to remove from my production databases for the last 8 years (and I'm still not done and long story about why it appears that I'm chasing milliseconds).
Here's the output from the SQL Profiler run. And be sure to notice what the use of COLLATE did for us as well as simply not adding a bloody dash to the MMM-YY format.
There are 3 obvious winners there and, of the 3, getting rid of the dash was the best of all.
Here's the same code with the minor culture and language changes for Italian...
--@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
GO
SET LANGUAGE Italian; --=======================================================
--SET LANGUAGE us_english; --====================================================
--SET LANGUAGE Croatian; --======================================================
GO
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== FORMAT ================================================================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = FORMAT(tt.SomeDate, 'MMM-yy', 'it_IT')
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== CALENDAR TABLE ========================================================
SET STATISTICS TIME,IO off
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = ReportDate
FROM dbo.DateDimension dd
JOIN #Dates tt ON dd.[Date] = CONVERT(DATE,tt.SomeDate) --Required to match
WHERE dd.[Date] >= '20190101' --This is required to avoid a large 10K row scan
AND dd.[Date] < '20200101' --This is required to avoid a large 10K row scan
AND dd.LanguageCode = 'it-IT'
AND tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== INLINE FORMULA WITHOUT BINARY COLLATION ===============================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = REPLACE(SUBSTRING(CONVERT(NVARCHAR(30),tt.SomeDate,6),4,30),N' ',N'-')
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== INLINE FORMULA WITH BINARY COLLATION ==================================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = REPLACE(SUBSTRING(CONVERT(NVARCHAR(30),tt.SomeDate,6) COLLATE Latin1_General_BIN ,4,30),N' ',N'-')
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--=============================================================================
-- Clear proc and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
--===== INLINE FORMULA WITHOUT THE DASH =======================================
DECLARE @BitBucket NVARCHAR(30);
SELECT @BitBucket = SUBSTRING(CONVERT(NVARCHAR(30),tt.SomeDate,6),4,30)
FROM #Dates tt
WHERE tt.SomeDate >= '20190101'
AND tt.SomeDate < '20200101'
;
GO 5
--@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
GO
Here's the profiler run from that... no surprises...
I also did the Finnish language to make sure that their longer abbreviations were coming up correct in the formulas. They did. I didn't take the time to create rows in the dbo.DateDimension table for Finnish and so couldn't run that test for the Finnish Language. I'll also tell you that the FORMAT function took an extra 400ms for the Finnish Language. The formula methods actually didn't take any extra time.
And all of that brings up a couple of good points.
- If your Calendar Table (DateDimension table, whatever) doesn't support a given language, you're dead in the water until it has been made to do so. The formulas don't have that problem.
- FORMAT is handy if you're in one language and need to format dates in another without changing languages (as is the Calendar Table) but ("It Depends") it may not be worth all the extra duration and CPU. Like I said, 10 million rows is nothing today and we only looked up about a million. It would take 85-90 seconds to process 10 million rows with FORMAT where the formulas and Calendar Table would each do it in less than 3 seconds.
- I won't even use FORMAT for little stuff until they fix its performance.
- You have to remember to add date range limits to the calendar side of the criteria or you're going to be scanning almost 11K rows for the given language instead of just 365 for a given year. It won't cost you much unless you're doing thousands of lookups per hour... and you'd better plan on that happening.
- Formulas ROCK!