Totals by Week

PhilC · December 6, 2015, 10:56am

Can anyone help with this?

I have a table (TAB1) with a number of rows and two date columns ([Created] and [Closed])

What I need to do is create a select that returns a row for each week between two dates (@Start and @End), with a count of the rows from TAB1 that were "open" in that week (i.e., the [Created] date was during or before that week and the [Closed] date was during or after that week) - hope that makes sense!

I am not sure if this is possible to do in a Stored Procedure or not, so appreciate any help

Phil

sandeepmittal11 · December 7, 2015, 7:14am

you need to first create a temp table with start and dates for each week and then join it with your main table to aggregate the data week wise.
Refer below related post to generate dates for the weeks
http://www.itdeveloperzone.com/2015/10/generating-week-dates-for-month-in-sql.html

JeffModen · December 9, 2015, 5:01am

I very much appreciate anyone and everyone that takes the time to share some code but that function still needs a bit of work. Run the following and see that...

SELECT * FROM dbo.udf_GetWeekDatesForMonth(2011,2); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2011,10); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2012,4); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2012,12); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2013,9); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2014,6); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2015,8); SELECT * FROM dbo.udf_GetWeekDatesForMonth(2015,11);
... any month that ends on a Monday is missing the last week/day of the month.

Because it's an mTVF (Multi-statement Table Valued Function) and has a WHILE loop in it, it's also comparatively slow and resource intensive. It's nearly 3 times slower, uses more than 120 times more CPU, and uses more than 2,300 times the memory IO (logical reads) than the following Tally Table solution (the 6 valueTally Table is built in as a VALUES clause), which is an iTVF (Inline Table Valued Function) that relies on a simple "pseudo cursor" instead of WHILE loop..

 CREATE FUNCTION dbo.GetWeekDatesForMonth
        (@Year SMALLINT, @Month TINYINT)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN WITH 
 cteMonth AS   (SELECT  MonthStart     = DATEADD(mm,(@Year-1900)*12+@Month-1,0)
                       ,MonthEnd       = DATEADD(mm,(@Year-1900)*12+@Month,0)-1)
,cteWeeks AS   (SELECT  MonthStart
                       ,MonthEnd
                       ,FirstWeekStart = DATEADD(dd,DATEDIFF(dd,0,MonthStart)/7*7,0)
                       ,LastWeekStart  = DATEADD(dd,DATEDIFF(dd,0,MonthEnd  )/7*7,0)
                  FROM cteMonth)

 SELECT  [Week]       = 'Week'+LEFT(ca.Wk+1,1)
        ,StartDate    =  CAST(
                            CASE 
                            WHEN ca.Wk > 0 
                            THEN DATEADD(dd,Ca.Wk*7,FirstWeekStart) 
                            ELSE MonthStart
                            END
                        AS DATE)
        ,EndDate      = CAST(
                            CASE
                                WHEN DATEADD(dd,Ca.Wk*7+6,FirstWeekStart) <= MonthEnd 
                                THEN DATEADD(dd,Ca.Wk*7+6,FirstWeekStart) 
                                ELSE MonthEnd 
                            END 
                        AS DATE)
   FROM cteWeeks
  CROSS APPLY (SELECT TOP (DATEDIFF(dd,FirstWeekStart,LastWeekStart)/7+1)
                      t.N
                 FROM (VALUES (0),(1),(2),(3),(4),(5))t(N))ca(Wk)
;

And someone smarter than me (and there are many! ) would likely beat even that.

JeffModen · December 9, 2015, 5:23am

It makes perfect sense. My question would be, what do you call a "Week"? Is it always 7 days long or is it based on months like what sandeepmittal11 tried to do with his function? What day of the week do your weeks start on? And what do you want done if someone provides a StartDate that isn't the same as the start of a week or an EndDate that isn't the same as the end of a week?

PhilC · December 9, 2015, 8:10pm

Hi Jeff; it would always be 7 days long and always start on a Monday. Basically the [Created] and [Closed] dates can be any date - i.e., any day of the week - but I need to count the number in that week; i.e., the date falls between the Monday and the Sunday

Phil

JeffModen · December 10, 2015, 3:06am

Perfect. I have to build a short little data generator to demonstrate with. I'll be back.

JeffModen · December 10, 2015, 5:40am

Ok... here we go.

First, we're going to need an incremental "row source" to help us avoid any RBAR so that we can do all of this using high performance iTVFs (Inline Table Valued Functions). The following function is based on "Tally Table" structures in a modified Itzik Ben-Gan style. If you don't have one of these babies in your databases, it's time you did. As usual, it's heavily documented in the code.

 CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
 Purpose:
 Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

 As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

 Usage:
--===== Syntax example (Returns BIGINT)
 SELECT t.N
   FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;

 Notes:
 1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
    Refer to the following URLs for how it works and introduction for how it replaces certain loops. 
    http://www.sqlservercentral.com/articles/T-SQL/62867/
    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
 2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertable to the BIT data-type
    will cause the sequence to start at 1.
 3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
 5. If @MaxN is negative or NULL, a "TOP" error will be returned.
 6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger
    number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with
    that many values, you should consider using a different tool. ;-)
 7. There will be a substantial reduction in performance if "N" is sorted in descending order.  If a descending 
    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT; 
     SELECT @MaxN = 1000;
     SELECT DescendingN = @MaxN-N+1 
       FROM dbo.fnTally(1,@MaxN);

 8. There is no performance penalty for sorting "N" in ascending order because the output is explicity sorted by
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

 Revision History:
 Rev 00 - Unknown     - Jeff Moden 
        - Initial creation with error handling for @MaxN.
 Rev 01 - 09 Feb 2013 - Jeff Moden 
        - Modified to start at 0 or 1.
 Rev 02 - 16 May 2013 - Jeff Moden 
        - Removed error handling for @MaxN because of exceptional cases.
 Rev 03 - 22 Apr 2015 - Jeff Moden
        - Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
        (@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS 
 RETURN WITH
  E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1)                                  --10E1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)      --10E4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c)            --10E12 or 1 Trillion rows                 
            SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
             UNION ALL 
            SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;

Now we need something to generate week boundary dates with. Same thing goes here. If you don't have one already, you really do need one. Here's the code.

CREATE FUNCTION dbo.Weeks 
/**************************************************************************************************
 Purpose:
 Given a @pStart date and an End date, generate "ISO" week periods that start on Monday of the week
 that contains the @pStart date and ends on the week that contains the @pEND date.

 Returns:
    WeekNumber = Starts at 1 and increments for each returned week.
    WeekStart  = The date of the Monday for the week (Will be uses as temporally "closed").
    WeekNext   = The date of the Monday for the next week (will be used as temporally "open").

 Usage:
 SELECT WeekNumber, WeekStart, WeekNext
   FROM dbo.Weeks(@pStartDate, @pEndDate)
;

 Revision History:
 Rev 00 - 09 Dec 2015 - Jeff Moden
        - Redacted/simplified production model for posting.
**************************************************************************************************/
--===== These are the parameters of this iTVF
        (
         @pStartDate    DATETIME
        ,@pEndDate      DATETIME
        )
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN 
--===== Expand the dates to weeks
   WITH
cteParameters AS
(
 SELECT  StartWeek  = DATEADD(dd,DATEDIFF(dd,0,@pStartDate)/7*7,0)
        ,EndWeek    = DATEADD(dd,DATEDIFF(dd,0,@pEndDate  )/7*7,0)
)
 SELECT  WeekNumber = t.N+1
        ,WeekStart  = DATEADD(dd,t.N*7  ,StartWeek)
        ,WeekNext   = DATEADD(dd,t.N*7+7,StartWeek)
   FROM cteParameters p
  CROSS APPLY dbo.fnTally(0,DATEDIFF(wk,p.StartWeek,p.EndWeek)) t
;

Up next, we need some test data. I normally test with a million rows and this will be no exception. Don't worry... it only takes just several seconds thank to the "pseudo cursor" formed by the constrained CROSS JOIN.

--=================================================================================================
--      Create and populate a test table.
--      It consists of random start dates for 2014/2015 and an end date for up to 366 days later.
--      Nothing in this section is a part of the solution. We're just building test data here.
--=================================================================================================
--===== If the test table exists, drop it to make reruns easier in SSMS
     IF OBJECT_ID('tempdb..#Tab1','U') IS NOT NULL
        DROP TABLE #Tab1
;
--===== Create and populate the test table on the fly.

WITH
cteGenCreatedDates AS
(
 SELECT TOP 1000000
        Created = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2014','2016')+CAST('2014' AS DATETIME)
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
)
 SELECT  RowNum  = ISNULL(ROW_NUMBER() OVER (ORDER BY Created),0) --ISNULL creates NOT NULL column
        ,Created = ISNULL(Created,0)
        ,Closed  = ISNULL(RAND(CHECKSUM(NEWID()))*366+Created,0)
   INTO #Tab1
   FROM cteGenCreatedDates
;
--===== Create the most useful index
 CREATE CLUSTERED INDEX IXC_Created ON #Tab1 (Created)
;

Ok... we have our two permanent functions in place and the test data has been built. The functions make this exercise almost child's play. Here's the code to solve your original problem...

 SELECT  wks.WeekStart
        ,WeekEnd    = DATEADD(dd,-1,wks.WeekNext)
        ,OpenCount  = COUNT(*)
   FROM dbo.Weeks('2014-12-09', '2015-01-15') wks
   LEFT JOIN #Tab1 dat
     ON dat.Created < wks.WeekNext
    AND dat.Closed >= wks.WeekStart
  GROUP BY wks.WeekStart,wks.WeekNext
;

... and here are the results...

WeekStart               WeekEnd                 OpenCount
----------------------- ----------------------- -----------
2014-12-29 00:00:00.000 2015-01-04 00:00:00.000 260847
2014-12-08 00:00:00.000 2014-12-14 00:00:00.000 259606
2014-12-15 00:00:00.000 2014-12-21 00:00:00.000 260282
2015-01-05 00:00:00.000 2015-01-11 00:00:00.000 260638
2015-01-12 00:00:00.000 2015-01-18 00:00:00.000 260513
2014-12-22 00:00:00.000 2014-12-28 00:00:00.000 260696

(6 row(s) affected)

If you have "Closed" dates that are NULL, you can add in an ISNULL(WeekNext,'9999') to replace "WeekNext" in the WeekEnd formula.

For more information on how the relationships in the ON clause work to solve this problem, please see the following article.

http://www.sqlservercentral.com/articles/T-SQL/105968/

I don't have 2012 on any of my boxes yet (stuck in the 2005/2008 worlds for now) to do a nice running total with but, from there, you can start to do some analytics...

WITH ctePreAggregate AS
(
 SELECT  wks.WeekStart
        ,WeekEnd    = DATEADD(dd,-1,wks.WeekNext)
        ,OpenCount  = COUNT(*)
   FROM dbo.Weeks('2014-12-09', '2015-01-15') wks --Parameterize these dates
   LEFT JOIN #Tab1 dat
     ON dat.Created < wks.WeekNext
    AND dat.Closed >= wks.WeekStart
  GROUP BY wks.WeekStart,wks.WeekNext
)
 SELECT WeekStart       = CAST(WeekStart AS DATE)
        ,WeekEnd        = CAST(WeekEnd AS DATE)
        ,OpenCount
        ,PercentOfTotal = CAST(OpenCount*100.0/SUM(OpenCount) OVER (PARTITION BY (SELECT NULL)) AS DECIMAL(4,1))
        ,Total          = SUM(OpenCount) OVER (PARTITION BY (SELECT NULL))
   FROM ctePreAggregate
  ORDER BY WeekStart
;

... to do things like getting what percentage of the total temporal span each week represents for counts.

WeekStart  WeekEnd    OpenCount   PercentOfTotal Total
---------- ---------- ----------- -------------- -----------
2014-12-08 2014-12-14 259606      16.6           1562582
2014-12-15 2014-12-21 260282      16.7           1562582
2014-12-22 2014-12-28 260696      16.7           1562582
2014-12-29 2015-01-04 260847      16.7           1562582
2015-01-05 2015-01-11 260638      16.7           1562582
2015-01-12 2015-01-18 260513      16.7           1562582

(6 row(s) affected)

JeffModen · December 10, 2015, 5:47am

Ah... almost forgot. The reason why we use the beginning of the next week as the exclusive end date for each week is so we don't have to mess around with the time element of the rows in the table to include rows from the last day of any given week that may also have a non-midnight time..

PhilC · December 10, 2015, 7:32pm

Thanks Jeff this is excellent - really appreciate this and it is has taught me a lot.

again, many thanks

Phil

JeffModen · December 11, 2015, 3:19pm

You're welcome. Thank you for the feedback.

jcelko · December 12, 2015, 4:26pm

Let us look at the DDL that you did not post for your tab one table. We have no idea what the key is, but we know you have creation_date, which is of type date and closure_date which is of type date. We also know that CHECK (creation_date <= closure date). Yes, the check constraint is important; this information will be passed on the Optimizer to make life a lot better.

Next, you really need to look at the calendar table. You should always have one of these. It will be a calendar date, and a week date. The week date is also defined in ISO 8601, and has the format "yyyyWww-d" , where it starts with a four digit year, has the W as a marker, followed by the week within year number (01 through 53) a – and the date within the week (1-7). You can download week dates from various sites on the Internet. This particular format is popular in the Scandinavian countries.

Your query is now a simple BETWEEN predicate. You will also find this calendar table Andy for a lot of other things. Remember, SQL is a database language, not a computational language.

JeffModen · December 12, 2015, 11:43pm

Thanks for the ISO Week format information, Joe.

Shifting gears to your statement above and suppressing my normal rant about the total fallacy of ever using BETWEEN for temporal range criteria, how would you, in fact, use BETWEEN to solve the particular problem posted on this thread? To summarize, the OP wants to know currently active "things" that may or may not have a start/end date that lives within a given week period. I'm interested in seeing that done using BETWEEN, as you propose. You can use the sample data that I created in my "solution" post to the OP as DDL and test data. And yes, I know I don't have a constraint to guarantee that the start date will always be less that the end date. If it bugs you, please feel free to add such a constraint because the generated test data follows that rule.

Looking forward to your demonstrable T-SQL coded solution that uses BETWEEN to solve this problem. If you need me to generate the calendar table that your solution requires, let me know and I can do that, as well.

sandeepmittal11 · December 18, 2015, 7:08am

Dear Jeff,
Thanks for pointing out the issue. The same is corrected in the post.
However for one month you will get max 5 rows, so i think it should not make a big difference.

sandeepmittal11 · December 18, 2015, 7:33am

Dear Jeff,
I liked your solution and modified accordingly. Thanks

JeffModen · December 18, 2015, 3:04pm

Heh... you ended up using a recursive CTE that "counts". I'm glad you gave me no credit for the changes you made because rCTEs are as slow as most While loops, can be slower that a properly written While loop, and usually more resource intensive. The truth is that you should avoid both. Please see the following article the "Hidden RBAR" nature of recursive CTEs that count.

http://www.sqlservercentral.com/articles/T-SQL/74118/

jcelko · December 18, 2015, 3:51pm

Back from my color3ectal neoiplasia exams. This makes you miss that Xmas sweater .
Here are some details.

CREATE TABLE Events
(event_name CHAR(5) NOT NULL PRIMARY KEY,
creation_date DATE NOT NULL,
closure_date DATE, – null means open
CHECK (creation_date <= closure_date)
);

What I need to do is create a select that returns a row for each week between two dates (@in_start_date and @in_end_date), with a count of the rows from TAB1 that were "open" in that week (i.e., the [Created] date was during or before that week and the [Closed] date was during or after that week) <<

Google what a Calendar table is and how many ways you can define one. In this case you will need to use Week Numbers for 2024 or a site with the full week date.Also everyone ought to have the Rick Snodgrass book from University of Arizona on temporal SQL.

The week date is popular in the Nordic countries and the format is “yyyyWww-d” where yyyy is the year, W is a token, ww is 01-53 for the week in the year, dash is a token and d is 1-7 for the day in the week.

CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
week_date CHAR(9) NOT NULL
CHECK (week_date LIKE '[12][0-9][0-9][0-9]W[0-5][0-9]-[0-7]')
);

SELECT DISTINCT SUBSTRING(week_date, 1,6) AS week_nbr
FROM Calendar
WHERE cal_date BETWEEN @in_start_date AND @in_end_date);

This a basic query. You can now use the "day" number with a GROUP BY week_nbr to measure the partial weeks
CASE
WHEN (MIN(day) = 1 AND MAX(day) = 7 THEN 'full week'
WHEN (MIN(day) > 1 THEN 'end week'
WHEN (MAX(day) < 7 THEN 'start week'
ELSE NULL END

Remember that SQL is a DATA language and not a computational one.

JeffModen · December 23, 2015, 2:53pm

Understood but it's not the big differences that make big differences. This week I fixed a piece of code that most people in the shop said was "optimized". The code "only" took 1 second to run against two multimillion row tables and "only" took 227,000 reads. What they didn't get was that it runs 26,500 times in an 8 hour period. That's 7.4 CPU hours and 49 TRILLION bytes of memory I/O. Obviously, my improvements couldn't save more than 1 second per run because that's how long it was taking so doesn't seem like much of a savings. I got the code down to 500 microseconds and about 100 reads per run. For the same 8 hour period, that means I dropped the CPU time from 7.3 hours to just a little over 13 seconds and I dropped the reads down to about 13 billion bytes instead of 49 TRILLION (that's 49 with 12 zeroes after it). That's 4 orders of magnitude improvement for CPU and 3 orders of magnitude for memory I/O,

You can't get those kinds of improvements even from SSDs.

As Granny used to say, "Mind the pennies and the dollars will take care of themselves".

JeffModen · December 23, 2015, 3:02pm

That's actually a pretty useless link when it comes to "Calendar" tables. You'd actually have to screen scrape that one year at a time to come up with a decent one. And, no... the download links don't actually work well on that site. They're more interested in displayed circular ads that actually belong in the proverbial round file.

Put your money where your mouth is, Joe. Show us YOUR code for generating a "Calendar" table. Then, demonstrate its usage with "BETWEEN" as you said. Just as 1 picture is worth a thousand words, one code example is worth a thousand "expert" opinions.

jcelko · December 24, 2015, 4:18pm

EarthLink
Submit

jcelko212@earthlink.net | Preferences | Help | Feedback | Sign Out

3% of 2100 MB used.
Email Folders [Edit]
Inbox [195]
Drafts
Sent [98]
Trash [1] [Empty]
memberships [6]
things to buy [12]
Undeliverabl...
spamBlocker [Edit]
Suspect Email [5]
Known Spam [165]
Virus Blocker

Message
‹ Previous | Next › | « Back to Sent
Reply Reply All Print Delete
From: --CELKO-- jcelko212@earthlink.net
To: Jeff Moden noreply@forums.sqlteam.com
Subject: Re: [SQLTeam.com Forums] [Transact-SQL] Totals by Week
Date: Dec 24, 2015 9:41 AM
If you want an easier source of text for him building week calendars, then try this site:

http://www.calendar-week.org/calendarweeks/2016/

Go to a year, cut and paste it in notepad, move it over to a word processor. It is a tab separated table (in the word process sense of table). Capture a macro to put it into a simple format with a little editing. The first column is the week-within-year, so '01' becomes '2016W01', and the oher columns go from "dd.mm."to 'dd-mm-yyyy' so it becomes a single long INSERT INTO statement. Do this once and capture this as a macro in your word processor (or good text editor), with the year as a parameter

The only bad news is that you have to work around the local dialect on this site with a SET DATEFORMAT. But I have not used temporal computation to do this.
Now the T-SQL is pretty straight forward:

CREATE TABLE wk_2016
(week_nbr CHAR(7) NOT NULL,
mo DATE NOT NULL,
tu DATE NOT NULL,
we DATE NOT NULL,
th DATE NOT NULL,
fr DATE NOT NULL,
sa DATE NOT NULL,
su DATE NOT NULL
);

SET DATEFORMAT DYM; -- have to work around the local dialect on this site

INSERT INTO wk_2016
VALUES
('2016W01', '04-01-2016', '05-01-2016', '06-01-2016', '07-01-2016', '08-01-2016', '09-01-2016', '10-01-2016'),
('2016W02', '11-01-2016', '12-01-2016', '13-01-2016', '14-01-2016', '15-01-2016', '16-01-2016', '17-01-2016'),
('2016W03', '18-01-2016', '19-01-2016', '20-01-2016', '21-01-2016', '22-01-2016', '23-01-2016', '24-01-2016'),
ETC.;

Since I want to use this data in a Calendar table, I need to "flatten" it out with a VIEW or an insertion statement:

CREATE VIEW X (week_date, cal_date)
AS
SELECT week_nbr +'-1', mo FROM wk_2016
UNION ALL
SELECT week_nbr +'-2', tu FROM wk_2016
UNION ALL
SELECT week_nbr +'-3', we FROM wk_2016
UNION ALL
SELECT week_nbr +'-4', th FROM wk_2016
UNION ALL
SELECT week_nbr +'-5', fr FROM wk_2016
UNION ALL
SELECT week_nbr +'-6', sa FROM wk_2016
UNION ALL
SELECT week_nbr +'-7', su FROM wk_2016;

JeffModen · December 24, 2015, 5:01pm

That's what I'm talking about. You shouldn't have to copy and paste anything except code that's capable of generating the whole shebang. With all your talk about the use of Calendar tables over the years, I thought you'd have such a thing by now and was interested in seeing what your interpretation was.