Time allocation between blocks of time

max266 · November 8, 2023, 11:26pm

I've been working on this problem for a few days now and cannot find a way to solve it.

blockstart	blockend	next_blockstart	ADUserGuid	DispositionId	WorkflowId	StartDate	EndDate
2023-10-01 05:00:00.00	2023-10-01 05:14:59.000	2023-10-01 05:15:00.000	123e4567-e89b-12d3-a456-426614174000	101	1001	2023-10-01 05:00:01.00	2023-10-01 05:14:58.000
2023-10-01 05:15:00.00	2023-10-01 05:29:59.000	2023-10-01 05:30:00.000	123e4567-e89b-12d3-a456-426614174001	102	1002	2023-10-01 05:15:01.00	2023-10-01 05:29:58.000
2023-10-01 05:30:00.00	2023-10-01 05:44:59.000	2023-10-01 05:45:00.000	123e4567-e89b-12d3-a456-426614174002	103	1003	2023-10-01 05:30:01.00	2023-10-01 05:44:58.000
2023-10-01 05:30:00.00	2023-10-01 05:59:59.000	2023-10-01 06:00:00.000	123e4567-e89b-12d3-a456-426614174003	104	1004	2023-10-01 05:30:01.00	2023-10-01 05:59:58.000

Above is a small sample of some of the data I am working with.

Basically, I am working with a full years worth of data that is recording technician productivity. Technicians will begin work on a report that comes in to our system denoted by StartDate, and complete the report denoted by EndDate.
The BlockStart and BlockEnd columns are two columns I have created. The purpose of the columns is to divide a 24 hour day into 15 minute blocks of time so that we can determine "in this quarter hour window, how many reports did a technician complete? How many did they begin? How many did they finish? What was their productivity? How many minutes were they not actively working on reports (i.e. bathroom breaks, lunch, etc.).

A 15 minute block of time translates into 900 seconds (here I am using 899).

There are a couple scenarios here:

A technician starts and finishes a report within one quarter hour block
A technician starts a report in one quarter hour block and finishes in another (later) quarter hour block.

Reports can take anywhere from 5 minutes to over an hour (60 minutes would be 4 quarter hour blocks (4 x 15 = 60 min).

My job is to create a new column and calculate the time it takes a technician to complete a report, this field will be called "ElapsedSeconds" and is the difference in seconds between the StartDate and the EndDate.

However, let's say a technician starts a report at 7:10PM (falls within the 7:00 quarter hour block) and doesn't finish until 7:55PM -> 45 minutes have elapsed starting in the 7:00PM quarter hour block and ending in the 7:45PM quarter hour block. ElapsedSeconds would then be 45 x 60 = 2700 seconds.
BUT! How would I show actual ElapsedSeconds across the multiple quarter hour blocks?

For example, if the report was started at 7:10 in the 7:00 quarter hour block, they would have worked 5 minutes (or 300 seconds) in that block, the 7:15-7:30 blocks would be the full 900 seconds (a technician cannot possibly have more productive seconds than 900 per quarter hour block).

So at the beginning of the 7:15 quarter hour block, we would subtract the 300 seconds that have elapsed from the 2700 total for a difference of 2400 seconds remaining. This remaining time should then be allocated to the next quarter hour block, and any remaining time (time exceeding 900 seconds) should be allocated to the next block, etc.

How can I do this?

Another caveat is that sometimes technicians open multiple reports, in which case they will have multiple elapsed times for each quarter hour block that exceeds 900 seconds, in which case I need to record those and allocate any excess to the next block as with the above.

I have been on ChatGPT, and worked various ways of accomplishing this. My boss has been pressuring me to complete this as I have been working on it all week but I feel like I am at a loss.

Example output:

TechnicianID	StartTime	EndTime	BlockStart	BlockEnd	TotalDurationSeconds	ProductiveSeconds
1	2023-10-01 05:00:00	2023-10-01 05:10:00	2023-10-01 05:00:00	2023-10-01 05:15:00	600	600
1	2023-10-01 05:10:00	2023-10-01 05:25:00	2023-10-01 05:00:00	2023-10-01 05:15:00	900	300
1	2023-10-01 05:15:00	2023-10-01 05:25:00	2023-10-01 05:15:00	2023-10-01 05:30:00	900	600
1	2023-10-01 06:00:00	2023-10-01 06:45:00	2023-10-01 06:00:00	2023-10-01 06:15:00	2700	900
1	2023-10-01 06:15:00	2023-10-01 06:45:00	2023-10-01 06:15:00	2023-10-01 06:30:00	2700	900
1	2023-10-01 06:30:00	2023-10-01 06:45:00	2023-10-01 06:30:00	2023-10-01 06:45:00	2700	900
2	2023-10-01 05:05:00	2023-10-01 05:20:00	2023-10-01 05:00:00	2023-10-01 05:15:00	900	600
2	2023-10-01 05:15:00	2023-10-01 05:20:00	2023-10-01 05:15:00	2023-10-01 05:30:00	900	300
2	2023-10-01 05:30:00	2023-10-01 05:35:00	2023-10-01 05:30:00	2023-10-01 05:45:00	300	300
2	2023-10-01 07:00:00	2023-10-01 07:50:00	2023-10-01 07:00:00	2023-10-01 07:15:00	3000	900
2	2023-10-01 07:15:00	2023-10-01 07:50:00	2023-10-01 07:15:00	2023-10-01 07:30:00	3000	900
2	2023-10-01 07:30:00	2023-10-01 07:50:00	2023-10-01 07:30:00	2023-10-01 07:45:00	3000	900
2	2023-10-01 07:45:00	2023-10-01 07:50:00	2023-10-01 07:45:00	2023-10-01 08:00:00	3000	300
3	2023-10-01 05:45:00	2023-10-01 06:00:00	2023-10-01 05:45:00	2023-10-01 06:00:00	900	900
3	2023-10-01 08:15:00	2023-10-01 09:00:00	2023-10-01 08:15:00	2023-10-01 08:30:00	2700	900
3	2023-10-01 08:30:00	2023-10-01 09:00:00	2023-10-01 08:30:00	2023-10-01 08:45:00	2700	900
3	2023-10-01 08:45:00	2023-10-01 09:00:00	2023-10-01 08:45:00	2023-10-01 09:00:00	2700	900

Something like this (this is the result of one of my attempts)

Except, there would only be one litem per UserId/QuarterHourBlock/DispositionId/WorkflowId

khtan · November 9, 2023, 1:36am

max266, Welcome to SQLTeam.

Can you also show how the expected result looks like?

max266 · November 9, 2023, 2:31am

TechnicianID	StartTime	EndTime	BlockStart	BlockEnd	TotalDurationSeconds	ProductiveSeconds
1	2023-10-01 05:00:00	2023-10-01 05:10:00	2023-10-01 05:00:00	2023-10-01 05:15:00	600	600
1	2023-10-01 05:10:00	2023-10-01 05:25:00	2023-10-01 05:00:00	2023-10-01 05:15:00	900	300
1	2023-10-01 05:15:00	2023-10-01 05:25:00	2023-10-01 05:15:00	2023-10-01 05:30:00	900	600
1	2023-10-01 06:00:00	2023-10-01 06:45:00	2023-10-01 06:00:00	2023-10-01 06:15:00	2700	900
1	2023-10-01 06:15:00	2023-10-01 06:45:00	2023-10-01 06:15:00	2023-10-01 06:30:00	2700	900
1	2023-10-01 06:30:00	2023-10-01 06:45:00	2023-10-01 06:30:00	2023-10-01 06:45:00	2700	900
2	2023-10-01 05:05:00	2023-10-01 05:20:00	2023-10-01 05:00:00	2023-10-01 05:15:00	900	600
2	2023-10-01 05:15:00	2023-10-01 05:20:00	2023-10-01 05:15:00	2023-10-01 05:30:00	900	300
2	2023-10-01 05:30:00	2023-10-01 05:35:00	2023-10-01 05:30:00	2023-10-01 05:45:00	300	300
2	2023-10-01 07:00:00	2023-10-01 07:50:00	2023-10-01 07:00:00	2023-10-01 07:15:00	3000	900
2	2023-10-01 07:15:00	2023-10-01 07:50:00	2023-10-01 07:15:00	2023-10-01 07:30:00	3000	900
2	2023-10-01 07:30:00	2023-10-01 07:50:00	2023-10-01 07:30:00	2023-10-01 07:45:00	3000	900
2	2023-10-01 07:45:00	2023-10-01 07:50:00	2023-10-01 07:45:00	2023-10-01 08:00:00	3000	300
3	2023-10-01 05:45:00	2023-10-01 06:00:00	2023-10-01 05:45:00	2023-10-01 06:00:00	900	900
3	2023-10-01 08:15:00	2023-10-01 09:00:00	2023-10-01 08:15:00	2023-10-01 08:30:00	2700	900
3	2023-10-01 08:30:00	2023-10-01 09:00:00	2023-10-01 08:30:00	2023-10-01 08:45:00	2700	900
3	2023-10-01 08:45:00	2023-10-01 09:00:00	2023-10-01 08:45:00	2023-10-01 09:00:00	2700	900

Something like this (this is the result of one of my attempts)

Except, there would only be one litem per UserId/QuarterHourBlock/DispositionId/WorkflowId

khtan · November 9, 2023, 4:32am

Looks like the data that you have shown is already split into 15 minutes block. Will need the original form of data before splitting into 15 mins block. Else there is no way to know that which rows are related from the same origin

ScottPletcher · November 9, 2023, 4:37am

And the sample data needs to be in the format of a CREATE TABLE and INSERT statement(s) rather than just "splats" on the screen.

khtan · November 9, 2023, 7:16am

I took one step back and work on the solution based on unsplit data.

Assuming that your original data (before splitting) looks like these :-

TechnicianID	StartTime	EndTime	TotalDurationSeconds
1	2023-10-01 05:00:00.000	2023-10-01 05:10:00.000	600
1	2023-10-01 05:10:00.000	2023-10-01 05:25:00.000	900
1	2023-10-01 06:00:00.000	2023-10-01 06:45:00.000	2700
2	2023-10-01 05:05:00.000	2023-10-01 05:20:00.000	900
2	2023-10-01 05:30:00.000	2023-10-01 05:35:00.000	300
2	2023-10-01 07:00:00.000	2023-10-01 07:50:00.000	3000

The query make use of a number / tally table. You may use recursive CTE to generate one on the fly or pre-generate one as a permanent table.

The comments are in the query itself.

-- using recursive CTE to generate a numbers/tally table
-- Or use GENERATE_SERIES (SQL Server 2022+) on the fly
-- For performance, use a permanent numbers table
with numbers as
(
  select n = 0
  union all
  select n = n + 1
  from   numbers 
  where  n < 99 --assumed max 99. Change accordingly
),
cte as
(
select *   
from   tbl t
       cross apply
       (
           -- Start & End Time in 15 mins block HH:00, HH:15, HH:30, HH:45
           select StartB = dateadd(minute, datediff(minute, 0, StartTime) / 15 * 15, 0),
                  EndB   = dateadd(minute, ceiling(datediff(minute, 0, EndTime) / 15.0) * 15, 0)
       ) blk
       cross apply
       (
           -- calculate no of 15 mins block
           select blocks = datediff(minute, StartB, EndB) / 15
       ) b
       inner join numbers n on  n.n >= 1
                            and n.n <= b.blocks
       cross apply
       (
           -- Start & End Time for each block
           select BlockStart = dateadd(minute, (n.n - 1) * 15, StartB),
                  BLockEnd   = dateadd(minute, n.n * 15, StartB)
       ) bt
       cross apply
       ( 
           -- calculate the ratio
           select r = case when b.blocks = 1
                           then 1.0
                           when n.n = 1
                           then datediff(minute, StartTime, BlockEnd) * 1.0 / datediff(minute, StartTime, EndTime)
                           when n.n = b.blocks
                           then datediff(minute, BlockStart, EndTime) * 1.0 / datediff(minute, StartTime, EndTime)
                           else 15.0 / datediff(minute, StartTime, EndTime)
                           end
       ) r
) 
select *, 
       ProductiveSeconds = convert(decimal(10,2), TotalDurationSeconds * r)
from   cte c
order by c.TechnicianId, c.StartTime, c.n
-- option (maxrecursion 0)   -- unmask for more than 100 recursion

db<>fiddle demo

max266 · November 9, 2023, 3:06pm

Thank you - but splitting the time into quarter hour blocks isn't the main concern, it's the ProductiveSeconds portion.

ProductiveSeconds cannot exceed 900 seconds for a quarter hour block, so how to accurately capture a technicians time spent working on a report, and when/if a report is worked across multiple quarter hour blocks (greater than 15 minutes) how to allocate those "overflow" seconds to the next appropriate/sequential quarter hour block?

I spent more time looking at the results you produced - I believe that is close to what I am looking for, but there should only be one row per WorkFlowId, ADUserGuid, DispositionId, and TimeBlock.
For the report of 2700 seconds it is divided into three rows for the same quarter hour block, it should show 900 seconds for the first block, then the remaining seconds for the next sequential quarter hour blocks if that makes sense.

SqlHippo · November 9, 2023, 9:51pm

You had asked this on Reddit. Here is a variation of the reply I gave there. You seemed to be asking for it in 15 minute buckets, not 900 second buckets, but you can adapt this easily enough for that.

max266 · November 9, 2023, 10:26pm

Where do you get the "generate_series" from? I cannot execute the code at that point in SSMS or Data Studio.

But I believe this might be spot on! My boss and I were beginning to think we may need to use a cursor which I understand is frowned upon when working with sets so this gets me excited

SqlHippo · November 9, 2023, 10:38pm

Sorry, I pasted the wrong version. This one explains a little better... generate_series() is in SQL Server 2022, which you probably don't have. The point is that you need a NUMBERS TABLE that has values from 0 up to around 1 Million or so.

SqlHippo · November 9, 2023, 10:39pm

-- Generate a Time Blocks table containing every 15 minutes since 2000-01-01.
-- If you have a numbers table going from 0 to 1million or so, use that and multiply each number by 15.
-- FifteenMinuteBlocks uses generate_series() to create that numbers table on the fly for us.
-- But generate_series() is only available starting with SQL Server 2022.
WITH FifteenMinuteBlocks AS 
(
  SELECT value * 15 AS Fifteens
  FROM generate_series(0, 1000000) t
)
SELECT DATEADD(minute, b.Fifteens, '2000-01-01') AS TimeBlockStart
  , DATEADD(minute, b.Fifteens + 15, '2000-01-01') AS TimeBlockEnd
INTO #TimeBlocks
FROM FifteenMinuteBlocks b
  ;

SqlHippo · November 9, 2023, 10:42pm

After generating that Time Blocks table to create a row for every 15 minute increment, you do the thing that actually solves the problem... mostly you're just joining your Tasks to the TimeBlocks table and then using IIF() to decide which values to start and end with, then you're SUM()ing up the minutes between those start and end dates.

-- Then we SUM the minutes for each tech within the time block.
SELECT 
    t.Technician
  , tb.TimeBlockStart
  , SUM(DATEDIFF(minute
             , IIF(t.StartDate < tb.TimeBlockStart, tb.TimeBlockStart, t.StartDate)
             , IIF(t.EndDate > tb.TimeBlockEnd, tb.TimeBlockEnd, t.EndDate)
            )) AS Minutes
FROM Tasks t
INNER JOIN #TimeBlocks tb
ON (   t.StartDate < tb.TimeBlockEnd
   AND t.EndDate > tb.TimeBlockStart
   )
GROUP BY t.Technician
  , tb.TimeBlockStart
ORDER BY t.Technician
  , tb.TimeBlockStart
  ;

max266 · November 13, 2023, 11:01pm

Thank you for your help SqlHippo!

I have tried using your solution but it isn't quite right - I am looking for productive minutes/seconds for each time block, a technician cannot be more productive in a quarter hour block then actual time in a quarter hour block.

A quarter hour block is a 15 minute period of time, and my results from implementing this gives me minutes that exceed the 15 minute duration.

I think the issue (and the one I have been working to solve) is when a technician works a report that goes beyond 15 minutes or across multiple quarter hour blocks, his/her productive time will simply show as the total time it took them to complete a report based on the TimeBlockStart.

If you look at your code in dbfiddle it also shows this - last code block the second row from your output shows 20 minutes, which should not exist in a quarter hour block. That additional 5 minutes should be "carried" over to the next quarter hour block.

I really appreciate your help, this is not easy and I've struggled with this for a while!

SqlHippo · November 13, 2023, 11:38pm

OK, this part is making sense now...

Another caveat is that sometimes technicians open multiple reports, in which case they will have multiple elapsed times for each quarter hour block that exceeds 900 seconds, in which case I need to record those and allocate any excess to the next block as with the above.

Before, I thought you were saying it should overlap, but you're saying you want all of the accumulated time in excess of 15 minutes / 900 seconds to be appended at the end as if it were not overlapping.

That poses an interesting challenge. I'll have to think about it and see if I can come up with a solution, but off the top of my head, I don't see a way to do it that wouldn't require recursion or a cursor. A problem you can run into very quickly here is that even if you identify the overage and allocate it to the end of an island of coterminous time blocks, you run the risk of overrunning the next island of time.

max266 · November 14, 2023, 12:07am

My boss has developed a workable solution using a cursor, but we still run into issues where the query is not correctly allocating that additional time over 900 seconds in Block A to Block B.

Recursion is just fine to use because our data set is not that large (less than 1 million rows). I have been attempting to utilizing recursion without any success.

Between different forums, GPT, and Stack Overflow, neither of us have been able to find a solution. The alternative would be to use a Python script but really trying to avoid going that route as much as possible!

Again, thank you so much for your assistance - really too kind of you and wish I could give something in return

Ifor · November 14, 2023, 11:12am

I do not understand your question; in future consumable test data and expected results for the test data would help.

Looking at SQLHippo's test data, maybe you just need to get rid of overlapping time periods first.

Using:

CREATE TABLE #Tasks (
    TaskID INT PRIMARY KEY,
    Technician NVARCHAR(50),
    StartDate DATETIME,
    EndDate DATETIME
);
INSERT INTO #Tasks (TaskID, Technician, StartDate, EndDate)
VALUES 
    (1, 'Tech1', '2023-10-30 08:05:00', '2023-10-30 08:25:00'),
    (2, 'Tech1', '2023-10-30 08:20:00', '2023-10-30 08:35:00'),
    (3, 'Tech2', '2023-10-30 08:30:00', '2023-10-30 09:10:00'),
    (4, 'Tech1', '2023-10-30 09:05:00', '2023-10-30 09:45:00'),
    (5, 'Tech3', '2023-10-30 08:10:00', '2023-10-30 08:20:00'),
    (6, 'Tech3', '2023-10-30 08:30:00', '2023-10-30 08:45:00');

this can be done with something like:

WITH Boundaries
AS
(
	SELECT Technician, StartDate, EndDate
		,CASE
			WHEN MAX(EndDate) OVER (PARTITION BY Technician ORDER BY StartDate)
				>= LEAD(StartDate) OVER (PARTITION BY Technician ORDER BY StartDate)
			THEN 0
			WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY Technician ORDER BY StartDate)
			THEN 0
			ELSE 1
		END AS Boundary
	FROM #Tasks
)
,Grps
AS
(
	SELECT Technician, StartDate, EndDate
		,SUM(Boundary) OVER (PARTITION BY Technician ORDER BY StartDate) AS Grp
	FROM Boundaries

)
SELECT Technician
	,MIN(StartDate) AS StartDate
	,MAX(EndDate) AS EndDate
FROM Grps
GROUP BY Technician, Grp;

The result can then be split into 15 minute periods. Putting it all together, and using fnTally instead of GENERATE_SERIES():

WITH Boundaries
AS
(
	SELECT Technician, StartDate, EndDate
		,CASE
			WHEN MAX(EndDate) OVER (PARTITION BY Technician ORDER BY StartDate)
				>= LEAD(StartDate) OVER (PARTITION BY Technician ORDER BY StartDate)
			THEN 0
			WHEN StartDate <= LAG(EndDate) OVER (PARTITION BY Technician ORDER BY StartDate)
			THEN 0
			ELSE 1
		END AS Boundary
	FROM #Tasks
)
,Grps
AS
(
	SELECT Technician, StartDate, EndDate
		,SUM(Boundary) OVER (PARTITION BY Technician ORDER BY StartDate) AS Grp
	FROM Boundaries

)
,WorkTimes
AS
(
	SELECT Technician
		,MIN(StartDate) AS StartDate
		,MAX(EndDate) AS EndDate
	FROM Grps
	GROUP BY Technician, Grp
)
,Period15
AS
(
	/* As an example 15 minute periods for 2023 */
	SELECT DATEADD(minute, N * 15, '2023') AS PeriodStart
		,DATEADD(minute, (N + 1) * 15, '2023') AS PeriodEnd
	FROM dbo.fnTally(0, (DATEDIFF(day, '2023', '2024') * 96) -1)
)
SELECT W.Technician, P.PeriodStart
	,DATEDIFF(second, X.StartDate, X.EndDate) AS SecsInPeriod
FROM WorkTimes W
	JOIN Period15 P
		ON W.StartDate < P.PeriodEnd
			AND W.EndDate > P.PeriodStart
	CROSS APPLY
	(
		VALUES
		(
			IIF(W.StartDate > P.PeriodStart, W.StartDate, P.PeriodStart)
			,IIF(W.EndDate < P.PeriodEnd, W.EndDate, P.PeriodEnd)
		)
	) X (StartDate, EndDate)
ORDER BY Technician, PeriodStart;

jeffw8713 · November 14, 2023, 8:07pm

This may give you some ideas:

DECLARE @Tasks TABLE (
        TaskID INT PRIMARY KEY
      , Technician NVARCHAR(50)
      , StartDate datetime
      , EndDate DATETIME
        );

 INSERT INTO @Tasks (TaskID, Technician, StartDate, EndDate)
 VALUES (1, 'Tech1', '2023-10-30 08:05:00', '2023-10-30 08:25:00')
      , (2, 'Tech1', '2023-10-30 08:20:00', '2023-10-30 08:35:00')
      , (3, 'Tech2', '2023-10-30 08:30:00', '2023-10-30 09:10:00')
      , (4, 'Tech1', '2023-10-30 09:05:00', '2023-10-30 09:45:00')
      , (5, 'Tech3', '2023-10-30 08:10:00', '2023-10-30 08:20:00')
      , (6, 'Tech3', '2023-10-30 08:30:00', '2023-10-30 08:45:00');

DECLARE @min_date    datetime
      , @max_date    datetime
      , @block_start datetime
      , @block_end   datetime;

 SELECT @min_date    = min(t.StartDate)
      , @max_date    = max(t.EndDate)
      , @block_start = dateadd(MINUTE, datediff(MINUTE, 0, min(t.StartDate)) / 15 * 15, 0)
      , @block_end   = dateadd(MINUTE, datediff(MINUTE, 0, max(t.EndDate)) / 15 * 15 + 15, 0)
   FROM @Tasks t;


   WITH time_blocks (block_start, block_end)
     AS (
 SELECT TOP (datediff(MINUTE, @block_start, @block_end) / 15 + 1)
        dateadd(MINUTE, checksum(row_number() OVER(ORDER BY @@spid) - 1) * 15, @block_start)
      , dateadd(MINUTE, checksum(row_number() OVER(ORDER BY @@spid)) * 15, @block_start)
   FROM (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS t1(n)
      , (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) AS t2(n)
        )
      , task_breakout
     AS (
 SELECT *
      , task_start   = iif(tb.block_start < t.StartDate, t.StartDate, tb.block_start)
      , task_end     = iif(tb.block_end > t.EndDate, t.EndDate, tb.block_end)
      , productivity = datediff(SECOND, iif(tb.block_start < t.StartDate, t.StartDate, tb.block_start)
                                      , iif(tb.block_end > t.EndDate, t.EndDate, tb.block_end))
   FROM time_blocks             tb
  INNER JOIN @Tasks              t ON t.StartDate <= tb.block_end
                                  AND t.EndDate >= tb.block_start
        )
 SELECT *
      , total_duration = sum(tb.productivity) OVER(PARTITION BY tb.Technician, tb.TaskID ORDER BY tb.task_start)
   FROM task_breakout           tb
  WHERE tb.Productivity <> 0  
  ORDER BY
        Technician
      , task_start;

You do want to specify the actual columns instead of using '*' - but this is close to what you are looking for - I think.