SQLTeam.com | Weblogs | Forums

Need suggestion: How to get average, min and max of a column over fixed time duration

sql2008

#1

Hi all,

I have a table with a column named "create_time" (datetime, null) and there's another column "ID".
What I am trying to do is get count(ID) with an interval of 60 minutes.
If CURRENT_TIMESTAMP is say 2015-08-05 17:01:29.657, then I want to start from 2015-08-05 17:00:00.000 and take count(ID) when "create_time" is between 2015-08-05 17:00:00.000 and 2015-08-05 16:00:00.000.

I could come up with a query for this:

select State, count(ID) as Calls_per_hour 
from CTIRPTS.dbo.cti_reporting WHERE Create_time 
BETWEEN DATEADD(hour,-1,DATEADD(hour,DATEDIFF(hour,0,GETDATE()),0))
AND DATEADD(hour,DATEDIFF(hour,0,GETDATE()),0)
group by State;

Now I want to do this for each hour (going in past time) until 2015-08-05 00:00:00.000 and calculate

  1. average of count(ID) (sum of all count(ID) until 2015-08-05 00:00:00.000 divided by number of times we took count(ID))
  2. minimum of count(ID) and maximum of count(ID)

I will have to do this for present day to get avg, min and max for the day and might have to find avg, min and max for last week as well.

Can you please suggest me how I should approach to get this done. After getting the above query, I am simply unable to think of a way to get this done and some direction will be much appreciated.

Thanks in advance!


#2

I am finding it hard to understand your descriptions. Some representative sample data, DDL for tables (that someone can copy to an SSMS window) and your expected results would make it much more easier for someone who wants to respond.

Based on my limited understanding, if you want to get the counts per hour for the current date, your query can be something like this. I am using the spt_values table, but if you have a numbers table in your database, it would be better to use that.

If you want hourly averages, totals etc. for the entire day, the output of this can then be used to compute those.

SELECT
	[State],
	DATEADD(hh,number,CAST(GETDATE() AS DATE)) AS StartHour,
	COUNT(*) AS CallsPerHour
FROM
	master..spt_values s
	LEFT JOIN CTIRPTS.dbo.cti_reporting c ON 
		DATEADD(hh,number,CAST(GETDATE() AS DATE)) >= Create_time
		AND DATEADD(hh,number+1,CAST(GETDATE() AS DATE)) < Create_time
WHERE
	s.[Type] = 'P' AND s.number < 24
GROUP BY
	[State],
	DATEADD(hh,number,CAST(GETDATE() AS DATE))
ORDER BY
	DATEADD(hh,number,CAST(GETDATE() AS DATE));

#3

Hi JamesK,

Thank you for your response and my apologies for not being very clear. As this is my first attempt towards DB queries and only source I am using for now is internet. :cry:

You got it right though. I just changed DATE to DATETIME from your query (as it was giving error: Type DATE is not a defined system type.) and it started without any error but didn't complete even after 30 seconds, so I stopped the query. The table I am using has 1474037 rows. Would you suggest me to run the query again and let it take time until it completes?

Also I looked in the system databases and I don't see numbers (dbo.spt_numbers) table there.
Following are the details I could provide, just in case if you or anyone can help me further using it
Here is the table structure with some sample data:

pkey		Home_Telephone	ID						EDU_ID								City	State	Zip	Menu_Selection		Account_Number	Billing_System	Prin	FTA		Node	Company	Division	Email_Addr				Create_time				system	MAC_ID			Service_Address
123456789	7778889999		00250087661438113227	5mm7mmm000000000mm10emm623300002	AAAAAAAA	MS	31234	Tier1			1234567891234567	C			5500	0050	EC6664	0		0			abc@xyz.com				2015-07-28 14:53:52.000	8381	NULL			123 MAIN DR  
123455789	2225558888		00250068021438191419	55n90nn000000000nn10ennn23300002	BBBBBBBB	MS	35678	NULL			8542687456248952	C			6600	0040	MA6610	0		0			hgf@itf.com 			2015-07-29 12:37:04.000	8381	NULL			456 CENTER RD  

Again, my bad but I didn't understand "DDL for tables (that someone can copy to an SSMS window)"

And what I am expecting in output is like:

State	time_window											Calls_per_hour	Avg_calls_per_hour Min 		Max
GA		2015-08-05 00:00:00.000 and 2015-08-05 01:00:00.000		150				150/60			150		150
AL		2015-08-05 00:00:00.000 and 2015-08-05 01:00:00.000		120				120/60			120		120
MO		2015-08-05 00:00:00.000 and 2015-08-05 01:00:00.000		145				145/60			145		145

State	time_window											Calls_per_hour	Avg_calls_per_hour Min 		Max
GA		2015-08-05 01:00:00.000 and 2015-08-05 02:00:00.000		250				250/60			150		250
AL		2015-08-05 01:00:00.000 and 2015-08-05 02:00:00.000		220				220/60			120		220
MO		2015-08-05 01:00:00.000 and 2015-08-05 02:00:00.000		245				245/60			145		245

Something like this for present day. If CURRENT_TIMESTAMP is say 2015-08-05 17:01:29.657, then until 2015-08-05 17:00:00.000


#4

DATE is a data type that is available only in SQL 2008 or later. So I am guessing that you are using SQL 2005 (or earlier). If that is the case, change all instances of "CAST(GETDATE() AS DATE)" to "DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)".

The reason your code ran for a long time could be that you changed CAST(GETDATE() AS DATE) to CAST(GETDATE() AS DATETIME). That would not be logically correct for what we are trying to accomplish.

master..spt_values is a table in the master database. Using the syntax master..spt_values instructs SQL Server to look specifically in the master database for that table.

DDL is acronym for data definition language. For example, CREATE TABLE dbo.MyTable(id int) etc.


#5

Thanks again JamesK.
I was impatient and ran the first query as you had posted in first reply and let it take time to complete.
It completed in 44 seconds but as you mentioned, it seems like something went wrong due to DATETIME and it fetched following result:

State	StartHour	CallsPerHour
NULL	2015-08-06 10:58:30.950	1
NULL	2015-08-06 11:58:30.950	1
NULL	2015-08-06 12:58:30.950	1
NULL	2015-08-06 13:58:30.950	1
NULL	2015-08-06 14:58:30.950	1
NULL	2015-08-06 15:58:30.950	1
NULL	2015-08-06 16:58:30.950	1
NULL	2015-08-06 17:58:30.950	1
NULL	2015-08-06 18:58:30.950	1
NULL	2015-08-06 19:58:30.950	1
NULL	2015-08-06 20:58:30.950	1
NULL	2015-08-06 21:58:30.950	1
NULL	2015-08-06 22:58:30.950	1
NULL	2015-08-06 23:58:30.950	1
NULL	2015-08-07 00:58:30.950	1
NULL	2015-08-07 01:58:30.950	1
NULL	2015-08-07 02:58:30.950	1
NULL	2015-08-07 03:58:30.950	1
NULL	2015-08-07 04:58:30.950	1
NULL	2015-08-07 05:58:30.950	1
NULL	2015-08-07 06:58:30.950	1
NULL	2015-08-07 07:58:30.950	1
NULL	2015-08-07 08:58:30.950	1
NULL	2015-08-07 09:58:30.950	1

After reading your second response, I tried the same query after modifying it per your suggestions and it still gave almost same output after running for 50 seconds.
This is the query I tried:

SELECT
	[State],
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) AS StartHour,
	COUNT(*) AS CallsPerHour
FROM
	master..spt_values s
	LEFT JOIN CTIRPTS.dbo.cti_reporting c ON 
		DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) >= Create_time
		AND DATEADD(hh,number+1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) < Create_time
WHERE
	s.[Type] = 'P' AND s.number < 24
GROUP BY
	[State],
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))
ORDER BY
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0));

Output I got in 50 seconds:

State	StartHour	CallsPerHour
NULL	2015-08-06 00:00:00.000	1
NULL	2015-08-06 01:00:00.000	1
NULL	2015-08-06 02:00:00.000	1
NULL	2015-08-06 03:00:00.000	1
NULL	2015-08-06 04:00:00.000	1
NULL	2015-08-06 05:00:00.000	1
NULL	2015-08-06 06:00:00.000	1
NULL	2015-08-06 07:00:00.000	1
NULL	2015-08-06 08:00:00.000	1
NULL	2015-08-06 09:00:00.000	1
NULL	2015-08-06 10:00:00.000	1
NULL	2015-08-06 11:00:00.000	1
NULL	2015-08-06 12:00:00.000	1
NULL	2015-08-06 13:00:00.000	1
NULL	2015-08-06 14:00:00.000	1
NULL	2015-08-06 15:00:00.000	1
NULL	2015-08-06 16:00:00.000	1
NULL	2015-08-06 17:00:00.000	1
NULL	2015-08-06 18:00:00.000	1
NULL	2015-08-06 19:00:00.000	1
NULL	2015-08-06 20:00:00.000	1
NULL	2015-08-06 21:00:00.000	1
NULL	2015-08-06 22:00:00.000	1
NULL	2015-08-06 23:00:00.000	1

Output of "sp_help cti_reporting;"

Column_name	Type	Computed	Length	Prec	Scale	Nullable	TrimTrailingBlanks	FixedLenNullInSource	Collation
pkey	int	no	4	10   	0    	no	(n/a)	(n/a)	NULL
Home_Telephone	nvarchar	no	20	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
ID	nvarchar	no	42	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
EDU_ID	nvarchar	no	64	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
City	nvarchar	no	100	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
State	nvarchar	no	4	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Zip	nvarchar	no	10	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Menu_Selection	nvarchar	no	40	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Account_Number	nvarchar	no	32	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Billing_System	nvarchar	no	4	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Prin	nvarchar	no	8	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
FTA	nvarchar	no	8	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Node	nvarchar	no	12	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Company	nvarchar	no	4	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Division	nvarchar	no	4	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Email_Addr	nvarchar	no	162	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Create_time	datetime	no	8	     	     	yes	(n/a)	(n/a)	NULL
system	nvarchar	no	8	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
MAC_ID	nvarchar	no	40	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS
Service_Address	nvarchar	no	120	     	     	yes	(n/a)	(n/a)	SQL_Latin1_General_CP1_CI_AS

I hope this gives more detailed information.


#6

Do you have only one record in each hour? Run this query and see what you get? It is trying to see all the records between midnight and 1:00 AM on August 6, 2015.

SELECT * FROM CTIRPTS.dbo.cti_reporting
WHERE
	Create_time >= '2015-08-06 00:00:00.000'
	AND Create_time < '2015-08-06 01:00:00.000';

#7

No, there are many records for each hour.

SELECT * FROM CTIRPTS.dbo.cti_reporting
WHERE
	Create_time >= '2015-08-06 00:00:00.000'
	AND Create_time < '2015-08-06 01:00:00.000';

Above query fetches 1107 rows.


#8

See what each of these queries give you

-- 1
SELECT 	
	COUNT(ID)
FROM
	CTIRPTS.dbo.cti_reporting
WHERE
	Create_time >= '2015-08-06 00:00:00.000'
	AND Create_time < '2015-08-06 01:00:00.000';	

-- 2
SELECT 	
	DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0),
	COUNT(ID)
FROM
	CTIRPTS.dbo.cti_reporting
WHERE
	Create_time >= '2015-08-06 00:00:00.000'
	AND Create_time < '2015-08-06 01:00:00.000'
GROUP BY
	DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0);

-- 3
SELECT 	
	DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0),
	COUNT(ID)
FROM
	CTIRPTS.dbo.cti_reporting
WHERE
	Create_time >= DATEADD(hh,0,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))
	AND Create_time < DATEADD(hh,1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))
GROUP BY
	DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0);

#9

--1
Returned 1107 rows

--2
Gives following error:
Each GROUP BY expression must contain at least one column that is not an outer reference.

--3
Gives following error:
Each GROUP BY expression must contain at least one column that is not an outer reference.

I tried adding State after select and after group by as well but still get same error.


#10

My bad. The group by had only constant expressions. What does this give?

SELECT
	number,
	DATEADD(hh,number-1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) AS StartHour,
	COUNT(*) AS CallsPerHour
FROM
	master..spt_values s
	LEFT JOIN CTIRPTS.dbo.cti_reporting c ON 
		DATEADD(hh,number-1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) >= Create_time
		AND DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) < Create_time
WHERE
	s.[Type] = 'P' AND s.number = 1
GROUP BY
	number,
	DATEADD(hh,number-1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0));

#11

It gives this one row:
number StartHour CallsPerHour
1 2015-08-06 00:00:00.000 1


#12

Pffft! My fault. This is the problem with writing queries without testing. The join condition was exactly the opposite of what it should have been. See the >= and < have been changed compared to previous.

SELECT
	[State],
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) AS StartHour,
	COUNT(ID) AS CallsPerHour
FROM
	master..spt_values s
	LEFT JOIN CTIRPTS.dbo.cti_reporting c ON 
		DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) <= Create_time
		AND DATEADD(hh,number+1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) > Create_time
WHERE
	s.[Type] = 'P' AND s.number < 24
GROUP BY
	[State],
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))
ORDER BY
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0));

#13

By the way, when I said about posting DDL's, what I meant was something like the CREATE statement and INSERT statements in the following - which then someone can copy and write a query. Without that, either someone has to create that and write the query, or write the query hoping that it would be correct.

CREATE TABLE #cti_reporting
(
	create_time DATETIME, 
	ID INT
);

INSERT INTO #cti_reporting
        ( create_time, ID )
VALUES  
	( '2015-08-06 00:34:24', 1  ),
	( '2015-08-06 00:22:12', 2  ),
	( '2015-08-06 00:54:11', 3  ),
	( '2015-08-06 01:24:05', 4  );
	
SELECT
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) AS StartHour,
	COUNT(ID) AS CallsPerHour
FROM
	master..spt_values s
	LEFT JOIN #cti_reporting c ON 
		DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) <= Create_time
		AND DATEADD(hh,number+1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) > Create_time
WHERE
	s.[Type] = 'P' AND s.number < 24
GROUP BY
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))
ORDER BY
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0));
	
DROP TABLE 	#cti_reporting;

#14

Thanks a lot JamesK. That worked perfect!
It displayed 407 rows with State wise calls per hour for each hour startting 2015-08-06 00:00:00.000 until 2015-08-06 23:00:00.000.
CallsPerHour column had '1' for times in future (starting 2015-08-06 15:00:00.000 until 2015-08-06 23:00:00.000)

About the DDL I was not sure how to find create query for this table.
Please let me know if I should post entire result here.
Subset of output is:

State	StartHour	CallsPerHour
NULL	2015-08-06 00:00:00.000	376
MI	2015-08-06 00:00:00.000	165
WI	2015-08-06 00:00:00.000	13
VA	2015-08-06 00:00:00.000	1
WY	2015-08-06 00:00:00.000	10
TX	2015-08-06 00:00:00.000	19
SC	2015-08-06 00:00:00.000	11
NC	2015-08-06 00:00:00.000	14
MN	2015-08-06 00:00:00.000	10
NV	2015-08-06 00:00:00.000	20
MO	2015-08-06 00:00:00.000	130
AL	2015-08-06 00:00:00.000	21
CA	2015-08-06 00:00:00.000	142
IL	2015-08-06 00:00:00.000	23
LA	2015-08-06 00:00:00.000	9
OR	2015-08-06 00:00:00.000	25
TN	2015-08-06 00:00:00.000	16
CO	2015-08-06 00:00:00.000	23
NE	2015-08-06 00:00:00.000	7
MA	2015-08-06 00:00:00.000	6
WA	2015-08-06 00:00:00.000	24
GA	2015-08-06 00:00:00.000	16
MT	2015-08-06 00:00:00.000	25
NY	2015-08-06 00:00:00.000	1
WA	2015-08-06 01:00:00.000	6
TX	2015-08-06 01:00:00.000	11
WY	2015-08-06 01:00:00.000	17
CO	2015-08-06 01:00:00.000	7
MI	2015-08-06 01:00:00.000	28
MS	2015-08-06 01:00:00.000	2
VA	2015-08-06 01:00:00.000	3
WI	2015-08-06 01:00:00.000	31
NULL	2015-08-06 01:00:00.000	219
MO	2015-08-06 01:00:00.000	51

Edit: Somehow I was not able to post reply here, so got delayed but tried it immediately after your post :slight_smile:


#15

If you are getting 1 for future hours, make sure that you have COUNT(ID) AS CallsPerHour rather than COUNT(*) as CallsPerhour.

If you don't want future hours, change the WHERE clause to

s.[Type] = 'P' AND number <= DATEPART(HOUR,GETDATE())

If the query is working properly, no need to post the DDL.


#16

These tips will help a lot. I changed count(*) to count(ID) and now get 0 instead of 1 for future timeframes which is good.
I have one thing that I didn't understand though: Why is there one row for each timeframe where state is NULL. I tried to find if there are any entries where state is NULL/blank but didn't fnd any using following three queries:

select * from CTIRPTS.dbo.cti_reporting where State = NULL;
select * from CTIRPTS.dbo.cti_reporting where State = 'NULL';
select * from CTIRPTS.dbo.cti_reporting where State = '';

#17

It is not because there are rows where state is null or empty, but because of the way we are querying. When there are no records at all for a given time slot, the query still produces a row because we are doing a left join to the numbers table. You can see this in the first query below.

If you want to eliminate those rows, the second query will do that. If you want to get a row for each state and a row where there was no data for that state, you will also need a table with the states listed.

CREATE TABLE #cti_reporting
(
	[state] char(2),
	create_time DATETIME, 
	ID INT
);

INSERT INTO #cti_reporting
        ( [state],create_time, ID )
VALUES  
	('NY', '2015-08-06 00:34:24', 1  ),
	('NY', '2015-08-06 00:22:12', 2  ),
	('NY', '2015-08-06 00:54:11', 3  ),
	('NY', '2015-08-06 01:24:05', 4  );

-- Query #1
SELECT
	s.number, c.*
FROM
	master..spt_values s
	LEFT JOIN #cti_reporting c ON 
		DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) <= Create_time
		AND DATEADD(hh,number+1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) > Create_time
WHERE
	s.[Type] = 'P' AND s.number < 24
ORDER BY 
	s.number,c.create_time

-- Query #2.  To remove rows with zeros, uncomment either the where clause 	[state] IS NOT NULL 
-- or the two lines in the having clause
SELECT
	[state],
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) AS StartHour,
	COUNT(ID) AS CallsPerHour
FROM
	master..spt_values s
	LEFT JOIN #cti_reporting c ON 
		DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) <= Create_time
		AND DATEADD(hh,number+1,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0)) > Create_time
WHERE
	s.[Type] = 'P' AND s.number < 24
	-- AND [state] IS NOT NULL 
GROUP BY
	[state],
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0))
--HAVING
--	COUNT(ID) > 0
ORDER BY
	DATEADD(hh,number,DATEADD(dd,DATEDIFF(dd,0,GETDATE()),0));	
	
DROP TABLE 	#cti_reporting;

#18

This is awesone. I can't thank you enough JamesK.
This was my first encounter on querying and putting something on SQL Team forum and both have been too fruitful.
Thanks to you!