SQLTeam.com | Weblogs | Forums

Find missing data using calendar table for multiple clients


#1

I'm trying to come up with a way to audit a sales table to determine if there are missing records on any days by client. Most of the sample queries I looked at don't work once you try to find gaps in the data by client. Below are 3 sample tables and when I run the query at the end, I get back
2015-07-04
2015-07-06

but Client 200 didn't send any data on 2015-07-02 so that should be on the list, but because client 300 did, it doesn't show up as an issue.

The other problem I have is since I'm looking for clients with missing data and there's no record for that day, how do I return the client ID's for the gaps that are missing? I can return the missing days, but I want the associated client so I know who to get a hold of.

Essentially I'd love to return something this and show each client for each day (commas are just for show)
2015-07-02, 200
2015-07-04, 200
2015-07-04, 300
2015-07-06, 200
2015-07-06, 300

DECLARE @Calendar TABLE(CalendarDate date primary key)
INSERT INTO @Calendar
VALUES('20150701'),('20150702'),('20150703'),('20150704'),('20150705'),('20150706'),('20150707')

DECLARE @Sales TABLE(ID int identity(1,1) primary key, ClientID integer, SaleDate date)
INSERT INTO @Sales
VALUES(300,'20150701'),(300,'20150702'),(300,'20150703'),(300,'20150705'),(300,'20150707'),
(200,'20150701'),(200,'20150701'),(200,'20150703'),(200,'20150705'),(200,'20150707')
/(The file frequency on the @ClientInfo table is so I can hopefully add a GETDATE() - Frequency to handle clients who send files monthly vs daily)/
DECLARE @ClientInfo TABLE(ID int identity(100,100) primary key, Name varchar(50), FileFrequency int NOT NULL DEFAULT 0)
INSERT INTO @ClientInfo
VALUES('ALPHA', 0),
('BETA', 1),
('GAMMA', 30)

select CalendarDate
from @Calendar C LEFT OUTER JOIN
(SELECT ClientID, SaleDate, COUNT(*) SALECOUNT FROM @Sales
GROUP BY ClientID, SaleDate
) S on C.CalendarDate = S.SaleDate
WHERE S.SALECOUNT IS NULL


#2

It would be helpful if you have a clients table for example like this:

DECLARE @Clients TABLE (ClientId int);
INSERT INTO @Clients VALUES (200),(300);

If you don't have that, you can select distinct clients from @Sales and use that. In any case, once you know the list of clients, the query can be:

SELECT
	a.CalendarDate,
	b.ClientId
FROM
	@Calendar a
	CROSS JOIN @Clients b 
WHERE NOT EXISTS
	(SELECT * FROM @Sales s
	WHERE s.ClientId = b.ClientId
	AND s.SaleDate = a.CalendarDate);

If you just want the dates, instead of change the select to

SELECT DISTINCT a.CalendarDate FROM....

#3

@JamesK The CROSS JOIN worked well for a one week, but as I increased the time in the query, it was taking forever to run.

I checked the execution plan and added a new index to one of the tables on ClientID and SaleDate and it runs sub-seconds now.

Thanks for the help.