SQLTeam.com | Weblogs | Forums

Recursive Function to Group Records with consecutive dates?

sql2008

#1

Hey All,

I am having a bit of challenge...I have a data set which resembles the first table below. I am attempting to group records where ID1 and ID2 are the same and where date ranges that are consecutive and/or fall within the consecutive range are grouped. All of the relevant data elements (financials etc) would be summed/grouped as well. I have spent a considerable amount of time on this and I'm quite stumped. Any help that you could provide would be great!

ID1 is always the same for each series of dates. ID2 can be different. The table below represents one example of ID1

Thanks!

Here is a sample data set:

ID1 ID2 Admit Date Discharge Date
0001 000 7/23/2013 7/25/2013
0001 001 9/20/2013 9/21/2013
0001 001 9/21/2013 10/11/2013
0001 001 10/11/2013 11/15/2016
0001 001 10/16/2013 10/18/2013 this one runs concurrently or 'falls within' the consecutive range
0001 001 11/15/2013 12/04/2013
0001 001 12/4/2013 12/09/2013
0001 001 12/9/2013 12/18/2013
0001 001 12/22/2013 12/23/2013

What it should look like when the query is ran:

ID1 ID2 Admit Date Discharge Date
0001 000 7/23/2013 7/25/2013
0001 001 9/20/2013 12/18/2013
0001 001 12/22/2013 12/23/2013


#2

With SQL2012, and above, you can probably use the LEAD/LAG functions to do this more efficently.
As you seem to be using SQL2008 you can either use a loop or, using a set based approach, expand the dates and then roll them up again.

-- *** Consumable Test Data ***
-- Please supply in future with dates in ISO format.
CREATE TABLE #t
(
    ID1 char(4) NOT NULL
    ,ID2 char(3) NOT NULL
    ,AdmitDate date NOT NULL
    ,DischargeDate date NOT NULL
);
INSERT INTO #t
VALUES ('0001','000','20130723', '20130725')
    ,('0001','001','20130920', '20130921')
    ,('0001','001','20130921', '20131011')
    ,('0001','001','20131011', '20131115') -- presume not 11/15/2016!
    ,('0001','001','20131016', '20131018')
    ,('0001','001','20131115', '20131204')
    ,('0001','001','20131204', '20131209')
    ,('0001','001','20131209', '20131218')
    ,('0001','001','20131222', '20131223');

WITH Numbers(n)
AS
(
    -- 10000 rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
,InDates
AS
(
    SELECT DISTINCT T.ID1, T.ID2, A.InDate
    FROM #t T
        JOIN Numbers N
            ON N.N <= DATEDIFF(day, T.AdmitDate, T.DischargeDate) + 1
        CROSS APPLY (VALUES( DATEADD(day, N -1, T.AdmitDate)) ) A(InDate)
    --order by ID1, ID2, InDate
)
,AllDates
AS
(
    -- Could use a calendar table here
    SELECT A.TheDate
        ,ROW_NUMBER() OVER (ORDER BY A.TheDate) AS rn
    FROM Numbers N
        CROSS JOIN
        (
            SELECT MIN(InDate) AS StartDate, MAX(InDate) AS EndDate
            FROM InDates D
        ) X
        CROSS APPLY (VALUES( DATEADD(day, N -1, X.StartDate)) ) A(TheDate)
    WHERE A.TheDate <= X.EndDate
)
,Grps
AS
(
    SELECT I.ID1, I.ID2, I.InDate
        ,A.rn - ROW_NUMBER() OVER (PARTITION BY ID1, ID2 ORDER BY InDate) AS Grp
    FROM AllDates A
        JOIN InDates I
            ON A.TheDate = I.InDate
)
,Results
AS
(
    SELECT ID1, ID2, Grp
        ,MIN(InDate) AS AdmitDate
        ,MAX(InDate) AS DischargeDate
    FROM Grps
    GROUP BY ID1, ID2, Grp
)
SELECT ID1, ID2, AdmitDate, DischargeDate
FROM Results
ORDER BY ID1, ID2, AdmitDate;

#3

Wow, impressive! It worked for the 4 fields that I had in the original sample data, but as I added in additional fields it started to separate portions of the records again. If I am adding in fields, what would need to be altered to allow for the additional fields but maintain the grouping?

Below I identify where I added the 2 fields... the bolded "ADDED HERE". These 2 fields are dates but I am trying to add other items also.

Any thoughts on the matter would be great!

Thanks!

CREATE TABLE #t
(
ID1 char(4) NOT NULL
,ID2 char(3) NOT NULL
,AdmitDate date NOT NULL
,DischargeDate date NOT NULL

ADDED HERE

);
INSERT INTO #t
VALUES ('0001','000','20130723', '20130725')
,('0001','001','20130920', '20130921')
,('0001','001','20130921', '20131011')
,('0001','001','20131011', '20131115') -- presume not 11/15/2016!
,('0001','001','20131016', '20131018')
,('0001','001','20131115', '20131204')
,('0001','001','20131204', '20131209')
,('0001','001','20131209', '20131218')
,('0001','001','20131222', '20131223');
WITH Numbers(n)
AS
(
-- 10000 rows
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
,InDates
AS
(
SELECT DISTINCT T.ID1, T.ID2, ADDED HERE A.InDate
FROM #t T
JOIN Numbers N
ON N.N <= DATEDIFF(day, T.AdmitDate, T.DischargeDate) + 1
CROSS APPLY (VALUES( DATEADD(day, N -1, T.AdmitDate)) ) A(InDate)
--order by ID1, ID2, InDate
)
,AllDates
AS
(
-- Could use a calendar table here
SELECT A.TheDate
,ROW_NUMBER() OVER (ORDER BY A.TheDate) AS rn
FROM Numbers N
CROSS JOIN
(
SELECT MIN(InDate) AS StartDate, MAX(InDate) AS EndDate
FROM InDates D
) X
CROSS APPLY (VALUES( DATEADD(day, N -1, X.StartDate)) ) A(TheDate)
WHERE A.TheDate <= X.EndDate
)
,Grps
AS
(
SELECT I.ID1, I.ID2, ADDED HERE I.InDate
,A.rn - ROW_NUMBER() OVER (PARTITION BY ID1, ID2 ORDER BY InDate) AS Grp
FROM AllDates A
JOIN InDates I
ON A.TheDate = I.InDate
)
,Results
AS
(
SELECT ID1, ID2, ADDED HERE Grp
,MIN(InDate) AS AdmitDate
,MAX(InDate) AS DischargeDate
FROM Grps
GROUP BY ID1, ID2, ADDED HERE Grp
)
SELECT ID1, ID2, ADDED HERE AdmitDate, DischargeDate
FROM Results
ORDER BY ID1, ID2, AdmitDate;


#4

Also, as I am working through the query I follow most of what is going on but what is the function of the Numbers CTE?


#5

The Numbers CTE is just an inline Number or Tally table. You can google this.
On looking at the code again, the AllDates CTE can be replaced with a base date. (I have used 01/01/2010)

WITH Numbers(n)
AS
(
    -- 10000 rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
,InDates
AS
(
    SELECT DISTINCT T.ID1, T.ID2, A.InDate
    FROM #t T
        JOIN Numbers N
            ON N.N <= DATEDIFF(day, T.AdmitDate, T.DischargeDate) + 1
        CROSS APPLY (VALUES( DATEADD(day, N -1, T.AdmitDate)) ) A(InDate)
    --order by ID1, ID2, InDate
)
,Grps
AS
(
    SELECT ID1, ID2, InDate
        ,DATEDIFF(day, '20100101', InDate)
			- ROW_NUMBER() OVER (PARTITION BY ID1, ID2 ORDER BY InDate) AS Grp
    FROM InDates
)
,Results
AS
(
    SELECT ID1, ID2, Grp
        ,MIN(InDate) AS AdmitDate
        ,MAX(InDate) AS DischargeDate
    FROM Grps
    GROUP BY ID1, ID2, Grp
)
SELECT ID1, ID2, AdmitDate, DischargeDate
FROM Results
ORDER BY ID1, ID2, AdmitDate;

To understand what is going on, just work your way through the CTE''s

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
WITH Numbers(n)
AS
(
    -- 10000 rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
--,InDates
--AS
--(
    SELECT DISTINCT T.ID1, T.ID2, A.InDate
    FROM #t T
        JOIN Numbers N
            ON N.N <= DATEDIFF(day, T.AdmitDate, T.DischargeDate) + 1
        CROSS APPLY (VALUES( DATEADD(day, N -1, T.AdmitDate)) ) A(InDate)
    order by ID1, ID2, InDate
WITH Numbers(n)
AS
(
    -- 10000 rows
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)
)
,InDates
AS
(
    SELECT DISTINCT T.ID1, T.ID2, A.InDate
    FROM #t T
        JOIN Numbers N
            ON N.N <= DATEDIFF(day, T.AdmitDate, T.DischargeDate) + 1
        CROSS APPLY (VALUES( DATEADD(day, N -1, T.AdmitDate)) ) A(InDate)
    --order by ID1, ID2, InDate
)
--,Grps
--AS
--(
    SELECT ID1, ID2, InDate
        ,DATEDIFF(day, '20100101', InDate)
			- ROW_NUMBER() OVER (PARTITION BY ID1, ID2 ORDER BY InDate) AS Grp
		,DATEDIFF(day, '20100101', InDate)
		,ROW_NUMBER() OVER (PARTITION BY ID1, ID2 ORDER BY InDate)
    FROM InDates
	order by ID1, ID2, InDate

etc


#6

Thanks, I will look that up. Any suggestions for the post concerning adding additional fields? More than just the 4 in the original post?

I posted about it earlier in the thread. Thanks again for all the help!!