Overlapping Dates Exclusion

Hello all,

This is possibly a rather simple Question, but
nevertheless a challenge for me. I have a table with 4 fields Person_id,
Store_id, startdate and enddate. For a particular value for ‘person_id', there
can a number of records with different start and end dates. Here I need to find
the time gap of record if it is more than 24 hours. The issue here is for each
person_id there may records with overlapping periods.

Example 1:

Person_id
Store_ID
Startdate
enddate

10000351067
10000232561
2010-04-08 2010-08-06

10000351067
10000232561
2016-09-09 2016-09-16

10000351067
10000232561
2016-09-16 2016-10-03

10000351067
10000232561
2016-10-03 2016-10-07

10000351067
10000232561
2016-10-07 2017-01-17

10000351067
10000232561
2017-01-17 2018-04-05

10000351067
10000232561
2017-06-16 2017-06-20

10000351067
10000232561 2018-04-05
NULL

Example 2:

10000193858
10000225875
2016-07-13 2016-08-03

10000193858
10000225875
2016-08-03 2017-05-17

10000193858
10000225875
2017-05-17 2017-06-05

10000193858
10000225875 2017-05-31
2017-06-05

10000193858
10000225875
2017-06-05 2017-06-13

10000193858
10000225875
2017-06-13 2017-08-16

10000193858
10000225875
2017-08-07 2017-08-16

10000193858
10000225875 2017-08-16
2017-08-18

10000193858
10000225875
2017-08-18 2017-08-31

10000193858
10000225875
2017-08-31 2018-01-05

10000193858
10000225875
2017-11-13 2017-11-20

10000193858 10000225875
2018-01-05 NULL

I tried by using a below query, but was unsuccessful (Possibly I
have done something wrong).

IF OBJECT_ID ('tempdb.dbo.#ordered_removal_list
') IS NOT NULL

DROP TABLE #ordered_removal_list

CREATE TABLE
#ordered_removal_list(

[ID_New] [int] IDENTITY(1,1) NOT NULL,

[person_id] [bigint] NULL,

[Store_ID]
[bigint] NULL,

[started_at] [datetime] NULL,

[ended_at] [datetime] NULL,

)

INSERT INTO
#ordered_removal_list

(person_id,Store_ID,started_at,ended_at)

SELECTperson_id,Store_ID,started_at,ended_at
into #test FROM Temp_Data

;WITH cte

AS

(

SELECT
ord1.person_id, ord1.started_at,
ord1.ended_at, next1.started_at as next1_start,

Last1.started_at
as last1_start, CASE WHEN DATEDIFF (HOUR,
last1.ended_at, ord1.started_at) > 23 THEN 'GAP' ELSE 'NO_GAP' END as 'gap'

FROM
#test ord1

LEFT JOIN
CTE_1 next1 on next1.[ID_New] = ord1.[ID_New]

  • 1 and ord1.person_id = next1.person_id)

SELECT * FROM cte

where gap
= 'GAP'

I was unable to exclude the overlapping dates which are marked in
red color in above examples.Any suggestions (preferably code examples)

Thanks!

Please provide:

  • table definition in the form of create statement
  • sample data in the form of insert statement
  • expected output from the sample data you provide

Based on your sample data - here is one solution:

Declare @PersonStore Table (Person_ID bigint, Store_ID bigint, Started_At date, Ended_At date);

 Insert Into @PersonStore (Person_ID, Store_ID, Started_At, Ended_At)
 Values (10000351067, 10000232561, '2010-04-08', '2010-08-06')
      , (10000351067, 10000232561, '2016-09-09', '2016-09-16')
      , (10000351067, 10000232561, '2016-09-16', '2016-10-03')
      , (10000351067, 10000232561, '2016-10-03', '2016-10-07')
      , (10000351067, 10000232561, '2016-10-07', '2017-01-17')
      , (10000351067, 10000232561, '2017-01-17', '2018-04-05')
      , (10000351067, 10000232561, '2017-06-16', '2017-06-20')
      , (10000351067, 10000232561, '2018-04-05', NULL)
      , (10000193858, 10000225875, '2016-07-13', '2016-08-03')
      , (10000193858, 10000225875, '2016-08-03', '2017-05-17')
      , (10000193858, 10000225875, '2017-05-17', '2017-06-05')
      , (10000193858, 10000225875, '2017-05-31', '2017-06-05')
      , (10000193858, 10000225875, '2017-06-05', '2017-06-13')
      , (10000193858, 10000225875, '2017-06-13', '2017-08-16')
      , (10000193858, 10000225875, '2017-08-07', '2017-08-16')
      , (10000193858, 10000225875, '2017-08-16', '2017-08-18')
      , (10000193858, 10000225875, '2017-08-18', '2017-08-31')
      , (10000193858, 10000225875, '2017-08-31', '2018-01-05')
      , (10000193858, 10000225875, '2017-11-13', '2017-11-20')
      , (10000193858, 10000225875, '2018-01-05', Null);

   With startDates
     As (
 Select *
      , NextStarted_At = lead(ps.Started_At, 1) over(Partition By ps.Person_ID
                                                                , ps.Store_ID 
                                                         Order By ps.Started_At
                                                                , ps.Ended_At)
   From @PersonStore    ps
        )
 Select *
      , GapInDays = iif(sd.NextStarted_At > sd.Ended_At, datediff(day, sd.Ended_At, sd.NextStarted_At), 0)
   From startDates      sd
  Order By
        sd.Person_ID
      , sd.Store_ID
      , sd.Started_At
      , sd.Ended_At;

This can be simplified to show overlaps and gaps...overlaps are negative values and gaps are positive values - 0 = no overlap/gap

 Select *
      , GapInDays = datediff(day, ps.Ended_At, lead(ps.Started_At, 1) over(Partition By ps.Person_ID
                                                                                      , ps.Store_ID
                                                                               Order By ps.Started_At
                                                                                      , ps.Ended_At))
   From @PersonStore    ps
  Order By
        ps.Person_ID
      , ps.Store_ID
      , ps.Started_At
      , ps.Ended_At;

I changed your datetime's to dates since all of your values were date only. If you actually have datetimes and they have time components and you need to identify gaps based on hours - you can adjust the datediff to use hours instead of days.

Based on what I am seeing - it appears that you are looking to remove overlapped rows. If so - then you will need to add additional logic to determine that the 'next' row in the overlap actually includes all dates from the current row.

If the next logical row has a start date that is prior to the current rows end date - but has an end date that is greater than the current rows end date - you cannot remove that row because then you will be missing those dates in the final result.

Thanks for responding. Yes! That correct, I wanted to exclude over lap rows if 'next' row or before row in the overlap actually includes all dates from the current row or earlier row.

Currently I'm using Microsoft SQL server version 2008 in which the lead function that doesn't work.

LEAD/LAG can be accomplished using CROSS APPLY with TOP 1.