Get latest record duolicate ID the date difference is within 4 days

Hi,

I'm having a problem on how to get the records if the duplicate ID the date difference is 4 days and get the latest ID. any help is very much appreciated. see below sample DDL as reference and desired result.

Declare  @sample table
(id nvarchar (50), dt datetime, itemid varchar(35)
)
insert @sample values ('ABCDE5B12AG123AA','2017-07-29 05:03:00.000','SAL00A')
insert @sample values ('ABCDE5B12AG123AA','2017-07-02 15:42:00.000','SAL00A')
insert @sample values ('ABCDE5B12AG123TB','2017-10-25 02:03:00.000','SAL001')
insert @sample values ('ABCDE5B12AG123TB','2017-10-29 04:42:00.000','SAL001')
insert @sample values ('ABCDE5B12AG12345','2017-02-27 20:26:00.000','SAL001')
insert @sample values ('ABCDE5B12AG12345','2017-02-28 00:01:00.000','SAL001')
insert @sample values ('ABCDE5B12AG33333','2017-02-02 04:01:00.000','BOI002')
insert @sample values ('ABCDE5B12AG11111','2017-11-26 10:03:09.000','TOP222')
insert @sample values ('ABCDE5B12AG11111','2017-10-25 02:03:00.000','TOP222')
insert @sample values ('ABCDE5B12AG11111','2017-10-29 04:42:00.000','TOP222')

select * from @sample


result
id------------------dt----------------------itemid
ABCDE5B12AG123AA----2017-07-29 05:03:00.000--SAL00A
ABCDE5B12AG123AA----2017-07-02 15:42:00.000--SAL00A
ABCDE5B12AG123TB----2017-10-29 04:42:00.000--SAL001
ABCDE5B12AG12345----2017-02-28 00:01:00.000--SAL001
ABCDE5B12AG33333----2017-02-02 04:01:00.000--BOI002
ABCDE5B12AG11111----2017-11-26 10:03:09.000--TOP222
ABCDE5B12AG11111----2017-10-29 04:42:00.000--TOP222

I'm pretty sure this will work, although it will require two scans of the table. Hopefully that's not a problem for you. I don't have time right now to try to figure out how to do it with only one scan of the table.

;WITH cte_sample AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY id ORDER BY dt) AS row_num
    FROM @sample
)
SELECT cs.*
FROM cte_sample cs
WHERE NOT EXISTS(SELECT 1 FROM cte_sample cs2 WHERE cs2.id = cs.id AND cs2.row_num = cs.row_num + 1 AND ABS(DATEDIFF(DAY, cs.dt, cs2.dt)) <= 4)
ORDER BY id
1 Like

Thank you Scott. I will try this code to scan more or less 10m if the performance is manageable.