How to get the latest values day wise from a timeseries table?

siddiqnx · July 30, 2021, 6:11am

I want to get the latest values of each SIZE_TYPE day wise, ordered by TIMESTAMP. So, only 1 value of each SIZE_TYPE must be present for a given day, and that is the latest value for the day.

How do I get the desired output? I'm using PostgreSQL here.

Input

|TIMESTAMP                               |SIZE_TYPE|SIZE|
|----------------------------------------|---------|----|
|1595833641356 [Mon Jul 27 2020 07:07:21]|0        |541 |
|1595833641356 [Mon Jul 27 2020 07:07:21]|1        |743 |
|1595833641356 [Mon Jul 27 2020 07:07:21]|2        |912 |
|1595876841356 [Mon Jul 27 2020 19:07:21]|1        |714 |
|1595876841356 [Mon Jul 27 2020 19:07:21]|2        |987 |
|1595963241356 [Tue Jul 28 2020 19:07:21]|0        |498 |
|1595920041356 [Tue Jul 28 2020 07:07:21]|2        |974 |
|1595920041356 [Tue Jul 28 2020 07:07:21]|0        |512 |

*Note: the TIMESTAMP values are in UNIX time. I have given
the date-time string for reference*

Output

|TIMESTAMP                               |SIZE_TYPE|SIZE|
|----------------------------------------|---------|----|
|1595833641356 [Mon Jul 27 2020 07:07:21]|0        |541 |
|1595876841356 [Mon Jul 27 2020 19:07:21]|1        |714 |
|1595876841356 [Mon Jul 27 2020 19:07:21]|2        |987 |
|1595920041356 [Tue Jul 28 2020 07:07:21]|2        |974 |
|1595963241356 [Tue Jul 28 2020 19:07:21]|0        |498 |

*Note: the TIMESTAMP values are in UNIX time. I have given
the date-time string for reference*

Explanation

For July 27, the latest values for

0: 541 (no other entries for the day)
1: 714
2: 987

For July 28, the latest values for

0: 498
1: nothing (ignore)
2: 974 (no other entries for the day)

I have the query to get the output using distinct_on (which only works on Postgres):

select distinct on (floor(timestamp / (24 * 60 * 60 * 1000)), size_type) t.*
from input
order by floor(timestamp / (24 * 60 * 60 * 1000)), size_type,
          timestamp desc;

But I want it implemented using sub-query or any other technique (that's supported by both Postgres and SQL Server) if it's possible.

yosiasz · July 30, 2021, 12:06pm

Select timestamp, size_type, max(size) as size
From table group by timestamp, size_type

siddiqnx · July 30, 2021, 12:37pm

Thanks for replying but that's not what I wanted.

RogierPronk · July 30, 2021, 1:44pm

I would first select all the size_type and the max timestamp for each day and then select the size, using CTE

WITH LatestValueOfEachDayBySize AS
(
SELECT SIZE_TYPE, CONVERT(date,TS) AS Date, MAX(TS) AS TS
FROM #t
GROUP BY SIZE_TYPE, CONVERT(date,TS)
)
SELECT YourTable.SIZE_TYPE, YourTable.TS, YourTable.Size
FROM #t YourTable
INNER JOIN LatestValueOfEachDayBySize
ON YourTable.SIZE_TYPE=LatestValueOfEachDayBySize.size_type AND YourTable.TS=LatestValueOfEachDayBySize.TS

but I'm sure an other option would be to use ROW_NUMBER.

mike01 · July 30, 2021, 2:56pm

Create table #t (ts datetime, size_type smallint, size int)

insert into #t values 
('Jul 27 2020 07:07:21',0,541),
('Jul 27 2020 07:07:21',1,743),
('Jul 27 2020 07:07:21',2,912),
('Jul 27 2020 19:07:21',1,714),
('Jul 27 2020 19:07:21',2,987),
('Jul 28 2020 19:07:21',0,498),
('Jul 28 2020 07:07:21',2,974),
('Jul 28 2020 07:07:21',0,512)

select TS, SIZE_TYPE, size
 from (
SELECT SIZE_TYPE, size, ts, row_number() over (partition by size_type, Cast(ts as date) order by size_type, ts desc) as RowNum
FROM #t ) t
where RowNum = 1

order by 1,2