Select the newest rows from each group

pkovarik · June 2, 2016, 12:06pm

I have a table T with columns a,b,...,time.
E.g.
New York, 61837, ... , 01/01/2016 20:30
New York, 61837, ... , 01/01/2016 21:30
New York, 61837, ... , 01/01/2016 22:00
New York, 52369, ... , 01/01/2016 20:30
Gotham, 86472, ... , 01/01/2016 16:30
...

I need to create an SQL query that selects rows from a table with a unique a, b, and the latest time.

I.e. in this case:
New York, 61837, ... 01/01/2016 22:00
New York, 52369, ... 01/01/2016 20:30
Gotham, 86472, ... , 01/01/2016 16:30

Can someone please advise?

djj55 · June 2, 2016, 12:27pm

Use SELECT a, b, MAX(time) FROM T GROUP BY a, b;

pkovarik · June 2, 2016, 1:06pm

I need select ALL columns and that doesn't work with GROUP BY a, b.
I know solution now. I must JOIN your SELECT with SELECT whole table on condition a,b,time.
Thank you.

djj55 · June 2, 2016, 1:25pm

There is a couple of ways to do that one waySELECT t.a, t.b, ...., t.time FROM t inner join ( SELECT a, b, MAX(time) FROM t Group by a, b ) as t2 ON t.a = t2.a and t.b = t2.b and t.time = t2.time

ScottPletcher · June 3, 2016, 5:26pm

ROW_NUMBER() is much more efficient and accurate for doing that:

SELECT [any_column_in_the_table] --,...
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY a, b ORDER BY time DESC) AS row_num
    FROM table_name
) AS derived_table
WHERE row_num = 1

Kristen · June 4, 2016, 7:05am

SELECT [any_column_in_the_table] --,...
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY a, b ORDER BY time DESC) AS row_num

I do this type of statement (all columns in inner-nest used in outer-SELECT) as

SELECT *
FROM (
    SELECT Col1, Col2, ...,

in order that the inner SELECT doesn't grab some column, added in future, which slows the query for some reason.

Is that an unfounded worry? and if there is a fixed column list in the Outer SELECT is that just as efficient?

(Column List in the Outer SELECT avoids including the [row_num] column in the output, which is helpful of course - although in my case the Output is via a template, which includes {TAGS} for all columns to be displayed, so easy enough to "ignore" the [row_num] column, and its pretty skinny in transmission.)

jeffw8713 · June 6, 2016, 5:54pm

You didn't tell us what version of SQL Server - so I will assume 2012 or higher:

SELECT {columns you want here},
LAST_VALUE(t.time) over(PARTITION BY t.a, t.b ORDER BY t.time) As LatestTime
FROM table t