Need help with a GROUP BY query

zoo0ega · November 16, 2016, 3:25pm

I have a table, Students, with the following columns:

| id | name | class | date_registrered |

I want to select one row for every unique class, and only the row with the largest value in date_registrered,
i.e. I want to select the latest registrered Student, including all the data for that one.

I tried:
SELECT id, name, class, MAX(date_registrered) FROM Students GROUP BY class;

I get the following error:

Column 'Students.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Does anyone know how to fix this?

djj55 · November 16, 2016, 4:14pm

With your code you would need to add id and name into the group by. Or you could remove id and name from the query.
Please give example data as instert into

ScottPletcher · November 16, 2016, 5:59pm

SELECT id, name, class, date_registrered
FROM (
    SELECT *, 
       ROW_NUMBER() OVER(PARTITION BY class ORDER BY date_registered DESC) AS row_num
    FROM Students
)
WHERE row_num = 1
ORDER BY class /*ORDER BY is optional, of course*/

Kristen · November 16, 2016, 6:17pm

Just curious:

Is SELECT * (in the Inner SELECT) expensive" in this scenario, or is SQL smart enough to only consider the columns actually used in the Outer SELECT ?

(I appreciate that you may have used it, here, just "for example", but if it is indeed "cheap" then I will adopt that style )

ScottPletcher · November 16, 2016, 8:04pm

Yes, from what I've seen, the optimizer will only actually pull the column(s) used in the outer query. That's why I use the * -- it makes the code easier to maintain without any harm to performance.

For example, I just ran the query below, and it took only 4 I/Os to return the result:

SELECT nonclus_key_col
FROM (
    SELECT *, ROW_NUMBER() OVER(ORDER BY nonclus_key_col) AS row_num
    FROM dbo.table_with_400_million_rows
) AS derived
WHERE row_num = 1

James · November 16, 2016, 10:19pm

Yeah, testing that is easy to see too. Running the whole thing works but running the inner query results in a divide by zero error. Clearly it's not even evaluating all the columns

DECLARE @TableExample TABLE
(
Id INT IDENTITY(1,1) PRIMARY KEY,
OtherValue VARCHAR(16) NOT NULL,
AnotherValue VARCHAR(32) NULL
)
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 3);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 5);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 7);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 9);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(2, 3);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(2, 5);

SELECT OtherValue
FROM (
SELECT *, ROW_NUMBER() OVER(ORDER BY OtherValue) AS row_num, 1/0 As DivideByZero
FROM @TableExample
) AS derived
WHERE row_num = 1;

Kristen · November 17, 2016, 7:05am

Perfect, thanks. I've been doing it the other-way-round:

SELECT derived.*
FROM (
    SELECT nonclus_key_col, ROW_NUMBER() OVER(ORDER BY nonclus_key_col) AS row_num
    FROM dbo.table_with_400_million_rows
) AS derived
WHERE row_num = 1

but the "snag" with this approach is that I also get row_num in the output, and most often it is not required (i.e. only required for the WHERE row_num = 1 filter)