I have a table, Students, with the following columns:
| id | name | class | date_registrered |
I want to select one row for every unique class, and only the row with the largest value in date_registrered,
i.e. I want to select the latest registrered Student, including all the data for that one.
I tried: SELECT id, name, class, MAX(date_registrered) FROM Students GROUP BY class;
I get the following error:
Column 'Students.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
With your code you would need to add id and name into the group by. Or you could remove id and name from the query.
Please give example data as instert into
SELECT id, name, class, date_registrered
FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY class ORDER BY date_registered DESC) AS row_num
FROM Students
)
WHERE row_num = 1
ORDER BY class /*ORDER BY is optional, of course*/
Yes, from what I've seen, the optimizer will only actually pull the column(s) used in the outer query. That's why I use the * -- it makes the code easier to maintain without any harm to performance.
For example, I just ran the query below, and it took only 4 I/Os to return the result:
SELECT nonclus_key_col
FROM (
SELECT *, ROW_NUMBER() OVER(ORDER BY nonclus_key_col) AS row_num
FROM dbo.table_with_400_million_rows
) AS derived
WHERE row_num = 1
Yeah, testing that is easy to see too. Running the whole thing works but running the inner query results in a divide by zero error. Clearly it's not even evaluating all the columns
DECLARE @TableExample TABLE
(
Id INT IDENTITY(1,1) PRIMARY KEY,
OtherValue VARCHAR(16) NOT NULL,
AnotherValue VARCHAR(32) NULL
)
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 3);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 5);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 7);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(1, 9);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(2, 3);
INSERT INTO @TableExample (OtherValue, AnotherValue) VALUES(2, 5);
SELECT OtherValue
FROM (
SELECT *, ROW_NUMBER() OVER(ORDER BY OtherValue) AS row_num, 1/0 As DivideByZero
FROM @TableExample
) AS derived
WHERE row_num = 1;
Perfect, thanks. I've been doing it the other-way-round:
SELECT derived.*
FROM (
SELECT nonclus_key_col, ROW_NUMBER() OVER(ORDER BY nonclus_key_col) AS row_num
FROM dbo.table_with_400_million_rows
) AS derived
WHERE row_num = 1
but the "snag" with this approach is that I also get row_num in the output, and most often it is not required (i.e. only required for the WHERE row_num = 1 filter)