SQL Server 2012 - how to get the latest entries from a column on a table?

mm92280 · August 3, 2020, 9:13pm

So, let's say that I have a table called "res_table" and it has the below columns and data. How would I grab the latest entry for each reservation from this table?

res_number | subres_number | sba_date | room_number
10001 | 1 | 7/30/2020 | 1001
10001 | 1 | 7/31/2020 | 1001
10001 | 2 | 8/1/2020 | 1005
10001 | 2 | 8/2/2020 | 1005
10001 | 3 | 8/3/2020 | 2005
10002 | 1 | 8/1/2020 | 2001
10002 | 2 | 8/2/2020 | 2010
10002 | 2 | 8/3/2020 | 2010
10003 | 1 | 8/2/2020 | 3001
10003 | 1 | 8/3/2020 | 3001
10004 | 1 | 8/3/2020 | 3005

This is what I would want my output to look like - just the last entry on the table for each individual reservation number.

res_number | subres_number | sba_date | room_number
10001 | 3 | 8/3/2020 | 2005
10002 | 2 | 8/3/2020 | 2010
10003 | 1 | 8/3/2020 | 3001
10004 | 1 | 8/3/2020 | 3005

yosiasz · August 3, 2020, 11:59pm

always provide sample data in the following format

use sqlteam
go

create table #corn(res_number int, subres_number int , 
sba_date date, room_number int)

insert into #corn

select 10001 ,	 1 , '7/30/2020' , 1001 union
select 10001 , 1 , '7/31/2020' , 1001 union
select 10001 , 2 , '8/1/2020' , 1005 union
select 10001 , 2 , '8/2/2020' , 1005 union
select 10001 , 3 , '8/3/2020' , 2005 union
select 10002 , 1 , '8/1/2020' , 2001 union
select 10002 , 2 , '8/2/2020' , 2010 union
select 10002 , 2 , '8/3/2020' , 2010 union
select 10003 , 1 , '8/2/2020' , 3001 union
select 10003 , 1 , '8/3/2020' , 3001 union
select 10004 , 1 , '8/3/2020' , 3005

;with src
as
(
select *, ROW_NUMBER() 
OVER(PARTITION BY res_number ORDER BY sba_date desc)  as row_num
 from #corn
 )
 select * From src where row_num = 1
drop table #corn

harishgg1 · August 4, 2020, 7:39am

hi

another way of doing it ... any reason you can think of if it helps .. nice .. great

select 
    top 1 with ties
    *
from 
    data
order by 
   row_number() over (partition by res_number order by sba_date desc)

mike01 · August 4, 2020, 12:37pm

I would check the execution plan on both approaches before choosing one..

yosiasz · August 4, 2020, 1:49pm

good call.

Using a sampling of 100k rows

create table #corn(res_number int, subres_number int , 
sba_date date, room_number int)

insert into #corn
SELECT TOP 100000 ac1.column_id as res_number,
          ABS(CHECKSUM(NEWID())%10000)+10000 subres_number,
         dateadd(dd,ac1.column_id, getdate())   sba_date,
		 ac1.object_id room_number
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2

with cte

with top 1

harishgg1 · August 4, 2020, 4:09pm

hi yosiasz or anyone interested in going in more depth ...

the cost is obviously more for top 1 .. the way you put the data ..

under certain types of data ... conditions
.. top 1 will be better ... will give better COST ... and also other things !!

FYI ( for your information ) for any enthusiasts

yosiasz · August 4, 2020, 4:14pm

I thought the cost was more using the CTE? and top 1 was better? I am not sure, maybe @mike01 can give us more input

harishgg1 · August 4, 2020, 4:26pm

if you look at the cost diagrams .. you put

cost is 50 percent with CTE , cost is 67 percent with TOP 1

jeffw8713 · August 4, 2020, 4:47pm

Cost is irrelevant here - what you need to look at is the actual time and IO stats to determine which one will be better, and it all depends on how it will be used in the overall query.

In some cases - the row number version will perform better - it others the TOP 1 version will perform better. Again, it all depends on how it is incorporated into the overall query and the specific time and IO stats for that query.

mike01 · August 4, 2020, 6:27pm

they were almost identical in IO and time. The only difference I saw was the Top 1 had a second sort. When I ran them together, the first one took 34-36% of the process, while the second was 64-66%

jeffw8713 · August 4, 2020, 7:49pm

Since the IO and time is almost identical - then either method will work. Testing of each version in the overall query will help to determine which is better for that query - at this time.

Just to be clear - I had a fairly complex query where I used the row_number() version to identify the latest rows in a CTE. In fact, in that query I had 2 CTE's and both used the row_number() version...then I modified the code to use the TOP 1 version and saw a significant improvement in performance.

In another case - I had to revert from the TOP 1 version to the row_number() version to improve overall performance.

Either way - the only way to be sure which one is better is to test each one.

yosiasz · August 4, 2020, 7:56pm

surely it must be due to some wind pattern

Wonder if it has anything to do with indexing on the column one is partitioning on and/or sorting on?

JeffModen · August 12, 2020, 2:39pm

Careful Mike... you cannot rely on % of Batch in the execution plan to choose which is the better code even if it's an Actual Execution plan because even Actual Execution plans are rife with estimates. As others have said, only a measure of actual runtime statistics will ultimately identify which is better and you have to be careful even then.

Mateen · September 15, 2020, 4:32pm

If performance is not an issue , then the above code looks fascinating to me and easy to remember type.

harishgg1 · September 15, 2020, 4:35pm

thanks ... mateen !!