Using FIRST_VALUE with repeating groups of rows

sqlslick · July 19, 2016, 9:09pm

Hello SQL Gods:

Consider the following dataset:

ID | Station | DateIn | DateOut
10 | A1 | 2016-04-01 | 2016-04-01
10 | A1 | 2016-04-02 | 2016-04-02
10 | A1 | 2016-04-03 | 2016-04-03
10 | B1 | 2016-04-04 | 2016-04-04
10 | B1 | 2016-04-05 | 2016-04-05
10 | A1 | 2016-04-06 | 2016-04-06
10 | A1 | 2016-04-07 | 2016-04-07

I need to know the first DateIn for each station block. The trick is the station might repeat. I tried the the following but it only got me half way there:

SELECT
ID
,Station
,FIRST_VALUE(DateIn) OVER (PARTITION BY ID, Station ORDER BY DateIn) FirstDateIn
FROM
...

The code above gives me the following:

ID | Station | FirstDateIn
10 | A1 | 2016-04-01
10 | B1 | 2016-04-04
10 | A1 | 2016-04-01

But, I need it to be like this:

ID | Station | FirstDateIn
10 | A1 | 2016-04-01
10 | B1 | 2016-04-04
10 | A1 | 2016-04-06

The last row in the final dataset above has the FirstDateIn as 2016-04-06 because Station A1 repeats again in the main dataset.

Anybody know how to achieve this? I hope I have explained it properly.

Thank you!!!

bitsmed · July 19, 2016, 9:22pm

Try this:

with cte
  as (select id
            ,stations
            ,datein
            ,row_number() over(order by datein) as rn
        from yourtable
     )
select a.id
      ,a.stations
      ,a.datein
  from cte as a
       left outer join cte as b
                    on b.id=a.id
                   and b.stations=a.stations
                   and b.rn=a.rn-1
 where b.id is null
;

sqlslick · July 20, 2016, 1:40pm

Thanks for the suggestion, but this just gives me the first row:

ID | Station | FirstDateIn
10 | A1 | 2016-04-01

JamesK · July 20, 2016, 2:49pm

This is the class of problems usually referred to as "Islands and Gaps". You will find several solutions if you search online. The most efficient method perhaps is to use the row_number function to split the rows into groups and then take one row from each group.

The query below is untested, so it might not exactly do what you want. If it does not, look at the subquery and see what the Grp column is giving you and adjust it if necessary. Alternatively, post create table and insert statements to create consumable sample data, so someone can copy and run the queries against it.

SELECT Id, station,
	MIN(DateIn) AS FirstDateIn
FROM
(
	SELECT
		*,
		ROW_NUMBER() OVER (PARTITION BY Id ORDER BY datein)
		- ROW_NUMBER() OVER (PARTITION BY Id, Station ORDER BY datein) AS Grp
	FROM
		YourTable
) s
GROUP BY 
	Id, Station, Grp

bitsmed · July 20, 2016, 3:06pm

Using your sample data, I get:

id  stations  datein
10  A1        2016-04-01
10  B1        2016-04-04
10  A1        2016-04-06

jeffw8713 · July 21, 2016, 6:18pm

Here is a different approach:

Declare @testTable Table (ID int, Station char(2), DateIn date, DateOut date);

 Insert Into @testTable
 Select *
   From (
 Values (10, 'A1', '2016-04-01', '2016-04-01')
      , (10, 'A1', '2016-04-02', '2016-04-02')
      , (10, 'A1', '2016-04-03', '2016-04-03')
      , (10, 'B1', '2016-04-04', '2016-04-04')
      , (10, 'B1', '2016-04-05', '2016-04-05')
      , (10, 'A1', '2016-04-06', '2016-04-06')
      , (10, 'A1', '2016-04-07', '2016-04-07')
        ) As i(ID, Station, DateIn, DateOut);

   With startDates
     As (
 Select *
      , Case When t.Station <> coalesce(lag(t.Station) over(Partition By t.ID
                                                                Order By t.DateIn), '')
             Then 1
             Else 0
         End As StartDate
   From @testTable       t
        )
 Select *
   From startDates
  Where StartDate = 1;