Help with this group statement needed

barnabeck · October 29, 2018, 8:25pm

I have a log-table with these record:

DateAndTime         Group
------------------------------
2018-10-26 08:00    GroupA
2018-10-26 08:10    GroupA
2018-10-27 08:05    GroupB
2018-10-28 08:05    GroupB
2018-10-28 10:00    GroupB
2018-10-29 08:20    GroupA

I'm looking for a query that returns the time frame in which the user had been logged in different groups. Like:

From               To                 GroupA    Logons
------------------------------------------------------
2018-10-26 08:00   2018-10-26 08:10   GroupA    2
2018-10-27 08:05   2018-10-28 10:00   GroupB    3
2018-10-29 08:20   2018-10-29 08:20   GroupA    1

I tried:

Select Min(DateAndTime) as From, Max(DateAndTime) as To, Group, Count(Group)
From Table
Group by Group

But this doesn't only work as the query would return the global min(), max() values for all records with a single record for each group.

Who has the right hint?
Martin

bitsmed · October 29, 2018, 8:48pm

Something like this perhaps:

with cte
  as (select dateandtime
            ,[group]
            ,row_number() over(order by dateandtime)
            -row_number() over(order by [group]) as rn
        from log_table
     )
select min(dateandtime) as [from]
      ,max(dateandtime) as [to]
      ,[group]
      ,count(*) as logons
  from cte
 group by [group]
         ,rn
 order by [from]
;

barnabeck · October 29, 2018, 10:38pm

looks pretty good, but I can't see the logic why

over(order by dateandtime) - row_number() over(order by [group])

returns the same and unique value for all members of the same timeframe. I can understand that the values really are the same, but are they unique, too??? It would be a really cool and elegant way to build an unique indicator for members of the same bundle. Is this reliable? I'm puzzling over the logic.

instinctivly I would expect (at least):

over(order by dateandtime) - row_number() over(order by [group], dateandtime)

Martin

harishgg1 · October 30, 2018, 7:39am

hi bitsmed

like this solution

i have never seen this

,row_number() over(order by dateandtime)
-row_number() over(order by [group]) as rn

i am sure it will help me somewhere

Always happy to learn new things

bitsmed · October 30, 2018, 4:11pm

I belive you're right - should have added dateandtime to the "order by" in the last "over".
As to if it is reliable I think it is in this situation, but it's up to you to test on a larger scale of data. I would be very careful as to where I'd use this technique.

harishgg1 · November 3, 2018, 7:42am

Hi bitsmed

An alternative way using recursive CTE

drop create data .. I added row number

use  tempdb 
go 


drop table data 
go 


create table data 
(
rn int identity(1,1) not null,
DateAndTime    datetime ,
Group1  varchar(100) 
)
go 

insert into data SELECT '2018-10-26 08:00','GroupA'
insert into data SELECT '2018-10-26 08:10','GroupA'
insert into data SELECT '2018-10-27 08:05','GroupB'
insert into data SELECT '2018-10-28 08:05','GroupB'
insert into data SELECT '2018-10-28 10:00','GroupB'
insert into data SELECT '2018-10-29 08:20','GroupA'
go 

select * from data
go

SQL recursive CTE

; WITH reccte 
     AS (SELECT *, 
                1 AS grp 
         FROM   data 
         WHERE  rn = 1 
         UNION ALL 
         SELECT a.*, 
                CASE 
                  WHEN a.group1 = b.group1 THEN b.grp + 1 
                  WHEN a.group1 <> b.group1 THEN 1 
                END AS grp 
         FROM   data a 
                JOIN reccte b 
                  ON a.rn = b.rn + 1) 
SELECT * 
FROM   reccte 

go

Result

bitsmed · November 3, 2018, 11:35am

I tend to avoid recursive CTE whenever I can, so here's a couple of alternatives:

with cte
  as (select dateandtime as [from]
            ,lead(dateandtime,1,'9999-12-31 23:59:59.998') over(order by dateandtime) as [to]
        from (select top(1) with ties
                     dateandtime
                from log_table
               order by case when [group]=lag([group],1,null) over(order by dateandtime) then 1 else 0 end
             ) as a
     )
select a.[from]
      ,max(b.dateandtime) as [to]
      ,min([group]) as [group]
      ,count(*) as logons
  from cte as a
       inner join log_table as b
               on b.dateandtime>=a.[from]
              and b.dateandtime<a.[to]
 group by a.[from]
;

with cte1
  as (select top(1) with ties
             dateandtime
        from log_table
       order by case when [group]=lag([group],1,null) over(order by dateandtime) then 1 else 0 end
     )
    ,cte2
  as (select a.dateandtime
            ,a.[group]
            ,sum(1) as group_num
        from log_table as a
             left outer join cte1 as b
                          on b.dateandtime<=a.dateandtime
       group by a.dateandtime
               ,a.[group]
     )
select min(dateandtime) as [from]
      ,max(dateandtime) as [to]
      ,min([group]) as [group]
      ,count(*) as logons
  from cte2
 group by group_num
;

harishgg1 · November 3, 2018, 12:43pm

thanks bitsmed

any particular reason why you tend to avoid recursive CTEs

I will be aware any time I use recursive CTE's

Thanks

bitsmed · November 3, 2018, 6:38pm

Mostly for performance reasons, but also for readability and to avoid infinite loops. Actually I convinced myself to avoid cursors for the same reasons.

Ps.: I'm not saying anyone else should avoid recursive cte's and cursors as they were invented for a reason, it's just a little "tick" of mine

harishgg1 · November 3, 2018, 7:19pm

Thanks bitsmed for your feedback
Got your reason