Grouping Adjacent Duplicate rows (But using minimum and aggregating some column)

kalaiselvan_rajendran · June 29, 2018, 4:17pm

Hi,

Got a table with some rows based on each date, Employee, Channel and their work hours. Have to Remove Adjacent duplicates for Col1,Col2 and Col3. Where as Min of Col4 need to be used in the row and Sum of Col5 have to be calculated.
Have to derive the input TableA to the Output mentioned below. (Considering only adjacent rows and not for the whole month)
I had done a recent query to remove the adjacent duplicates - But stucks with the grouping. Still can able to handle it using Cursors or lopping - Is there any other way to use the query to fetch the output. Since my number of rows will be more than lacs.

TableA:
[Col1] [Col2] [Col3] [Col4] [Col5]
M1 CH1 Jun18 1000 60
M1 CH1 Jun18 1060 600
M1 CH1 Jun18 1660 120
M2 CH1 Jun18 1500 240
M2 CH1 Jun18 1740 60
M1 CH1 Jun18 1800 180
M2 CH1 Jun18 1980 60
M2 CH1 Jun18 2040 540
M1 CH1 Jul18 1000 600
M1 CH1 Jul18 1600 60
M2 CH1 Jul18 1660 120
M2 CH1 Jul18 1780 90

Output:
[Col1] [Col2] [Col3] [Col4] [Col5]
M1 CH1 Jun18 1000 780
M2 CH1 Jun18 1500 300
M1 CH1 Jun18 1800 180
M2 CH1 Jun18 1980 600
M1 CH1 Jul18 1000 660
M2 CH1 Jul18 1660 210

bitsmed · June 30, 2018, 10:52am

The logic is not 100% clear to me, so this is a guess from my side:

with cte
  as (select a.col1
            ,a.col2
            ,a.col3
            ,a.col4
            ,a.col5
            ,a.col4+a.col5 as col6
            ,row_number() over(partition by a.col1
                                           ,a.col2
                                           ,a.col3
                                   order by a.col4
                              )
             as rn1
            ,0 as rn2
        from tablea as a
             left outer join tablea as b
                          on b.col1=a.col1
                         and b.col2=a.col2
                         and b.col3=a.col3
                         and b.col4+b.col5=a.col4
       where b.col1 is null
/* this is where you'd add your filter to avoid reading the whole table */
      union all
      select a.col1
            ,a.col2
            ,a.col3
            ,a.col4
            ,a.col5+isnull(b.col5,0) as col5
            ,a.col6+isnull(b.col5,0) as col6
            ,a.rn1
            ,a.rn2+1 as rn2
        from cte as a
             inner join tablea as b
                          on b.col1=a.col1
                         and b.col2=a.col2
                         and b.col3=a.col3
                         and b.col4=a.col6
     )
select top(1) with ties
       col1
      ,col2
      ,col3
      ,col4
      ,col5
  from cte
 order by row_number() over(partition by col1
                                        ,col2
                                        ,col3
                                        ,rn1
                                order by rn2 desc
                           )
;

femiolan · June 30, 2018, 7:37pm

I think you are going to need an identity column in order to determine the adjacent rows. That said, this is my best guess:

Declare @test as table
(
id int identity(1,1),
Col1  char(2),
Col2  char(3),
Col3  date,
Col4  numeric(10,0),
Col5  numeric(10,0)


)

insert @test 
values
('M1','CH1','6/18/2018','1000','60'),
('M1','CH1','6/18/2018','1060','600'),
('M1','CH1','6/18/2018','1660','120'),
('M2','CH1','6/18/2018','1500','240'),
('M2','CH1','6/18/2018','1740','60'),
('M1','CH1','6/18/2018','1800','180'),
('M2','CH1','6/18/2018','1980','60'),
('M2','CH1','6/18/2018','2040','540'),
('M1','CH1','7/18/2018','1000','600'),
('M1','CH1','7/18/2018','1600','60'),
('M2','CH1','7/18/2018','1660','120'),
('M2','CH1','7/18/2018','1780','90');


with tst
as
(

select  
a.Col1 
,a.Col2 
,a.Col3 
,a.Col4 
,a.col5
,a.id cnt
 from @test a 



)

,
cte as
(
select a.* 

, a.Col5 [sum]

from tst a
where a.cnt = 1

union all

select 
c.Col1 
,c.Col2 
,c.Col3 
,c.Col4 
,c.Col5 
--,
--(
--case 
--when c.Col1 = b.Col1 and c.Col2 = b.Col2 and c.Col3 = b.Col3 
--then b.cnt + 1
--else 1
--end
--)

,
(
case 
when c.Col1 = b.Col1 and c.Col2 = b.Col2 and c.Col3 = b.Col3 
then b.cnt 
else b.cnt + 1
end
)

,

(
case 
when c.Col1 = b.Col1 and c.Col2 = b.Col2 and c.Col3 = b.Col3 
then cast(b.sum + c.Col5 As numeric(10,0))
else c.Col5
end
)


from 


cte b


,
tst c
, tst d
--, tst a
where 

--next element
c.cnt = d.cnt + 1
and (b.Col1 = d.Col1 and b.Col2 = d.Col2 and b.Col3 = d.Col3 and b.Col4 = d.Col4 and d.Col5 = d.Col5 )
--and d.cnt <> b.cnt
and c.cnt > b.cnt   

)

select a.Col1 ,a.Col2,a.Col3, MIN(a.col4) [Col4], MAX(a.sum) [Col5] 
from cte a
group by a.Col1 ,a.Col2,a.Col3, a.cnt 
order by a.cnt asc

Hope this helps