Get all rows duplicates by a column

absolut30 · March 12, 2020, 10:52am

Hi,
I need a query to get rows with duplicate values in a columnn and different in other. For instance, if I have:

definition
bat     job  ....   .....
Hi      apple
Hi      babana
Hi      orange
Bye     apple
Bye     apple
Home  apple
Home  apple
Home  banana

I need obtain:

bat     job
Hi       apple,banana,orange
Home  apple,apple,banana

I try with these query but don't works

	select a.bat, a.job, b.job from definition a inner join definition b on
	( a.bat=b.bat and
	  a.job <> b.job
	  and a.bat in (select bat from definition group by bat having count(*)>1));

Please any help? Thanks

JamesK · March 12, 2020, 12:39pm

-- would make it much easier to reply if you post consumable sample
-- data like shown below
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp(colA VARCHAR(32), colB VARCHAR(32));
INSERT INTO #tmp VALUES
('a','x'),
('a','y'),
('a','z'),
('a','x'),
('b','1'),
('b','2');

-- the query can be something like this.
SELECT
    t1.colA,
    STUFF(t2.col2,1,1,'') AS ColBList
FROM
    (SELECT a.ColA FROM #tmp AS a GROUP BY a.ColA ) AS t1
    CROSS APPLY
    (
        SELECT ','+b.ColB 
        FROM #tmp AS b
        WHERE b.colA = t1.colA
        FOR XML PATH('')
    ) AS t2(col2);

JamesK · March 12, 2020, 12:41pm

To eliminate rows that have no duplicates, change the first subquery to this

(SELECT a.ColA FROM #tmp AS a GROUP BY a.ColA 
HAVING COUNT(DISTINCT a.ColB) > 1) AS t1

absolut30 · March 12, 2020, 2:29pm

Thanks for your effort!
While you gave me the solution, I'm trying to apply these logic for get a successful output. I generate two temp tables, one with all rows with one column duplicate and other with all rows with two columns duplicate, then I do a join to only obtain rows that they appears in one table (left explicit join). I'm trying with these sentence but it not gives me the expected result, do you know why? Thanks!!!

with duplicate as 
(
select distinct a.job,a.bat from definition a where job in (
 		SELECT job
		FROM definition
		GROUP BY job
		HAVING COUNT(*) > 1) order by job desc
),
duplicatetwofields as 
(
SELECT job, bat
	 FROM definition
GROUP BY job, bat
HAVING COUNT(*) > 1 order by job desc
) 
select a.job,a.bat from duplicate a left join  duplicatetwofields b
on (a.job=b.job and 
a.bat=b.bat) 
where a.bat is null or a.job is null;

yosiasz · March 12, 2020, 3:22pm

another option

use sqlteam
go

if OBJECT_ID('tempdb..#absolut30') IS NOT NULL
	drop table #absolut30;

create table #absolut30(bat nvarchar(50), job nvarchar(50))

insert into #absolut30
select 'Hi',  'hi-apple' union all
select 'Hi',  'hi-babana' union all
select 'Hi',  'hi-orange' union all
select 'Bye',  'bye-apple' union all
select 'Bye',  'bye-apple' union all
select 'Home',  'casa-apple' union all
select 'Home',  'casa-apple' union all
select 'Home',  'casa-banana' 


select distinct
     bat
   ,
    stuff((
        select ',' + u.job
        from #absolut30 u
        where u.bat = a.bat
        order by u.job
        for xml path('')
    ),1,1,'') as userlist
from #absolut30 a
group by bat