Find duplicates

bluepink1116 · February 17, 2023, 9:32pm

I've values in a table which are selected as duplicates if the name is same then the corresponding ids are included in a csv string column as below:

Original table:
create table #original(id int, unique_id varchar(500), name varchar(200))
insert into @original values
(1, '12345', 'A'), (2, '12345', 'A')
, (3, null, 'B'), (4, '45678', 'B')
, (5, '900', 'C'), (6, '901', 'C')
, (7, null, 'D'), (8, null, 'D')
, (9, null, 'E'), (10, '1000', 'E'), (11, null, 'E')
, (12, '1100', 'F'), (13, '1101', 'F), (14, '1102', 'F')

Person with name A, B are same but C isn't because unique id is different for C. A record is duplicate if the name is same AND unique id is there or null. When name is same but the unique ids are different then they aren't the same people.

I'm selecting the data as below: ;with cte as (select name from #original group by name having count(*) > 1)

I need to get the data as below:

C and F should be avoided as their unique_ids are different though names are same.

Thanks

stepson · February 18, 2023, 5:19am

Hi,

I use this portion of code, so that I know, for each row , that should be, in case of the null value on unique_id, the value for this unique_id

SELECT DISTINCT o.name, oa.unique_id,o.id
  FROM #original as o
    CROSS APPLY
    (
         SELECT TOP(1) u.unique_id, u.id
         FROM #original as u
         WHERE u.name = o.name
                 AND u.unique_id is not null
         ORDER BY u.id ASC
    )oa
  WHERE o.unique_id is null

name	unique_id	id
B	45678	3
E	1000	9
E	1000	11

so, for ID = 3, the unique_id should be 45678
for id = 9 , the unique_id should be 1000...

Then the final query , will be something like this:

SELECT
    o.name,
    coalesce(l.unique_id,o.unique_id) as unique_id,
    STRING_AGG(cast(o.id as varchar(50)) ,',') as idList
FROM
#original as o
LEFT JOIN
(
  SELECT DISTINCT o.name, oa.unique_id,o.id
  FROM #original as o
    CROSS APPLY
    (
         SELECT TOP(1) u.unique_id, u.id
         FROM #original as u
         WHERE u.name = o.name
                 AND u.unique_id is not null
         ORDER BY u.id ASC
    )oa
  WHERE o.unique_id is null
)as l
  on o.id=l.id
GROUP BY 
    o.name,
    coalesce(l.unique_id,o.unique_id)
HaVing 
   count(*)> 1
ORDER BY
  o.name

name	unique_id	idList
A	12345	1,2
B	45678	3,4
D	null	7,8
E	1000	9,10,11

dbfiddle here

harishgg1 · January 23, 2024, 10:11am

hi

hope this helps

i was looking for a simple short select statement

there are a lot of data scenarios
close but no cigar

create data script

declare @original table (id int, unique_id varchar(500), name varchar(200))

insert into @original values
(1, '12345', 'A'), (2, '12345', 'A')
, (3, null, 'B'), (4, '45678', 'B')
, (5, '900', 'C'), (6, '901', 'C')
, (7, null, 'D'), (8, null, 'D')
, (9, null, 'E'), (10, '1000', 'E'), (11, null, 'E')
, (12, '1100', 'F'), (13, '1101', 'F'), (14, '1102', 'F')

select 
    cast(a.id as varchar)+','+cast(b.id as varchar)
	, case when a.unique_id is null then b.unique_id else a.unique_id end 
	, a.name 
from 
    @original a join @original b on a.id+1 =b.id and a.name = b.name 
where 
    case when a.unique_id is null then isnull(b.unique_id,1) else a.unique_id end = isnull(b.unique_id,1)