Find duplicates

I've values in a table which are selected as duplicates if the name is same then the corresponding ids are included in a csv string column as below:

Original table:
create table #original(id int, unique_id varchar(500), name varchar(200))
insert into @original values
(1, '12345', 'A'), (2, '12345', 'A')
, (3, null, 'B'), (4, '45678', 'B')
, (5, '900', 'C'), (6, '901', 'C')
, (7, null, 'D'), (8, null, 'D')
, (9, null, 'E'), (10, '1000', 'E'), (11, null, 'E')
, (12, '1100', 'F'), (13, '1101', 'F), (14, '1102', 'F')

Person with name A, B are same but C isn't because unique id is different for C. A record is duplicate if the name is same AND unique id is there or null. When name is same but the unique ids are different then they aren't the same people.

I'm selecting the data as below: ;with cte as (select name from #original group by name having count(*) > 1)

I need to get the data as below:
image

C and F should be avoided as their unique_ids are different though names are same.

Thanks

Hi,

I use this portion of code, so that I know, for each row , that should be, in case of the null value on unique_id, the value for this unique_id

SELECT DISTINCT o.name, oa.unique_id,o.id
  FROM #original as o
    CROSS APPLY
    (
         SELECT TOP(1) u.unique_id, u.id
         FROM #original as u
         WHERE u.name = o.name
                 AND u.unique_id is not null
         ORDER BY u.id ASC
    )oa
  WHERE o.unique_id is null
name unique_id id
B 45678 3
E 1000 9
E 1000 11

so, for ID = 3, the unique_id should be 45678
for id = 9 , the unique_id should be 1000...

Then the final query , will be something like this:

SELECT
    o.name,
    coalesce(l.unique_id,o.unique_id) as unique_id,
    STRING_AGG(cast(o.id as varchar(50)) ,',') as idList
FROM
#original as o
LEFT JOIN
(
  SELECT DISTINCT o.name, oa.unique_id,o.id
  FROM #original as o
    CROSS APPLY
    (
         SELECT TOP(1) u.unique_id, u.id
         FROM #original as u
         WHERE u.name = o.name
                 AND u.unique_id is not null
         ORDER BY u.id ASC
    )oa
  WHERE o.unique_id is null
)as l
  on o.id=l.id
GROUP BY 
    o.name,
    coalesce(l.unique_id,o.unique_id)
HaVing 
   count(*)> 1
ORDER BY
  o.name
name unique_id idList
A 12345 1,2
B 45678 3,4
D null 7,8
E 1000 9,10,11

dbfiddle here

1 Like

hi

hope this helps

i was looking for a simple short select statement

there are a lot of data scenarios
close but no cigar :thinking: :yum:

create data script

declare @original table (id int, unique_id varchar(500), name varchar(200))

insert into @original values
(1, '12345', 'A'), (2, '12345', 'A')
, (3, null, 'B'), (4, '45678', 'B')
, (5, '900', 'C'), (6, '901', 'C')
, (7, null, 'D'), (8, null, 'D')
, (9, null, 'E'), (10, '1000', 'E'), (11, null, 'E')
, (12, '1100', 'F'), (13, '1101', 'F'), (14, '1102', 'F')

select 
    cast(a.id as varchar)+','+cast(b.id as varchar)
	, case when a.unique_id is null then b.unique_id else a.unique_id end 
	, a.name 
from 
    @original a join @original b on a.id+1 =b.id and a.name = b.name 
where 
    case when a.unique_id is null then isnull(b.unique_id,1) else a.unique_id end = isnull(b.unique_id,1)