Remove duplicate

Hi Guys,

I want to remove duplicates from the table. The duplicate is a typo in the first and last name.

Here is the sample script

CREATE TABLE test_dup(
f_name nvarchar(20)
,l_name nvarchar(20)
,state nvarchar(20)
)

INSERT INTO dbo.test_dup
(
f_name,
l_name,
state
)
SELECT 'James','Smith','CA'
UNION all
select 'Smith','James','CA'
UNION all
select 'a','m','CA'
UNION all
select 'j','k','NY'

select * from dbo.test_dup

Here is the sample data
fname,lname,state
James,Smith,CA
Smith,James,CA
a,m,CA
j,k,NY

Here is what I want. (Only three unique records)

f_name,l_name,state

James,Smith,CA

a,m,CA

j,k,NY

Please let me know how can I accomplish this.

Thanks.

for the duplicates what makes the one in your desired list the one that gets picked? Maybe there is a created date or some other unique missing field?

with src
as
(
SELECT 'James' as fn,'Smith' as ln,'CA' as c
UNION all
select 'Smith','James','CA'
UNION all
select 'a','m','CA'
UNION all
select 'j','k','NY'
)
select top 1 t.*
 From src src
 join src t on src.fn = t.ln
   and src.ln = t.fn
and src.c = t.c
union
select src.*
 From src src
 left join src t on src.fn = t.ln
   and src.ln = t.fn
   and src.c = t.c
   where t.fn is null 
        and t.ln is null   and t.c is null

Like was said over at SQLServerCentral.com, this is a really bad idea. As you said, the data is wrong. You need to convince management that the data needs to be fixed.

1 Like

Thank you for your help!

WARNING! BEFORE YOU USE THE CODE LISTED ABOVE, PLEASE READ THE FOLLOWING!

If someone is looking for a code solution to ignore duplicates then, sorry, the code above shouldn't be it. You will end up leaving out people that have legitimate names.

For example, I have a very good friend on a different site by the name of Michael John. I also have a work acquaintance whose name is John Michael.

If you have known "typo's", then fix the data instead of making this terrible mistake.

To check for possible such typo's as candidate rows that need manual verification, then do something like the following (I played this against the AdventureWorks database) and then manually verify whether or not the entries are correct or not You might even want to add a HasBeenVerified column to your table so they don't show up in such validation queries in the future.

Here's the code...

 SELECT  lftBusinessEntityID = lft.BusinessEntityID
        ,rgtBusinessEntityID = rgt.BusinessEntityID
        ,lftFirstName        = lft.FirstName
        ,lftLastName         = lft.LastName
        ,rgtFirstName        = rgt.FirstName
        ,rgtLastName         = rgt.LastName
   FROM person.Person lft
   JOIN person.Person rgt
     ON lft.FirstName        = rgt.LastName
    AND lft.LastName         = rgt.FirstName
    AND lft.BusinessEntityID < rgt.BusinessEntityID
  ORDER BY  lftFirstName,rgtLastName
           ,lftLastName ,rgtFirstName
;

... and here's the output from that code...

ALL of those First/Last name combinations are valid and common name combinations. Ignoring any of them with your code can be a serious mistake on your part. You MUST manually verify which ones are valid and FIX the ones than aren't.

2 Likes

A bit over dramatic but agree with you @JeffModen 100%

Maybe OP can use the query to find these duplicates instead of using it to remove duplicates

1 Like

If you've been through what I've been through with this type of mistake by others in companies that I've worked for, you wouldn't think it was over dramatic. In fact, you'd be congratulating me on my restraint.

1 Like