SQLTeam.com | Weblogs | Forums

Finding Duplicate Records


#1

I have a records with the fields Firstname, Lastname, RefNum
A lot of the records are duplicate based on Firstname and Lastname but not all records have a value in RefNum.
I want to find any records that contain a value in RefNum which are a duplicate. But the duplicate detection of these records is based on Firstname and Lastname, RefNum does not come into the duplicate detection.
Anyone any ideas?


#2

Not quite clear to me what your requirements are; perhaps one of these?

-- any duplicates on firstname and lastname
SELECT firstname, lastname
FROM Tbl
GROUP BY firstname, lastname
HAVING COUNT(*) > 1

-- duplicates on firstname and lastname with non-null refnum
SELECT firstname, lastname
FROM Tbl
GROUP BY firstname, lastname
HAVING COUNT(refnum) > 1

-- duplicates on firstname and lastname with more than one
-- distinct refnum
SELECT firstname, lastname
FROM Tbl
GROUP BY firstname, lastname
HAVING COUNT(DISTINCT refnum) > 1

#3

Thanks for the reply James.
What I am looking for is Duplicates on firstname and lastname for any records containing refnum, but the record that it is a duplicate of does not necessarily have to have a refnum value.

Hope this clearer.


#4

Basically any records containing refnum I want to find there duplicates based on firstname and lastname, but the duplicate record does not have to have a refnum value.


#5

It is always easier to understand if you post sample data, for example like shown below - which someone can copy and paste to an SSMS window and run. If you copy, paste and run the code below, you will see that there are four pairs of duplicate first names and last names. Which of these do you want to get in the output of your query?

CREATE TABLE #tmp
(firstname VARCHAR(32), lastname VARCHAR(32), refnum INT);

INSERT INTO #tmp VALUES
('a','b',1),
('a','b',1),
('c','d',1),
('c','d',NULL),
('e','f',1),
('e','f',2),
('g','h',NULL),
('g','h',NULL);

SELECT * FROM #tmp;

#6

James If 'a' is Firstname and 'b' is lastname and 1 means refnum contains data then I would like to get the records of the type:

('a','b',1),
('a','b',1),

and
('c','d',1),
('c','d',NULL),

and
('e','f',1),
('e','f',2),

output as my results as the Firstname and Lastnames are the same and at least one record contains a value for refnum.


#7
SELECT firstname, lastname
FROM #tmp
GROUP BY firstname, lastname
HAVING 
	COUNT (*) > 1
	AND SUM( CASE WHEN refnum IS NULL THEN 0 ELSE 1 END) >= 1

#8

Thanks for the reply James.
So this query will return all the Firstnames and Lastnames for the required records.
Would it be possible that the Refnum is returned as part of the results also?


#9
SELECT firstname, lastname, refnum
FROM #tmp t1
where exists
(
	select * from #tmp t2
	where t1.firstname = t2.firstname
		and t1.lastname = t2.lastname
	GROUP BY t2.firstname, t2.lastname
	HAVING 
		COUNT (*) > 1
		AND SUM( CASE WHEN t2.refnum IS NULL THEN 0 ELSE 1 END) >= 1
)

#10

Hi James thanks for the reply.
This seems to be what I need but I know need to get the duplicates based PersonalNo also.
How could add PersonalNo into the above?


#11
SELECT firstname, lastname, PERSONALNO, refnum
FROM #tmp t1
where exists
(
	select * from #tmp t2
	where t1.firstname = t2.firstname
		and t1.lastname = t2.lastname
		AND t1.PERSONALNO = t2.PERSONALNO
	GROUP BY t2.firstname, t2.lastname, t2.PERSONALNO
	HAVING 
		COUNT (*) > 1
		AND SUM( CASE WHEN t2.refnum IS NULL THEN 0 ELSE 1 END) >= 1
)

#12

James I am using your code but it is returning some records that are not duplicates.
I am using only one table called CitizenContact. Am I using this correctly?

SELECT FirstName, LastName, PN, refnum
FROM CitizenContact AS t1
WHERE EXISTS
(SELECT FirstName, LastName, PN
FROM CitizenContact AS t2
WHERE (t1.FirstName = FirstName) AND (t1.LastName = LastName) AND (t1.PN = PN)
GROUP BY FirstName, LastName, PN
HAVING (COUNT(*) > 1) AND (SUM(CASE WHEN t2.refnum IS NULL THEN 0 ELSE 1 END) >= 1))


#13

Query seems fine to me. Pick one example where you are getting incorrect results and insert those rows into the #tmp table example that I had posted earlier and see what you get.


#14

I dont understand what you mean James?


#15

What I expect to return is duplicate records where all four fields Firstname, Lastname, PN, Refnum all match in each duplicate that they return.


#16

Hi @macca Try this:-

;with cte as (
select
ROW_NUMBER() over (partition by
FirstName,
LastName,
PN,
refnum
order by (select 0))rn,
FirstName,
LastName,
PN,
refnum
FROM CitizenContact
)
select
*
from cte
where rn > 1


#17

This is different from what you said earlier. In an earlier post, based on the example I had posted, you said you wanted to get these:

James If 'a' is Firstname and 'b' is lastname and 1 means refnum contains data then I would like to get the records of the type:

('a','b',1),
('a','b',1),

and
('c','d',1),
('c','d',NULL),

and
('e','f',1),
('e','f',2),

If you are only looking for duplicates based on refnum as well, the third pair wouldn't be a duplicate, would it?


#18

i should have explained James that what you gave me worked fine but now I require duplicates based on four fields rather than two.


#19

Hi muj9 thanks for the reply but could you explain your code a bit as I don't follow what is happening.


#20

If you are just looking for dups on all four columns (regardless of whether refnum is null or not) then

SELECT
	FirstName ,
    LastName ,
    PN ,
    refnum
FROM
(                            
	SELECT
		*,
		COUNT (*)OVER 
			(
				PARTITION BY  FirstName, LastName, PN, RefNum
			) AS N
	FROM
		CitizenContact
) s
WHERE N > 1;

You can run the inner SELECT statement by itself to see what it is doing. If you want to omit rows where refnum is null, then instead of count(*) use count(refnum).