Calculating Duplicates on a Field That Is Sometimes NULL

macca · November 13, 2015, 2:02pm

I am using the following query to get duplicates in a table:

SELECT FirstName, LastName, Num, COUNT(*) AS NumberDuplicates
FROM Table1
GROUP BY FirstName, LastName, Num
HAVING (COUNT(*) > 1)
ORDER BY NumberDuplicates DESC

I want to extend this query so that it only returns duplicates where a field called RefNum contains data.
Does anyone know how I could achieve this?

khtan · November 13, 2015, 2:08pm

SELECT     FirstName, LastName, Num, COUNT(*) AS NumberDuplicates
FROM       Table1
WHERE      RefNum  IS NOT NULL
GROUP BY   FirstName, LastName, Num
HAVING     (COUNT(*) > 1)
ORDER BY   NumberDuplicates DESC

macca · November 13, 2015, 2:11pm

Thanks for the reply, khtan.
But I do not want only the records where RefNum IS NOT NULL. I want the duplicates both where RefNum is NULL and where it is not NULL.

khtan · November 13, 2015, 2:13pm

I don't quite get that statement. Can you elaborate?

macca · November 13, 2015, 2:14pm

When I try to add RefNum as a column to be returned, without having it in the GROUP BY, I get an error message saying RefNum must be contained in the GROUP BY.

khtan · November 13, 2015, 2:16pm

Any column in the SELECT that is not inside an aggregate must appear in the GROUP BY.

SELECT FirstName, LastName, Num, COUNT(*) AS NumberDuplicates
FROM Table1
GROUP BY FirstName, LastName, Num
HAVING (COUNT(*) > 1)
ORDER BY NumberDuplicates DESC
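
As an illustration of that rule, here is a minimal sketch, assuming RefNum is a column of Table1 with a type that MAX accepts. Wrapping RefNum in an aggregate lets it appear in the SELECT without being added to the GROUP BY. The alias AnyRefNum is made up for this example.

```sql
-- RefNum is not in the GROUP BY, so it has to sit inside an aggregate.
-- MAX ignores NULLs, so AnyRefNum is NULL only when every row
-- in the group has a NULL RefNum.
SELECT FirstName, LastName, Num,
       COUNT(*)    AS NumberDuplicates,
       MAX(RefNum) AS AnyRefNum        -- hypothetical alias
FROM Table1
GROUP BY FirstName, LastName, Num
HAVING COUNT(*) > 1
ORDER BY NumberDuplicates DESC;
```

This returns one row per duplicate group rather than one row per record, so it only shows a single RefNum value for each group.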

macca · November 13, 2015, 2:18pm

This is my actual code:

I want to return the records that are duplicates of these records, even though those records may not have a value in HGRefNum.

khtan · November 13, 2015, 2:22pm

I think it would be easier to understand your requirement if you could post some sample data and the expected result.

macca · November 13, 2015, 2:48pm

What I want is to return the duplicates in the table whether or not the duplicate records contain a value in HGRefNum. Not all the duplicates will have a value in HGRefNum; probably only one of them will.

I just want to display the duplicate records and be able to see the HGRefNum field regardless of whether it holds data or not.

ScottPletcher · November 13, 2015, 6:49pm

SELECT FirstName, LastName, COUNT(*) AS NumberDuplicates
FROM Table1
GROUP BY FirstName, LastName
HAVING (COUNT(*) > 1) AND (MAX(RefNum) IS NOT NULL)
ORDER BY NumberDuplicates DESC

khtan · November 14, 2015, 12:16am

; WITH DUP AS
(
    SELECT   FirstName, LastName, COUNT(*) AS NumberDuplicates
    FROM     Table1
    GROUP BY FirstName, LastName
    HAVING   (COUNT(*) > 1)
)
SELECT  d.*, t.RefNum
FROM    DUP d
        INNER JOIN Table1 t    ON    d.FirstName    = t.FirstName
                               AND   d.LastName     = t.LastName
ORDER BY NumberDuplicates DESC
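
A possible alternative to the self-join, sketched under the assumption that Table1 has the columns used above: a windowed COUNT(*) attaches the group size to every row, so each duplicate record is listed together with its own RefNum, NULL or not. The CTE name Counted is made up for this example. The result should match the join version as long as FirstName and LastName are never NULL (the join's = comparison does not match NULLs, while PARTITION BY groups them together).

```sql
;WITH Counted AS   -- hypothetical CTE name
(
    SELECT FirstName, LastName, RefNum,
           -- number of rows sharing this FirstName/LastName pair
           COUNT(*) OVER (PARTITION BY FirstName, LastName) AS NumberDuplicates
    FROM   Table1
)
SELECT FirstName, LastName, RefNum, NumberDuplicates
FROM   Counted
WHERE  NumberDuplicates > 1            -- keep only rows that belong to a duplicate group
ORDER BY NumberDuplicates DESC, FirstName, LastName;
```

Windowed aggregates with PARTITION BY are available from SQL Server 2005 onward.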

macca · November 16, 2015, 12:52pm

Hi KHtan,

I am trying to add the above to a Stored Procedure but keep getting error messages.
Do you know how I can add the above to a Stored Procedure?

djj55 · November 16, 2015, 12:56pm

What is the error?
This should go into a stored procedure without a problem. Can you post the code?

macca · November 16, 2015, 1:34pm

Here is the code:

INSERT INTO HDuplicates

; WITH DUP AS (SELECT FirstName, LastName, Num, COUNT(*) AS NumberDuplicates
FROM table1
WHERE Num IS NOT NULL
GROUP BY FirstName, LastName, Num
HAVING (COUNT(*) > 1))
SELECT d.*, t.Num
FROM DUP d INNER JOIN
table1t t ON d.FirstName = t.FirstName AND d.LastName = t.LastName
WHERE (Num LIKE N'%HO%')
ORDER BY NumberDuplicates DESC

END

khtan · November 16, 2015, 1:54pm

Should be

; WITH DUP
AS (
	SELECT FirstName,
		LastName,
		Num,
		COUNT(*) AS NumberDuplicates
	FROM table1
	WHERE Num IS NOT NULL
	GROUP BY FirstName,
		LastName,
		Num
	HAVING (COUNT(*) > 1)
	)
INSERT INTO HDuplicates ( <THE COLUMN NAME HERE> )
SELECT <THE COLUMN NAME HERE>
FROM DUP d
INNER JOIN table1t t
	ON d.FirstName = t.FirstName
		AND d.LastName = t.LastName
WHERE (t.Num LIKE N'%HO%')  -- Num exists in both d and t, so it needs a qualifier; t.Num is assumed here
ORDER BY NumberDuplicates DESC

macca · November 16, 2015, 2:24pm

I keep getting an error saying there is an error near ';'.
Should we not be declaring DUP?

djj55 · November 16, 2015, 2:49pm

Which version of SQL Server are you using? This is a CTE, where DUP is "declared" by the WITH.
The leading ; really belongs at the end of the preceding statement, and there should be a ; after the DESC.
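
A minimal sketch of that placement inside a stored procedure. The procedure name and the column list of HDuplicates are assumptions for illustration only; adjust them to the real table.

```sql
CREATE PROCEDURE dbo.usp_LoadHDuplicates   -- hypothetical procedure name
AS
BEGIN
    SET NOCOUNT ON;   -- previous statement is terminated, so WITH can start the next one

    WITH DUP AS
    (
        SELECT FirstName, LastName, COUNT(*) AS NumberDuplicates
        FROM   table1
        GROUP BY FirstName, LastName
        HAVING COUNT(*) > 1
    )
    -- The INSERT comes after the CTE definition, not before it.
    INSERT INTO HDuplicates (FirstName, LastName, NumberDuplicates)   -- assumed column list
    SELECT d.FirstName, d.LastName, d.NumberDuplicates
    FROM   DUP d;     -- terminator after the last statement
END
```

The CTE belongs to the single statement that follows it, which is why the INSERT has to sit between the closing parenthesis of the CTE and the SELECT.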

macca · November 16, 2015, 2:50pm

I am using SQL Server 2008

djj55 · November 16, 2015, 2:59pm

Good, so the CTE should work.
If you move the INSERT INTO as @khtan suggested, do you still get the error?

macca · November 16, 2015, 7:33pm

This is now working, khtan.
If I wanted to get the duplicates based on three fields, with pnum as the additional field, do I just add it to the inner join as d.pnum = t.pnum?
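
For reference, a sketch of what that change might look like, assuming pnum is a column of table1. The join condition alone would not be enough: for the CTE to expose d.pnum at all, pnum also has to be added to its SELECT and GROUP BY.

```sql
;WITH DUP AS
(
    SELECT FirstName, LastName, pnum, COUNT(*) AS NumberDuplicates
    FROM   table1
    GROUP BY FirstName, LastName, pnum      -- duplicates are now defined by three fields
    HAVING COUNT(*) > 1
)
SELECT d.FirstName, d.LastName, d.pnum, d.NumberDuplicates, t.Num
FROM   DUP d
       INNER JOIN table1 t ON  d.FirstName = t.FirstName
                           AND d.LastName  = t.LastName
                           AND d.pnum      = t.pnum   -- the extra join condition
ORDER BY d.NumberDuplicates DESC;
```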