SQLTeam.com | Weblogs | Forums

Finding users with duplicate SSN - T-SQL help

tsql
sql2014

#1

Hello all,

I've been working on this problem for a long time, and I can't seem to get the query to return the correct results. Below is my problem:

I'm working on a system where users are able to create applications. Each year, they create a new application. I'm looking to find any different users that share the same social security number (SSN). My issue is: since there are multiple applications for each user, using COUNTS will return duplicate SSN even though it is for the same user. Here is an example of the data that I have: (note: none of these are true SSN, I'll use a 6 digit number for an example)

Table dbo.Application

ApplicationNumber     User               SSN
1000                  John               658-895
1001                  John               658-895
1002                  Kara               112-568
1003                  Sarah              598-789
1004                  Steve              658-895
1005                  Sarah              598-789

I've tried tons of different T-SQL, below is an example:

SELECT User,
    SSN
FROM dbo.Application
GROUP BY User, SSN
HAVING COUNT(SSN) > 1

This type of query will return something similar to:

User               SSN
John               658-895
John               658-895
Steve              658-895
Sarah              598-789
Sarah              598-789

Whereas all I want are situations where there is COUNT of more than 1 SSN and where the Users are different:

User               SSN
John               658-895
Steve              658-895

I would like all of the other records to be hidden, even if there are technically more than 1 (due to one user having 2+ applications which will have the same SSN).

This is a somewhat simplified version of my problem (more tables, slightly more complex logic), but if anyone can point me in the correct direction or give me example queries I would be greatly appreciative. Feel free to ask additional questions if you have them!

Thanks!

(using SQL 2014)


#2

Note, your output does not match you query. If you use user and ssn in the query (group by them) then the output would be as you indicated. If you need another column that is different.

SELECT User,
    SSN
FROM dbo.Application
GROUP BY User, SSN
HAVING COUNT(SSN) > 1;

#3

Subject was an error, thanks for pointing that out. I fixed the SELECT query to reflect the correct results.


#4

Whereas all I want are situations where there is COUNT of more than 1 SSN and where the Users are different:

SELECT SSN, MIN(User) AS Min_User, MAX(User) AS Max_User
FROM dbo.Application
GROUP BY SSN
HAVING COUNT(DISTINCT User) > 1

#5

Awesome, thanks for the idea @ScottPletcher. The only issue is that there could be more than 2 users sharing the same SSN, where this will only show the minimum UserId and maximum UserId. I would like to assume that there could be 5+ users with the same SSN.


#6

I understand, no problem:

SELECT a.*
FROM (
    SELECT SSN
    FROM dbo.Application
    GROUP BY SSN
    HAVING COUNT(DISTINCT User) > 1
) AS SSN_dup_user
INNER JOIN dbo.Application a ON a.SSN = SSN_dup_user.SSN
ORDER BY a.SSN, a.User

#7

using windows functions

with cte
as
(
select count( t.ssn) over(partition by t.ssn order by t.ssn) as ct, t.Name, t.ssn from #t as t
)
 select distinct t.Name, t.ssn from cte as t
  inner join cte as t1  on t.ct =t1.ct and t.Name <> t1.Name and t.SSN =t1.SSN

#8

Yet another database with SSNs in clear text. What the hell is wrong with you people? Would you want your PII and SSN in this database?