Find Duplicates based on 1 column and return correlated values in separated columns

itsetsos · October 8, 2019, 1:03pm

Good evening,

I'm looking for something rather easy for you who know a lot about SQL, but I am really confused.

I want to find out which of our customers have the same phone (I find this easy), but I also want to bring in more information for those who have the same phone number.

Example:
FirstName LastName Age Telephone
Eva Mendes 45 +302101234567
Ryan Gosling 38 +302101234567

So, in this case, 2 customers have the same phone number. So what I want is a query to search for same phones and if find (like example), then return 1 row like this:

1_FirstName 1_LastName 1_Age SAME_Telephone 2_FirstName 2_LastName 2_Age
Eva Mendes 45 +302101234567 Ryan Gosling 38

I have tried with left join options, with min and max options and various groups but it didn't work properly in all cases.

I also searched it a lot on google but while I found too many duplicate scripts, I didn't find anything that fit my need.

Ι would appreciate it if you spent your time helping me

Thanks a lot,
Iraklis

harishgg1 · October 8, 2019, 1:54pm

hi

i tried to do this
hope this helps
-- this does not account for ALL scenarios

please click the arrrow to the left for "drop create data" script

drop table #data 
go 

create table #data 
(
FirstName  varchar(100), 
LastName   varchar(100), 
Age                 int, 
Telephone  varchar(100)
)
go 

insert into #data select 'Eva' ,'Mendes' , 45 , '+302101234567'
insert into #data select 'Ryan','Gosling', 38 , '+302101234567'
insert into #data select 'John','Lock'   , 45 , '+302101234566'
insert into #data select 'Jay' ,'Smith'  , 38 , '+302101232233'
go

please click the arrow to the left for "SQL" script

; WITH cte 
     AS (SELECT telephone, 
                Count(telephone) AS cnt_tel 
         FROM   #data 
         GROUP  BY telephone), 
     cte_gt_1 
     AS (SELECT * 
         FROM   cte 
         WHERE  cnt_tel > 1), 
     cte_fin 
     AS (SELECT Row_number() 
                  OVER( 
                    ORDER BY firstname DESC ) AS rn, 
                a.* 
         FROM   #data a 
                JOIN cte_gt_1 b 
                  ON a.telephone = b.telephone) 
SELECT * 
FROM   cte_fin a 
       JOIN cte_fin b 
         ON a.rn = b.rn + 1 

go

yosiasz · October 8, 2019, 2:59pm

the issue is you want Eva Mendes to be first but based on what? There is no other criteria that she should be first? If there a created_date column to differentiate which one should be selected? or any other column?

create table #always_provide_sampledata(FirstName varchar(50), 
LastName varchar(50), Age int, Telephone varchar(50))

insert into #always_provide_sampledata
select 'Eva', 'Mendes', 45, +302101234567 union
select 'Ryan', 'Gosling', 38, +302101234567 union
select 'Den', 'Zel', 58, +302101234567 union
select 'Mariah', 'Carey', 58, +302101234568


;with boomshaka
as
(
select * ,
   ROW_NUMBER() OVER (
      PARTITION BY Telephone
      ORDER BY Telephone
   ) row_num
  From #always_provide_sampledata
)
select * 
  From boomshaka
  where row_num = 1

drop table #always_provide_sampledata

ScottPletcher · October 8, 2019, 3:02pm

SELECT
    MAX(CASE WHEN row_num = 1 THEN FirstName END) AS [1_FirstName],
    MAX(CASE WHEN row_num = 1 THEN LastName END) AS [1_LastName],
    MAX(CASE WHEN row_num = 1 THEN Age END) AS [1_Age],
    Telephone,
    MAX(CASE WHEN row_num = 2 THEN FirstName END) AS [2_FirstName],
    MAX(CASE WHEN row_num = 2 THEN LastName END) AS [2_LastName],
    MAX(CASE WHEN row_num = 2 THEN Age END) AS [2_Age]
FROM (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY Telephone ORDER BY LastName, FirstName) AS row_num
    FROM #data
) AS derived
WHERE row_num IN (1, 2)
GROUP BY Telephone
HAVING MAX(CASE WHEN row_num = 2 THEN FirstName END) IS NOT NULL
ORDER BY Telephone

itsetsos · October 8, 2019, 6:14pm

Hi Harishgg1,
Thanks a lot for your time!!

I used your code but while in the original query you sent me everything was fine ( as I want ) when I adjusted it to my base I saw that the results were not returning properly.

After searching, I noticed that adding more customers to your query also creates a problem.

unfortunately, I forgot to mention that there is 1 unique id (csacode) in my db for each client I have on my database, on which I want to separate clients (firstname, lastname sometimes not unique)

If you use extra lines for customer,you can see that something is going wrong:

Now in my example Eva Mendes and Ryan Gosling is one "pair" of same phone num and the other is Brad Pitt and Penelope Cruz but none of the two working..

insert into #data select 'Eva' ,'Mendes' , 45 , '+302101234567'
insert into #data select 'Ryan','Gosling', 38 , '+302101234567'
insert into #data select 'John','Lock' , 45 , '+302101234566'
insert into #data select 'Jay' ,'Smith' , 38 , '+302101232233'
insert into #data select 'George' ,'Clooney' , 61 , '+302101232233'
insert into #data select 'Brad' ,'Pitt' , 88 , '+302103360800'
insert into #data select 'Tom' ,'Cruise' , 60 , '+302101232233'
insert into #data select 'Penelope' ,'Cruz' , 53 , '+302103360800'

Thank you anyway for your time and the effort you put into helping me!!!

itsetsos · October 8, 2019, 6:49pm

Hi yosiasz,

Sorry to bother you but I'm a beginner so ... you understand !!!

There is no need to be someone 1st or 2nd. What I need is for every 1 phone number that belongs to 2 customers, the query return who these 2 customers are with their name, age, and unique code.

unfortunately, I forgot to mention the most important, that there is 1 unique id (csacode) in my db for each client I have on my database, on which I want to separate clients (firstname, lastname sometimes not unique)
If you use these sample data: ...
insert into #data select 'Eva' ,'Mendes' , 45 , '+302101234567',1025
insert into #data select 'Ryan','Gosling', 38 , '+302101234567',19832
insert into #data select 'John','Lock' , 45 , '+302101234566',12199
insert into #data select 'Jay' ,'Smith' , 38 , '+302101232233',55432
insert into #data select 'George' ,'Clooney' , 61 , '+302101232233',313
insert into #data select 'Brad' ,'Pitt' , 88 , '+302103360800',68900
insert into #data select 'Tom' ,'Cruise' , 60 , '+302101232233',31232
insert into #data select 'Penelope' ,'Cruz' , 53 , '+302103360800',45155
|Csacode |FirstName |LastName |Age |Telephone | Csacode |FirstName|LastName|Age|Telephone|

The results I want are these:

|1025 |Eva |Mendes |45 |+302101234567| 19832 |Ryan |Gosling |38 |+302101234567|
|68900 |Brad |Pitt |88 |+302103360800| 45155 |Penelope |Cruz |53 |+302101234567|

Thanks again for your time !!

yosiasz · October 8, 2019, 7:01pm

Ok missed that whole part, sorry. I think @ScottPletcher answered it for you?

itsetsos · October 11, 2019, 3:24pm

Thanks a lot, ScottPletcher for your solution! It works 100% for my DB !!!!

By the way, I have a question for your code. I can't understand the concept of this method using case in having.
if you can tell me a few words about "logic" behind this line of code, Ι would appreciate it.
Thanks a lot again!!!!

ScottPletcher · October 11, 2019, 6:13pm

You're welcome.

Ok, let's walk thru how the query generates results with two rows of sample data:
'Ryan', 'Gosling', 38, +302101234567, 1 (row_num)
'Eva', 'Mendes', 45, +302101234567, 2

SQL reads the first row, Ryan. Since the row_num is 1, the first MAX condition is true, and 1F is set to 'Ryan'. Likewise, 1L is set to 'Gosling' and 1A is set to 38.
2F, 2L and 2A are set to NULL since the row_num was 1 (SQL defaults to ELSE NULL if you don't specify ELSE).

SQL then reads the second row, 'Eva'. Since the row_num is 2, the value of 1F for the current row would be NULL. However, since we specified MAX value, and 'Ryan' is already in that column, the value doesn't change, since NULL is not > Ryan'. The same is true for all the 1* columns: they will retain the value they had from the previous row.
The 2F, 2L and 2A columns will get assigned values of course, since the row_num is 2. Thus, after processing both rows, 'Ryan'* is in the 1* columns and 'Eva'* is in 2* columns, just like we want.