List duplicates one below another based on 2 different fields

sinakv · August 31, 2015, 5:39am

Hi All

I have been trying to get the duplicates listed out one below another.

Eg. I am trying to list the records for which the first name in one record matches with the last name in another record. Both these records should be listed one after another. Consider the following table:
Id First_Name Last_Name
1 ABC XYZ
2 DDD EEE
3 CDE DEF
4 MOP LOP
5 JKL ABC
6 MNO CDE

Output should be as follows:
Id First_Name Last_Name
1 ABC XYZ
5 JKL ABC
3 CDE DEF
6 MNO CDE

Following is the query I have attempted so far.

select a.Id, a.First_Name, a.Last_Name from Table a, Table b
where a.Id <> b.Id and and ((a.First_Name = b.Last_Name and a.First_Name <> '' and b.Last_Name <> '')
or (a.Last_Name = b.First_Name and a.Last_Name <> '' and b.First_Name <> '' ))
order by a.First_Name,b.Last_Name

This is the order I get:
Id First_Name Last_Name
1 ABC XYZ
3 CDE DEF
5 JKL ABC
6 MNO CDE

The records are retrieved correctly, but the ordering is incorrect. The output I need is listing each duplicated record one below another. Can somebody please guide me to get this output ordering correctly?

Thanks and Regards
Sina K V

khtan · August 31, 2015, 6:00am

; with data as
(
    select id = 1, First_Name = 'ABC', Last_Name = 'XYZ'    union all
    select id = 2, First_Name = 'DDD', Last_Name = 'EEE'    union all
    select id = 3, First_Name = 'CDE', Last_Name = 'DEF'    union all
    select id = 4, First_Name = 'MOP', Last_Name = 'LOP'    union all
    select id = 5, First_Name = 'JKL', Last_Name = 'ABC'    union all
    select id = 6, First_Name = 'MNO', Last_Name = 'CDE'
)
select    d1.id, d1.First_Name, d1.Last_Name
from    data d1
    inner join data d2    on    d1.First_Name    = d2.Last_Name
                or    d1.Last_Name    = d2.First_Name
order by case when d1.id < d2.id then d1.id else d2.id end, d1.id, d2.id

id          First_Name Last_Name 
----------- ---------- --------- 
1           ABC        XYZ
5           JKL        ABC
3           CDE        DEF
6           MNO        CDE

sinakv · August 31, 2015, 6:59am

Hi khtan

Thank you so much for your quick response. I got the ordering almost correct using the case by as suggested by you. But since the ordering is done based on the Id, the output is not always accurate. For example, below is a part of the actual output I got with my data using the case by:

Id First_Name Last_Name
0329 SAR KRI
7541 KRI IYE
0431 JUB SHE
7858 SHE SHE
0527 BIL SHE
7858 SHE SHE
0877 SHA SAY
0898 SAY ALI
0884 ABI SHE
7858 SHE SHE
0908 JAY GAU
1794 GAU CHA

As you can see here, SHE record is present in the table multiple times. Ideally it should be all grouped together as one set, but since ordering is done based on Id, they are displayed based on the order of the Id. What I am trying to achieve here is to group together the records (duplicates) based on matching of first name and last name irrespective of what Id it has. Can you please suggest a way to get this done?

Thanks and regards
Sina K V

khtan · August 31, 2015, 7:50am

; with cte as
(
    select  key_name = First_Name, d.*
    from    data d
    union
    select  key_name = Last_Name, d.*
    from    data d
),
cte2 as
(
    select    *, grp_no = dense_rank() over (order by key_name), cnt = count(*) over (partition by key_name)
    from    cte
)
select    key_name, id, First_Name, Last_Name
from    cte2
where    cnt    > 1
order by grp_no, id

sinakv · August 31, 2015, 11:29am

Hi khtan

Thank you so much for the solution. It worked perfectly. Following is my final query:

select key_name, Id, First_Name, Last_Name
from (
select , grp_no = dense_rank() over (order by key_name), cnt = count() over (partition by key_name)
from (
select key_name = a.First_name, a.*
from Table a, Table b
where a.Id <> b.Id
and (a.First_Name = b.Last_Name and a.First_Name <> '' and b.Last_Name <> '')
union
select key_name = a.Last_Name, a.*
from Table a, Table b
where a.Id <> b.Id
and (a.Last_Name = b.First_Name and a.Last_Name <> '' and b.First_Name <> '' )
) as tmp
) as tmp1
where cnt > 1
order by grp_no, id

Thanks and Regards
Sina K V