SQLTeam.com | Weblogs | Forums

Join logic problem for multiple relationships in one table

sql2012

#1

I have a dataset of customers and relations to other customers in the same table. I'm looking to create a join_id that links all related customers together.

High level scenario:
customer A is linked to customer B and B is linked to C (there is no link between A & C). I'm looking for A, B & C to be linked by a common join_id as they share a direct or indirect relationship.

Each customer has 6 attributes which can linked to 1 or many others. My table contains 2 million customer records.

Attached is a screenshot of my dataset. The desired result is that all customers in the sample data will be linked by 1 join_id (which I need to create via a query) as each customer is linked to another either directly or indirectly.

eg
rows 9, 10 & 11 are linked by id_name,
10 is linked to 3,4,5,8,10,12,16,17,18 by id_email,
18 is linked to 1,6,7,14,15,16 by id_address etc etc

My table also contains customers which will not be linked to other customers. In this scenario, no join_id is needed.

The created join_id should be an integer - an arbitrary value is fine.

I'm trying to avoid using a loop or cursor .. the output query needs to be performant.

This issue has me a little perplexed .. there must be a logical answer in here somewhere. I'm hoping someone out there can assist me. Thanks, in advance.


#2

What if customer A is related to 100 other customers


#3

How many different ways can a customer be linked? You listed 3, but I'm sure there is more. Also, please provide ddl and data. Screen shots are good for explaining, but I am not going to recreate your data for you


#4

Thanks for the reply.

What if customer A is related to 100 other customers

Given the data, this is very unlikely. That being said, the resulting query shouldn't be bounded so that it can cater for 1 or many link/relationships.

The below can be used to create a sample data set. Each customer has 6 attributes that could be used for linking to other customers. With this data set, the result should be 3 groups of customers (3 distinct join_id's)

Thanks, in advance.


USE [test]
GO

-- DROP TABLE dbo.[cust_rel_sample]
CREATE TABLE dbo.[cust_rel_sample]
(
[customer_id] [int] NULL,
[cnt_name] [int] NULL,
[id_name] [int] NULL,
[cnt_bank] [int] NULL,
[id_bank] [bigint] NULL,
[cnt_mobile] [int] NULL,
[id_mobile] [int] NULL,
[cnt_home] [int] NULL,
[id_home] [int] NULL,
[cnt_email] [int] NULL,
[id_email] [int] NULL,
[cnt_address] [int] NULL,
[id_address] [int] NULL
)
GO
;

INSERT INTO dbo.[cust_rel_sample]
VALUES
(1035608,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1285215,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1618173,2,1618173,4,1618173,1,NULL,4,1618173,1,NULL,1,NULL),
(2091361,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(2543284,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000875447,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000941758,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000941737,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000962860,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,1,NULL),
(1000242402,2,1707868,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1006384494,4,2543284,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(2543285,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000852813,1,NULL,1,NULL,1,NULL,1,NULL,2,1000852813,1,NULL),
(1000242383,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1000847636,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000932977,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000978168,1,NULL,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000307069,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1217037,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1313183,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1005964327,4,2543284,4,1618173,1,NULL,4,1618173,4,1005964327,1,NULL),
(1005977057,1,NULL,1,NULL,1,NULL,1,NULL,4,1005964327,1,NULL),
(1707868,2,1707868,1,NULL,1,NULL,1,NULL,4,1707868,1,NULL),
(1006327204,2,1618173,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(1000863810,3,1000863810,2,1000863810,1,NULL,3,1000863810,1,NULL,1,NULL),
(1000929465,3,1000863810,1,NULL,1,NULL,3,1000863810,10,1000847636,1,NULL),
(1000933009,3,1000863810,2,1000863810,1,NULL,3,1000863810,2,1000852813,1,NULL),
(1000941560,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000977540,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL)


#5

did you inherit this table design or you came up with it? why would 1000863810 be related to itself by id_name?


#6

You haven't described what any of the fields mean, but a cursory look seems to indicate that the columns other than customer_id exist in pairs with names like cnt_foo and id_foo, where cnt_foo gives a count of the number of times the following value of id_foo appears in that column in all rows. Some of the id_... values appear in many columns. Are these fields foreign keys to other tables? What exactly are you trying to accomplish?


#7

Each record in this table represents a customer account. These are sourced from different systems.
The purpose of this processing task is to link customer account records that are 1) the same customer 2) likely to be the same customer as they share a common attribute eg same name / bank details etc.

A processing step prior to this looks for common attributes and applies a unique id to the group eg customer_id's
1000863810 & 1000929465 & 1000933009 all share the same name and therefore share the same id_name value. The value
used to indicate these values are the same happens to be the minimum customer_id in the group (but this in fact could
be any grouping id value .. like with id_address .. in this case the value 635 (which is an internal address id) has
been used to group customers with the same address).

So, to follow this example through (from the screen-shot):

group 1: rows 9, 10 & 11 share the same name.
group 2: row 10 shares the same email address as 3,4,5,8,12,13,16,17 & 18
.. as row 10 is common in the two groups, groups 1 & 2 should now be linked together.
group 3: rows 16 & 18 share the same address as rows 1, 6, 7, 14, 15
.. as row 16 & 18 is common between group 2 & group 3, groups 1, 2 & 3 should now be linked
group 4: row 11 shares the same email address as row 2
.. so now group 4 is linked to group 1 and thus all groups 1, 2, 3 & 4 (all 18 customers) are linked directly and/or
indirectly

I know it's not the most logical dataset .. but I'm hoping I can work with what i've got (opposed to re-working
the prior processing steps).

Thanks


#8

Could you please let me know the desired output that you want from above sample data


#9

Thanks for the reply .. a table containing customer_id & join_id is all that is required for an output.
join_id being the created grouping identifier once a group of customers has been identified.

Below is a screenshot of the anticipated result from this sample data-set - 3 separate groups of customers.


#10

hi

i tried to understand this ...

This is what I have ... if this is correct then we can proceed further
i mean linking .. part

Please let me know
:slight_smile:
:slight_smile:

drop create sample data ...
use tempdb 
go 




DROP TABLE dbo.[cust_rel_sample]

CREATE TABLE dbo.[cust_rel_sample] 
             ( 
                          [customer_id] [INT] NULL, 
                          [cnt_name] [INT] NULL, 
                          [id_name] [INT] NULL, 
                          [cnt_bank] [INT] NULL, 
                          [id_bank] [BIGINT] NULL, 
                          [cnt_mobile] [INT] NULL, 
                          [id_mobile] [INT] NULL, 
                          [cnt_home] [INT] NULL, 
                          [id_home] [INT] NULL, 
                          [cnt_email] [INT] NULL, 
                          [id_email] [INT] NULL, 
                          [cnt_address] [INT] NULL, 
                          [id_address] [INT] NULL 
             ) 
 go

INSERT INTO dbo.[cust_rel_sample] VALUES 
            ( 
                        1035608,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1285215,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1618173,2,1618173,4,1618173,1, 
                        NULL, 
                        4,1618173,1, 
                        NULL, 
                        1, 
                        NULL 
            ) 
            , 
            ( 
                        2091361,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        2543284,4,2543284,1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1618173,2,2543284,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000875447,2,1000847636,1, 
                        NULL, 
                        1, 
                        NULL, 
                        2,1000847636,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000941758,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000941737,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000962860,2,1000932977,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000932977,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000242402,2,1707868,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000242383,4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1006384494,4,2543284,4,1618173,1, 
                        NULL, 
                        2,1006327204,4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        2543285,4,2543284,1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1618173,2,2543284,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000852813,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        2,1000852813,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000242383,2,1000242383,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000242383,4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000847636,2,1000847636,1, 
                        NULL, 
                        1, 
                        NULL, 
                        2,1000847636,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000932977,2,1000932977,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000932977,10,1000847636,7,635 
            ) 
            , 
            ( 
                        1000978168,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000932977,10,1000847636,7,635 
            ) 
            , 
            ( 
                        1000307069,2,1000242383,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000242383,4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1217037,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1313183,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1005964327,4,2543284,4,1618173,1, 
                        NULL, 
                        4,1618173,4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        1005977057,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        1707868,2,1707868,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1006327204,2,1618173,4,1618173,1, 
                        NULL, 
                        2,1006327204,4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000863810,3,1000863810,2,1000863810,1, 
                        NULL, 
                        3,1000863810,1, 
                        NULL, 
                        1, 
                        NULL 
            ) 
            , 
            ( 
                        1000929465,3,1000863810,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000863810,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000933009,3,1000863810,2,1000863810,1, 
                        NULL, 
                        3,1000863810,2,1000852813,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000941560,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000977540,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            )
go

share the same name .. grouping

sql same name share grouping
SELECT customer_id, 
       id_name, 
       Dense_rank() 
         OVER( 
           ORDER BY id_name) 
FROM   cust_rel_sample 

go

share the same email address .. grouping

sql same email address share grouping
SELECT customer_id, 
       id_email, 
       Dense_rank() 
         OVER( 
           ORDER BY id_email) 
FROM   cust_rel_sample

share the same .. address .. grouping

sql same address share grouping
SELECT customer_id, 
       id_address, 
       Dense_rank() 
         OVER( 
           ORDER BY id_address) 
FROM   cust_rel_sample

#11

@harishgg1 yes, this will get you the distinct customers that share the same attributes .. i.e. the lowest level group.

Next stage would be to create higher level groups where customers exist in multiple groups, and link these together ..


#14

hi

i took a look at the screenshot you provided
for desired result

I came up with the SQL
Please check and let me know
Thanks
:slight_smile:
:slight_smile:

SQL ...
;WITH cte 
     AS (SELECT a.customer_id, 
                2 AS grp 
         FROM   cust_rel_sample a 
                JOIN cust_rel_sample b 
                  ON a.id_name = b.id_name 
         UNION 
         SELECT a.customer_id, 
                3 AS grp 
         FROM   cust_rel_sample a 
                JOIN cust_rel_sample b 
                  ON a.id_email = b.id_email 
         UNION 
         SELECT a.customer_id, 
                1 AS grp 
         FROM   cust_rel_sample a 
                JOIN cust_rel_sample b 
                  ON a.id_address = b.id_address) 
SELECT * 
FROM   cte 

go
Image Results ..


#16

DROP TABLE dbo.[cust_rel_sample]
CREATE TABLE dbo.[cust_rel_sample]
(
[customer_id] [int] NULL,
[cnt_name] [int] NULL,
[id_name] [int] NULL,
[cnt_bank] [int] NULL,
[id_bank] [bigint] NULL,
[cnt_mobile] [int] NULL,
[id_mobile] [int] NULL,
[cnt_home] [int] NULL,
[id_home] [int] NULL,
[cnt_email] [int] NULL,
[id_email] [int] NULL,
[cnt_address] [int] NULL,
[id_address] [int] NULL
)
GO
;

INSERT INTO dbo.[cust_rel_sample]
VALUES
(1035608,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1285215,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1618173,2,1618173,4,1618173,1,NULL,4,1618173,1,NULL,1,NULL),
(2091361,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(2543284,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000875447,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000941758,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000941737,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000962860,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,1,NULL),
(1000242402,2,1707868,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1006384494,4,2543284,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(2543285,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000852813,1,NULL,1,NULL,1,NULL,1,NULL,2,1000852813,1,NULL),
(1000242383,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1000847636,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000932977,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000978168,1,NULL,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000307069,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1217037,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1313183,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1005964327,4,2543284,4,1618173,1,NULL,4,1618173,4,1005964327,1,NULL),
(1005977057,1,NULL,1,NULL,1,NULL,1,NULL,4,1005964327,1,NULL),
(1707868,2,1707868,1,NULL,1,NULL,1,NULL,4,1707868,1,NULL),
(1006327204,2,1618173,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(1000863810,3,1000863810,2,1000863810,1,NULL,3,1000863810,1,NULL,1,NULL),
(1000929465,3,1000863810,1,NULL,1,NULL,3,1000863810,10,1000847636,1,NULL),
(1000933009,3,1000863810,2,1000863810,1,NULL,3,1000863810,2,1000852813,1,NULL),
(1000941560,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000977540,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL)

;with cte as(
select customer_id,id_name as id,DENSE_RANK() over(order by id_name) as cnt from dbo.[cust_rel_sample]
where id_name is not null
union all
select customer_id,id_email as id,DENSE_RANK() over(order by id_email) as cnt from dbo.[cust_rel_sample]
where id_email is not null
union all
select
customer_id,id_address as id,DENSE_RANK() over(order by id_address) as cnt from dbo.[cust_rel_sample]
where id_address is not null
union all
select customer_id,id_bank as id,DENSE_RANK() over(order by id_bank) as cnt from dbo.[cust_rel_sample]
where id_bank is not null
union all
select customer_id,id_mobile as id,DENSE_RANK() over(order by id_mobile) as cnt from dbo.[cust_rel_sample]
where id_mobile is not null
union all
select customer_id,id_home as id,DENSE_RANK() over(order by id_home) as cnt from dbo.[cust_rel_sample]
where id_home is not null)

select customer_id,count(customer_id)
from cte
group by customer_id


#17

Thanks @mannesravya & @harishgg1 - both suggestions make a start. We can now identify groups of customers with similar attributes.

If we go back to the original problem statement

customer A is linked to customer B and B is linked to C (there is no link between A and C).
I'm looking for A, B and C to be linked by a common join_id

Your queries allow me to create a group for A and B and another group for B and C.

Next step is to link the first group with the second group to create an overall group containing A, B and C.


#18

I am little bit confused. Could you please explain that with real columns in your table.


#19

hi jasebase

could you please give some sort of illustriation
so that we can
understand

example like below

image