SQLTeam.com | Weblogs | Forums

Join logic problem for multiple relationships in one table

sql2012

#1

I have a dataset of customers and relations to other customers in the same table. I'm looking to create a join_id that links all related customers together.

High level scenario:
customer A is linked to customer B and B is linked to C (there is no link between A & C). I'm looking for A, B & C to be linked by a common join_id as they share a direct or indirect relationship.

Each customer has 6 attributes which can linked to 1 or many others. My table contains 2 million customer records.

Attached is a screenshot of my dataset. The desired result is that all customers in the sample data will be linked by 1 join_id (which I need to create via a query) as each customer is linked to another either directly or indirectly.

eg
rows 9, 10 & 11 are linked by id_name,
10 is linked to 3,4,5,8,10,12,16,17,18 by id_email,
18 is linked to 1,6,7,14,15,16 by id_address etc etc

My table also contains customers which will not be linked to other customers. In this scenario, no join_id is needed.

The created join_id should be an integer - an arbitrary value is fine.

I'm trying to avoid using a loop or cursor .. the output query needs to be performant.

This issue has me a little perplexed .. there must be a logical answer in here somewhere. I'm hoping someone out there can assist me. Thanks, in advance.


#2

What if customer A is related to 100 other customers


#3

How many different ways can a customer be linked? You listed 3, but I'm sure there is more. Also, please provide ddl and data. Screen shots are good for explaining, but I am not going to recreate your data for you


#4

Thanks for the reply.

What if customer A is related to 100 other customers

Given the data, this is very unlikely. That being said, the resulting query shouldn't be bounded so that it can cater for 1 or many link/relationships.

The below can be used to create a sample data set. Each customer has 6 attributes that could be used for linking to other customers. With this data set, the result should be 3 groups of customers (3 distinct join_id's)

Thanks, in advance.


USE [test]
GO

-- DROP TABLE dbo.[cust_rel_sample]
CREATE TABLE dbo.[cust_rel_sample]
(
[customer_id] [int] NULL,
[cnt_name] [int] NULL,
[id_name] [int] NULL,
[cnt_bank] [int] NULL,
[id_bank] [bigint] NULL,
[cnt_mobile] [int] NULL,
[id_mobile] [int] NULL,
[cnt_home] [int] NULL,
[id_home] [int] NULL,
[cnt_email] [int] NULL,
[id_email] [int] NULL,
[cnt_address] [int] NULL,
[id_address] [int] NULL
)
GO
;

INSERT INTO dbo.[cust_rel_sample]
VALUES
(1035608,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1285215,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1618173,2,1618173,4,1618173,1,NULL,4,1618173,1,NULL,1,NULL),
(2091361,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(2543284,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000875447,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000941758,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000941737,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000962860,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,1,NULL),
(1000242402,2,1707868,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1006384494,4,2543284,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(2543285,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000852813,1,NULL,1,NULL,1,NULL,1,NULL,2,1000852813,1,NULL),
(1000242383,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1000847636,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000932977,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000978168,1,NULL,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000307069,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1217037,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1313183,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1005964327,4,2543284,4,1618173,1,NULL,4,1618173,4,1005964327,1,NULL),
(1005977057,1,NULL,1,NULL,1,NULL,1,NULL,4,1005964327,1,NULL),
(1707868,2,1707868,1,NULL,1,NULL,1,NULL,4,1707868,1,NULL),
(1006327204,2,1618173,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(1000863810,3,1000863810,2,1000863810,1,NULL,3,1000863810,1,NULL,1,NULL),
(1000929465,3,1000863810,1,NULL,1,NULL,3,1000863810,10,1000847636,1,NULL),
(1000933009,3,1000863810,2,1000863810,1,NULL,3,1000863810,2,1000852813,1,NULL),
(1000941560,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000977540,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL)


#5

did you inherit this table design or you came up with it? why would 1000863810 be related to itself by id_name?


#6

You haven't described what any of the fields mean, but a cursory look seems to indicate that the columns other than customer_id exist in pairs with names like cnt_foo and id_foo, where cnt_foo gives a count of the number of times the following value of id_foo appears in that column in all rows. Some of the id_... values appear in many columns. Are these fields foreign keys to other tables? What exactly are you trying to accomplish?


#7

Each record in this table represents a customer account. These are sourced from different systems.
The purpose of this processing task is to link customer account records that are 1) the same customer 2) likely to be the same customer as they share a common attribute eg same name / bank details etc.

A processing step prior to this looks for common attributes and applies a unique id to the group eg customer_id's
1000863810 & 1000929465 & 1000933009 all share the same name and therefore share the same id_name value. The value
used to indicate these values are the same happens to be the minimum customer_id in the group (but this in fact could
be any grouping id value .. like with id_address .. in this case the value 635 (which is an internal address id) has
been used to group customers with the same address).

So, to follow this example through (from the screen-shot):

group 1: rows 9, 10 & 11 share the same name.
group 2: row 10 shares the same email address as 3,4,5,8,12,13,16,17 & 18
.. as row 10 is common in the two groups, groups 1 & 2 should now be linked together.
group 3: rows 16 & 18 share the same address as rows 1, 6, 7, 14, 15
.. as row 16 & 18 is common between group 2 & group 3, groups 1, 2 & 3 should now be linked
group 4: row 11 shares the same email address as row 2
.. so now group 4 is linked to group 1 and thus all groups 1, 2, 3 & 4 (all 18 customers) are linked directly and/or
indirectly

I know it's not the most logical dataset .. but I'm hoping I can work with what i've got (opposed to re-working
the prior processing steps).

Thanks


#8

Could you please let me know the desired output that you want from above sample data


#9

Thanks for the reply .. a table containing customer_id & join_id is all that is required for an output.
join_id being the created grouping identifier once a group of customers has been identified.

Below is a screenshot of the anticipated result from this sample data-set - 3 separate groups of customers.


#10

hi

i tried to understand this ...

This is what I have ... if this is correct then we can proceed further
i mean linking .. part

Please let me know
:slight_smile:
:slight_smile:

drop create sample data ...
use tempdb 
go 




DROP TABLE dbo.[cust_rel_sample]

CREATE TABLE dbo.[cust_rel_sample] 
             ( 
                          [customer_id] [INT] NULL, 
                          [cnt_name] [INT] NULL, 
                          [id_name] [INT] NULL, 
                          [cnt_bank] [INT] NULL, 
                          [id_bank] [BIGINT] NULL, 
                          [cnt_mobile] [INT] NULL, 
                          [id_mobile] [INT] NULL, 
                          [cnt_home] [INT] NULL, 
                          [id_home] [INT] NULL, 
                          [cnt_email] [INT] NULL, 
                          [id_email] [INT] NULL, 
                          [cnt_address] [INT] NULL, 
                          [id_address] [INT] NULL 
             ) 
 go

INSERT INTO dbo.[cust_rel_sample] VALUES 
            ( 
                        1035608,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1285215,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1618173,2,1618173,4,1618173,1, 
                        NULL, 
                        4,1618173,1, 
                        NULL, 
                        1, 
                        NULL 
            ) 
            , 
            ( 
                        2091361,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        2543284,4,2543284,1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1618173,2,2543284,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000875447,2,1000847636,1, 
                        NULL, 
                        1, 
                        NULL, 
                        2,1000847636,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000941758,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000941737,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000962860,2,1000932977,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000932977,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000242402,2,1707868,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000242383,4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1006384494,4,2543284,4,1618173,1, 
                        NULL, 
                        2,1006327204,4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        2543285,4,2543284,1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1618173,2,2543284,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000852813,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        2,1000852813,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000242383,2,1000242383,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000242383,4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000847636,2,1000847636,1, 
                        NULL, 
                        1, 
                        NULL, 
                        2,1000847636,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000932977,2,1000932977,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000932977,10,1000847636,7,635 
            ) 
            , 
            ( 
                        1000978168,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000932977,10,1000847636,7,635 
            ) 
            , 
            ( 
                        1000307069,2,1000242383,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000242383,4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1217037,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1313183,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        7,635 
            ) 
            , 
            ( 
                        1005964327,4,2543284,4,1618173,1, 
                        NULL, 
                        4,1618173,4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        1005977057,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        1707868,2,1707868,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        4,1707868,1, 
                        NULL 
            ) 
            , 
            ( 
                        1006327204,2,1618173,4,1618173,1, 
                        NULL, 
                        2,1006327204,4,1005964327,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000863810,3,1000863810,2,1000863810,1, 
                        NULL, 
                        3,1000863810,1, 
                        NULL, 
                        1, 
                        NULL 
            ) 
            , 
            ( 
                        1000929465,3,1000863810,1, 
                        NULL, 
                        1, 
                        NULL, 
                        3,1000863810,10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000933009,3,1000863810,2,1000863810,1, 
                        NULL, 
                        3,1000863810,2,1000852813,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000941560,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            ) 
            , 
            ( 
                        1000977540,1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        1, 
                        NULL, 
                        10,1000847636,1, 
                        NULL 
            )
go

share the same name .. grouping

sql same name share grouping
SELECT customer_id, 
       id_name, 
       Dense_rank() 
         OVER( 
           ORDER BY id_name) 
FROM   cust_rel_sample 

go

share the same email address .. grouping

sql same email address share grouping
SELECT customer_id, 
       id_email, 
       Dense_rank() 
         OVER( 
           ORDER BY id_email) 
FROM   cust_rel_sample

share the same .. address .. grouping

sql same address share grouping
SELECT customer_id, 
       id_address, 
       Dense_rank() 
         OVER( 
           ORDER BY id_address) 
FROM   cust_rel_sample

#11

@harishgg1 yes, this will get you the distinct customers that share the same attributes .. i.e. the lowest level group.

Next stage would be to create higher level groups where customers exist in multiple groups, and link these together ..


#14

hi

i took a look at the screenshot you provided
for desired result

I came up with the SQL
Please check and let me know
Thanks
:slight_smile:
:slight_smile:

SQL ...
;WITH cte 
     AS (SELECT a.customer_id, 
                2 AS grp 
         FROM   cust_rel_sample a 
                JOIN cust_rel_sample b 
                  ON a.id_name = b.id_name 
         UNION 
         SELECT a.customer_id, 
                3 AS grp 
         FROM   cust_rel_sample a 
                JOIN cust_rel_sample b 
                  ON a.id_email = b.id_email 
         UNION 
         SELECT a.customer_id, 
                1 AS grp 
         FROM   cust_rel_sample a 
                JOIN cust_rel_sample b 
                  ON a.id_address = b.id_address) 
SELECT * 
FROM   cte 

go
Image Results ..


#16

DROP TABLE dbo.[cust_rel_sample]
CREATE TABLE dbo.[cust_rel_sample]
(
[customer_id] [int] NULL,
[cnt_name] [int] NULL,
[id_name] [int] NULL,
[cnt_bank] [int] NULL,
[id_bank] [bigint] NULL,
[cnt_mobile] [int] NULL,
[id_mobile] [int] NULL,
[cnt_home] [int] NULL,
[id_home] [int] NULL,
[cnt_email] [int] NULL,
[id_email] [int] NULL,
[cnt_address] [int] NULL,
[id_address] [int] NULL
)
GO
;

INSERT INTO dbo.[cust_rel_sample]
VALUES
(1035608,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1285215,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1618173,2,1618173,4,1618173,1,NULL,4,1618173,1,NULL,1,NULL),
(2091361,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(2543284,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000875447,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000941758,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000941737,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000962860,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,1,NULL),
(1000242402,2,1707868,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1006384494,4,2543284,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(2543285,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000852813,1,NULL,1,NULL,1,NULL,1,NULL,2,1000852813,1,NULL),
(1000242383,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1000847636,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000932977,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000978168,1,NULL,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000307069,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1217037,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1313183,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1005964327,4,2543284,4,1618173,1,NULL,4,1618173,4,1005964327,1,NULL),
(1005977057,1,NULL,1,NULL,1,NULL,1,NULL,4,1005964327,1,NULL),
(1707868,2,1707868,1,NULL,1,NULL,1,NULL,4,1707868,1,NULL),
(1006327204,2,1618173,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(1000863810,3,1000863810,2,1000863810,1,NULL,3,1000863810,1,NULL,1,NULL),
(1000929465,3,1000863810,1,NULL,1,NULL,3,1000863810,10,1000847636,1,NULL),
(1000933009,3,1000863810,2,1000863810,1,NULL,3,1000863810,2,1000852813,1,NULL),
(1000941560,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000977540,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL)

;with cte as(
select customer_id,id_name as id,DENSE_RANK() over(order by id_name) as cnt from dbo.[cust_rel_sample]
where id_name is not null
union all
select customer_id,id_email as id,DENSE_RANK() over(order by id_email) as cnt from dbo.[cust_rel_sample]
where id_email is not null
union all
select
customer_id,id_address as id,DENSE_RANK() over(order by id_address) as cnt from dbo.[cust_rel_sample]
where id_address is not null
union all
select customer_id,id_bank as id,DENSE_RANK() over(order by id_bank) as cnt from dbo.[cust_rel_sample]
where id_bank is not null
union all
select customer_id,id_mobile as id,DENSE_RANK() over(order by id_mobile) as cnt from dbo.[cust_rel_sample]
where id_mobile is not null
union all
select customer_id,id_home as id,DENSE_RANK() over(order by id_home) as cnt from dbo.[cust_rel_sample]
where id_home is not null)

select customer_id,count(customer_id)
from cte
group by customer_id


#17

Thanks @mannesravya & @harishgg1 - both suggestions make a start. We can now identify groups of customers with similar attributes.

If we go back to the original problem statement

customer A is linked to customer B and B is linked to C (there is no link between A and C).
I'm looking for A, B and C to be linked by a common join_id

Your queries allow me to create a group for A and B and another group for B and C.

Next step is to link the first group with the second group to create an overall group containing A, B and C.


#18

I am little bit confused. Could you please explain that with real columns in your table.


#19

hi jasebase

could you please give some sort of illustriation
so that we can
understand

example like below

image


#20

OK - I'll take one customer example through a manual approach to partially achieve what I'm looking for.

.

Build test data

code
USE [test]
GO

-- DROP TABLE dbo.[cust_rel_sample]
CREATE TABLE dbo.[cust_rel_sample]
(
[customer_id] [int]	NULL,
[cnt_name] [int]	NULL,
[id_name] [int]	NULL,
[cnt_bank] [int]	NULL,
[id_bank] [bigint]	NULL,
[cnt_mobile] [int]	NULL,
[id_mobile] [int]	NULL,
[cnt_home] [int]	NULL,
[id_home] [int]	NULL,
[cnt_email] [int]	NULL,
[id_email] [int]	NULL,
[cnt_address] [int]	NULL,
[id_address] [int]	NULL
)
GO
;

INSERT INTO dbo.[cust_rel_sample]
VALUES
(1035608,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1285215,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1618173,2,1618173,4,1618173,1,NULL,4,1618173,1,NULL,1,NULL),
(2091361,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(2543284,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000875447,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000941758,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000941737,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000962860,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,1,NULL),
(1000242402,2,1707868,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1006384494,4,2543284,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(2543285,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000852813,1,NULL,1,NULL,1,NULL,1,NULL,2,1000852813,1,NULL),
(1000242383,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1000847636,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000932977,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000978168,1,NULL,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000307069,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1217037,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1313183,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1005964327,4,2543284,4,1618173,1,NULL,4,1618173,4,1005964327,1,NULL),
(1005977057,1,NULL,1,NULL,1,NULL,1,NULL,4,1005964327,1,NULL),
(1707868,2,1707868,1,NULL,1,NULL,1,NULL,4,1707868,1,NULL),
(1006327204,2,1618173,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(1000863810,3,1000863810,2,1000863810,1,NULL,3,1000863810,1,NULL,1,NULL),
(1000929465,3,1000863810,1,NULL,1,NULL,3,1000863810,10,1000847636,1,NULL),
(1000933009,3,1000863810,2,1000863810,1,NULL,3,1000863810,2,1000852813,1,NULL),
(1000941560,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000977540,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL)

Lets isolate 1 customer_id, and get all the group linking_id's it may be a member of

code
SELECT 
		customer_id
		, id_name
		, id_bank
		, id_mobile
		, id_home
		, id_email
		, id_address
INTO 
		#a		-- DROP TABLE #a
FROM
		dbo.[cust_rel_sample]
WHERE 
		customer_id = 1000863810

Now, lets join this linking data back to the original data-set to get all other customers associated with any group that was linked to the original customer_id

code
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, a.customer_id			AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
INTO 
		#b		-- DROP TABLE #b
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#a a
ON		data.id_name =		a.id_name
	OR	data.id_bank =		a.id_bank
	OR	data.id_mobile =	a.id_mobile
	OR	data.id_home =		a.id_home
	OR	data.id_email =		a.id_email
	OR	data.id_address	=	a.id_address

Now, we have a list of customer_id's that were linked to the original customer_id. Next step is to get any customer_id's linked to these new linked customer_id's

code
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, b.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
INTO
		#c	-- DROP TABLE #c
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#b b
ON		data.id_name =		b.linked_id_name	
	OR	data.id_bank =		b.linked_id_bank
	OR	data.id_mobile =	b.linked_id_mobile
	OR	data.id_home =		b.linked_id_home
	OR	data.id_email =		b.linked_id_email
	OR	data.id_address	=	b.linked_id_address

OK - now repeat the process to get any associated customer_id's

code
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, c.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
INTO
		#d	-- DROP TABLE #d
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#c c
ON		data.id_name =		c.linked_id_name	
	OR	data.id_bank =		c.linked_id_bank
	OR	data.id_mobile =	c.linked_id_mobile
	OR	data.id_home =		c.linked_id_home
	OR	data.id_email =		c.linked_id_email
	OR	data.id_address	=	c.linked_id_address

... and again

code
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, d.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
INTO
		#e	-- DROP TABLE #e
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#d d
ON		data.id_name =		d.linked_id_name	
	OR	data.id_bank =		d.linked_id_bank
	OR	data.id_mobile =	d.linked_id_mobile
	OR	data.id_home =		d.linked_id_home
	OR	data.id_email =		d.linked_id_email
	OR	data.id_address	=	d.linked_id_address

... and again

code
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, e.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
INTO
		#f	-- DROP TABLE #f
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#e e
ON		data.id_name =		e.linked_id_name	
	OR	data.id_bank =		e.linked_id_bank
	OR	data.id_mobile =	e.linked_id_mobile
	OR	data.id_home =		e.linked_id_home
	OR	data.id_email =		e.linked_id_email
	OR	data.id_address	=	e.linked_id_address

And now call the results. This will give me all customers linked to other customers (... linked to other customers .. and so on) that start with a seed customer_id of 1000863810. It then groups these as being a part of join_id = 1

code
SELECT 
		DISTINCT linked_customer_id
		, 1			 AS join_id
FROM 
		#f

Using this approach, the next step would be to update the source data-set with this new derived join_id and then get the next customer_id in my original data-set that does not have a join_id, and repeat the process again assigning another join_id to the output..

This is clearly an inefficient way of approaching the problem (my data is c. 2 million records)

  • several individual steps
  • approaches the issue on a record by record basis (there may not be a way around this)
  • inefficient joins

I hope this helps clarify what I'm trying to achieve. Any suggestions on a better way to approach this query would be much appreciated.


#21

Does anyone have any suggestion on a better solution to the problem?


#22

HI

please take a look at ...GROUPING SETS ...
hope it helps
:slight_smile:
:slight_smile:


#23

hi

i tried an approach where

i created a final_output table
drop table Final_Output
go
create table Final_Output
(
[customer_id] [int] NULL,
[join_id] [int] NULL
)
go

Each Step
I added a grp field
image
image

Each step
I am inserting
linked customer id and grp into final output table


At the end I just select from the Final_output table

Final Ouput table ... Results

image

Does this help ????????
:slight_smile:
:slight_smile:

SQL .. Whole
drop table Final_Output 
go 
create table Final_Output 
(
[customer_id] [int]	NULL,
[join_id] [int] NULL
)
go 

DROP TABLE dbo.[cust_rel_sample]
CREATE TABLE dbo.[cust_rel_sample]
(
[customer_id] [int]	NULL,
[cnt_name] [int]	NULL,
[id_name] [int]	NULL,
[cnt_bank] [int]	NULL,
[id_bank] [bigint]	NULL,
[cnt_mobile] [int]	NULL,
[id_mobile] [int]	NULL,
[cnt_home] [int]	NULL,
[id_home] [int]	NULL,
[cnt_email] [int]	NULL,
[id_email] [int]	NULL,
[cnt_address] [int]	NULL,
[id_address] [int]	NULL
)
GO
;

INSERT INTO dbo.[cust_rel_sample]
VALUES
(1035608,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1285215,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1618173,2,1618173,4,1618173,1,NULL,4,1618173,1,NULL,1,NULL),
(2091361,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(2543284,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000875447,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000941758,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000941737,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000962860,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,1,NULL),
(1000242402,2,1707868,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1006384494,4,2543284,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(2543285,4,2543284,1,NULL,1,NULL,4,1618173,2,2543284,1,NULL),
(1000852813,1,NULL,1,NULL,1,NULL,1,NULL,2,1000852813,1,NULL),
(1000242383,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1000847636,2,1000847636,1,NULL,1,NULL,2,1000847636,10,1000847636,1,NULL),
(1000932977,2,1000932977,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000978168,1,NULL,1,NULL,1,NULL,3,1000932977,10,1000847636,7,635),
(1000307069,2,1000242383,1,NULL,1,NULL,3,1000242383,4,1707868,1,NULL),
(1217037,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1313183,1,NULL,1,NULL,1,NULL,1,NULL,1,NULL,7,635),
(1005964327,4,2543284,4,1618173,1,NULL,4,1618173,4,1005964327,1,NULL),
(1005977057,1,NULL,1,NULL,1,NULL,1,NULL,4,1005964327,1,NULL),
(1707868,2,1707868,1,NULL,1,NULL,1,NULL,4,1707868,1,NULL),
(1006327204,2,1618173,4,1618173,1,NULL,2,1006327204,4,1005964327,1,NULL),
(1000863810,3,1000863810,2,1000863810,1,NULL,3,1000863810,1,NULL,1,NULL),
(1000929465,3,1000863810,1,NULL,1,NULL,3,1000863810,10,1000847636,1,NULL),
(1000933009,3,1000863810,2,1000863810,1,NULL,3,1000863810,2,1000852813,1,NULL),
(1000941560,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL),
(1000977540,1,NULL,1,NULL,1,NULL,1,NULL,10,1000847636,1,NULL)


select * from cust_rel_sample
go 


DROP TABLE #a
go 

SELECT 
		customer_id
		, id_name
		, id_bank
		, id_mobile
		, id_home
		, id_email
		, id_address
		,1 grp 
INTO 
		#a		
FROM
		dbo.[cust_rel_sample]
WHERE 
		customer_id = 1000863810


insert into Final_Output select customer_id,grp  from #a 

--select * from  #a 
--go 


drop table #b 
go 
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, a.customer_id			AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
		,2 as grp
INTO 
		#b		-- DROP TABLE #b
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#a a
ON		data.id_name =		a.id_name
	OR	data.id_bank =		a.id_bank
	OR	data.id_mobile =	a.id_mobile
	OR	data.id_home =		a.id_home
	OR	data.id_email =		a.id_email
	OR	data.id_address	=	a.id_address

insert into Final_Output select linked_customer_id,grp  from #b

--select * from #b 
--go 


drop table #c 
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, b.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
		,3 as grp 
INTO
		#c	-- DROP TABLE #c
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#b b
ON		data.id_name =		b.linked_id_name	
	OR	data.id_bank =		b.linked_id_bank
	OR	data.id_mobile =	b.linked_id_mobile
	OR	data.id_home =		b.linked_id_home
	OR	data.id_email =		b.linked_id_email
	OR	data.id_address	=	b.linked_id_address

insert into Final_Output select linked_customer_id,grp from #c 
--select * from #c 
--go 


drop table #d 
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, c.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
		,5 as grp
INTO
		#d	-- DROP TABLE #d
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#c c
ON		data.id_name =		c.linked_id_name	
	OR	data.id_bank =		c.linked_id_bank
	OR	data.id_mobile =	c.linked_id_mobile
	OR	data.id_home =		c.linked_id_home
	OR	data.id_email =		c.linked_id_email
	OR	data.id_address	=	c.linked_id_address

insert into Final_Output select linked_customer_id,grp from #d 
--select * from #d 
--go 


drop table #e 
go 
SELECT 
		DISTINCT
		data.customer_id		AS linked_customer_id
		, d.linked_customer_id	AS parent_customer_id
		, data.id_name			AS linked_id_name	
		, data.id_bank			AS linked_id_bank
		, data.id_mobile		AS linked_id_mobile
		, data.id_home			AS linked_id_home
		, data.id_email			AS linked_id_email
		, data.id_address		AS linked_id_address
		,6 as grp
INTO
		#e	-- DROP TABLE #e
FROM
		dbo.[cust_rel_sample] data
JOIN	
		#d d
ON		data.id_name =		d.linked_id_name	
	OR	data.id_bank =		d.linked_id_bank
	OR	data.id_mobile =	d.linked_id_mobile
	OR	data.id_home =		d.linked_id_home
	OR	data.id_email =		d.linked_id_email
	OR	data.id_address	=	d.linked_id_address

insert into Final_Output select linked_customer_id,grp  from #e 
--select * from #e 
--go 


select * from Final_Output
go