How to apply inclusion and exclusion from the same table in SQL criteria?

I have a contacts table like this:

Contacts

| CustomerID | CustomerName                       | Country |
|------------|------------------------------------|---------|
| 1          | Alfreds Futterkiste                | Germany |
| 2          | Ana Trujillo Emparedados y helados | Mexico  |
| 3          | Antonio Moreno Taquería            | Mexico  |
| 4          | Around the Horn                    | UK      |
| 5          | Berglunds snabbköp                 | Sweden  |
| 6          | Blauer See Delikatessen            | Germany |
| 7          | Blondel père et fils               | France  |
| 8          | Bólido Comidas preparadas          | Spain   |

I have another table where all customer'sprofile attributes are saved
Foreignkey is CustomerID
Relationship - one to many ...in profile table, there can be multiple rows for each customer (1 entry /per profile attribute)

CustomerProfile

| CustomerID | exclusion_type         | Flag | Effective_Date |
|------------|------------------------|------|----------------|
| 1          | DOB                    | Y    | 11/19/2022     |
| 1          | DoNotContact           | Y    | 10/25/2022     |
| 3          | AddressOnFile          | Y    | 9/13/2022      |
| 4          | SubscriptionPlanHolder | Y    | 8/8/2022       |
| 5          | DOB                    | Y    | 11/1/2022      |
| 5          | DoNotContact           | Y    | 10/2/2022      |
| 5          | SubscriptionPlanHolder | Y    | 5/1/2022       |

Desired output:
Select all customers from customer table where county is 'Germany' and include customers who got DOB on file and DOB is after Aug 2022 but exclude customers who for DoNotContact flag on their profile

Select all customers from customer table where county is 'Germany'

SELECT * FROM Contacts WHERE Country='Germany'

and include customers who got DOB on file and DOB is after Aug 2022

SELECT DISTINCT C.* 
FROM Contacts C
	INNER JOIN CustomerProfile CP
		ON CP.CustomerID=C.CustomerID
WHERE C.Country='Germany' AND
	CP.exclusion_type='DOB' AND Effective_Date >= '08/01/2022'

but exclude customers who for DoNotContact flag on their profile. I'm not using this part of SQL often so no quaranties :blush:

EXISTS (Transact-SQL) - SQL Server | Microsoft Learn

SELECT DISTINCT C.* 
FROM Contacts C
	INNER JOIN CustomerProfile CP
		ON CP.CustomerID=C.CustomerID
WHERE C.Country='Germany' AND
	CP.exclusion_type='DOB' AND Effective_Date >= '08/01/2022'  AND NOT EXISTS (SELECT C.* 
FROM Contacts C
	INNER JOIN CustomerProfile CP
		ON CP.CustomerID=C.CustomerID
WHERE 	CP.exclusion_type='DoNotContact' AND Flag='Y')

Hi @RogierPronk first of all, thank you so much for your help and guidance. I am going to try the above approach shortly.

2 Questions -

  1. Is this approach going to be good in terms of performance? I ask this because I am working with almost 20million records in thee tables.

  2. Considering each profile "exclusion_type" got it's own active flag = Y, when it comes to selecting multiple profile exclusion types - how to check in criteria that customer got flag, A,B AND C and all got active = Y (considering active flag can be N as possible value)

Like:

exclusion_type = 'DOB' and value = 'Y'               
exclusion_type  = 'DoNotContact' and value = 'Y'         
exclusion_type  = 'SubscriptionPlanHolder'  and value = 'Y'
  1. You don't have a very good design for performance. If you add exclusion_type_id it will be better, integers are better then strings. I don't know what kind of sql/dba skill level you have but you can add a calculated column to your table with something like

case when exclusion_type='DOB' and flag='Y' then 1
when exclusion_type='DoNotContact' and flag='Y' then 2
...
else 0 end
etc.

When you add an index on this field it will be fast. But this has a bad performance when you update/insert the data so be carefull.

2 You can use a CTE for each statement and combine them later:

WITH DOB AS
(
SELECT DISTINCT C.* 
FROM Contacts C
	INNER JOIN CustomerProfile CP
		ON CP.CustomerID=C.CustomerID
WHERE C.Country='Germany' AND
	CP.exclusion_type='DOB' AND Flag='Y'
), DoNotContact  AS
(
SELECT DISTINCT C.* 
FROM Contacts C
	INNER JOIN CustomerProfile CP
		ON CP.CustomerID=C.CustomerID
WHERE C.Country='Germany' AND
	CP.exclusion_type='DoNotContact' AND Flag='Y'
)
SELECT * FROM DoNotContact  DNC
INNER JOIN DOB ON (DOB.CustomerID=DNC.CustomerID)

I have a contacts table like this:

Contacts

CustomerID CustomerName Country
1 Alfreds Futterkiste Germany
2 Ana Trujillo Emparedados y helados Mexico
3 Antonio Moreno Taquería Mexico
4 Around the Horn UK
5 Berglunds snabbköp Sweden
6 Blauer See Delikatessen Germany
7 Blondel père et fils France
8 Bólido Comidas preparadas Spain

I have another table where all customer's profile attributes are saved,
Foreignkey is CustomerID

Relationship - one to many ...in profile table, there can be multiple rows for each customer (1 entry /per profile attribute...so a customer with 5 profile attributes will have 5 rows)

CustomerProfile

CustomerID profile_attribute Flag Effective_Date
1 DOB Y 11/19/2022
1 DoNotContact Y 10/25/2022
3 AddressOnFile Y 9/13/2022
4 SubscriptionPlanHolder Y 8/8/2022
5 DOB Y 11/1/2022
5 DoNotContact Y 10/2/2022
5 SubscriptionPlanHolder Y 5/1/2022

I have another table where all customer's exclusions are saved,
Foreignkey is CustomerID

Relationship - one to many ...in exclusion table, there can be multiple rows for each customer (1 entry /per exclusion)

CustomerExclusion

CustomerID exclusion_type Flag
1 BadAddress Y
1 Optout Y
3 NoCall Y
4 BadEmail Y
5 BadAddress Y

Desired output:


Select all customers from customer table where county is 'Germany' who got profile attribute 'DOB' with flag = Y and profile attribute = 'SubscriptionPlanHolder' and with flag = Y but exclude anyone with 'BadAddress' and exclude anyone with 'Optout'

So far, I got this and it seems to be giving me right output as long as I am checking only 1 profile attribute.

select aud.*
from   (select c.*, prof.profile_attribute, prof.Flag
        from Contacts c
        INNER JOIN CustomerProfile prof
        on prof.CustomerID = c.CustomerID
        where c.c_country = 'Germany' and
        prof.profile_attribute = 'DOB' and prof.Flag = 'Y') aud



left outer join
        (select CustomerID from CustomerExclusion
        where exclusion_type in ('BadAddress','Optout') and active_flag = 'Y') exc
on aud.CustomerID = exc.CustomerID       
where exc.CustomerID is null
  1. Since, I need to select customers who got profile attribute 'DOB' with respective flag = Y AND profile attribute = 'SubscriptionPlanHolder' with respective flag = Y.

How can I apply multiple conditions separated by AND when there are multiple rows /per profile attribute?

  1. I am working with 20mm+ records - is there anything different I can do to optimize the query?

Hi @RogierPronk - I have posted by updated question above and I have also posted my query that I got so far.

For reference, CTE query starting with WITH are not supported in my environment. Just plain SQL is supported.

I would appreciate if you could please guide me with my updated request above. I feel I have made some progress,

HI @harishgg1 thank you for your help.

  1. My question is why we are joining the tables with inner join?
    Inner join will give everything common between these tables except the conditions...so for example in exclusions where a customer may have multiple exclusions (1 exclusion /per row). So, if we are specifying NOT IN, other than ('BadAddress','Optout'),,,if customer got other exclusions, the output will include that, i believe.

  2. why we are doing profile_attribute IN ( ), In exclusions it makes sense as it will exclude as long as any ONE is true BUTfor profile_attribute the goal is to check BOTH flags (DOB AND SubscriptionPlanHolder) and both should have flag Y, not just one. Is it going to check availability of BOTH attributes for a customer? Just asking...do we need group by or something in this case?

If I stay as close to your query as I can it would be something like this:

select aud.*
from   (select c.*, prof.profile_attribute, prof.Flag, prof2.profile_attribute as SubscriptionPlanHolder, prof2.Flag as SubscriptionPlanHolder_Flag
        from Contacts c
        INNER JOIN CustomerProfile prof
        on prof.CustomerID = c.CustomerID
        INNER JOIN CustomerProfile prof2
        on prof2.CustomerID = c.CustomerID
        where c.c_country = 'Germany' and
        prof.profile_attribute = 'DOB' and prof.Flag = 'Y' AND
        prof2.profile_attribute = 'SubscriptionPlanHolder' and prof2.Flag = 'Y' ) aud
left outer join
        (select CustomerID from CustomerExclusion
        where exclusion_type in ('BadAddress','Optout') and active_flag = 'Y') exc
on aud.CustomerID = exc.CustomerID       
where exc.CustomerID is null

Hi @RogierPronk

New updated full query is returning 0 results.

Then, for testing I tried running just the sub query part and this sub query is returning 0

select c.*, prof.profile_attribute, prof.Flag, prof2.profile_attribute as SubscriptionPlanHolder, prof2.Flag as SubscriptionPlanHolder_Flag
        from Contacts c
        INNER JOIN CustomerProfile prof
        on prof.CustomerID = c.CustomerID
        INNER JOIN CustomerProfile prof2
        on prof2.CustomerID = c.CustomerID
        where c.c_country = 'Germany' and
        prof.profile_attribute = 'DOB' and prof.Flag = 'Y' AND
        prof2.profile_attribute = 'SubscriptionPlanHolder' and prof2.Flag = 'Y'

I think, the issue is coming from this part

prof.profile_attribute = 'DOB' and prof.Flag = 'Y' AND
        prof2.profile_attribute = 'SubscriptionPlanHolder' and prof2.Flag = 'Y'

My guess is....the issue is coming from the way data is available in the profile attributes table. The profile_attribute field is only ONE and in query we are trying to select twice with 2 different values. Do you think I need to do some kind of grouping in this case?

Again requirement is still to pick customers who fulfill BOTH profile attributes confirm (NOT either one).

Could you please guide me?

Hi @harishgg1 the query is not returning desired output. I think joining all 3 tables with inner join is not producing correct output

Select all customers from customer table where county is 'Germany' who got profile attribute 'DOB' with flag = Y and profile attribute = 'SubscriptionPlanHolder' and with flag = Y but exclude anyone with 'BadAddress' and exclude anyone with 'Optout'

If I read this then a customer should have attribute 'DOB' with flag = Y and 'SubscriptionPlanHolder' and with flag = Y both. Is this correct or do you mean.

Select all customers from customer table where county is 'Germany' who got profile attribute 'DOB' with flag = Y OR profile attribute = 'SubscriptionPlanHolder' and with flag = Y but exclude anyone with 'BadAddress' and exclude anyone with 'Optout'.

Can you give an output based on the data you provided and why you expect this result?

Another option that seems to provide correct results:

-- set up test data, can skip this part
drop table if exists #contact,#profile,#exclude
create table #contact(CustomerID int not null primary key
,CustomerName nvarchar(128) not null unique
,Country varchar(32) not null)

insert #contact values(1,N'Alfreds Futterkiste','Germany')
,(2,N'Ana Trujillo Emparedados y helados,','Mexico')
,(3,N'Antonio Moreno Taquería','Mexico')
,(4,N'Around the Horn','UK')
,(5,N'Berglunds snabbköp','Sweden')
,(6,N'Blauer See Delikatessen','Germany')
,(7,N'Blondel père et fils','France')
,(8,N'Bólido Comidas preparadas','Spain')
,(99,N'Testing','Germany')

create table #profile(CustomerID int not null
,profile_attribute varchar(32) not null
,Flag char(1) not null check (Flag LIKE '[YN]')
,Effective_Date date not null)
insert #profile values(1,'DOB','Y','11/19/2022')
,(1,'DoNotContact','Y','10/25/2022')
,(3,'AddressOnFile','Y','9/13/2022')
,(4,'SubscriptionPlanHolder','Y','8/8/2022')
,(5,'DOB','Y','11/1/2022')
,(5,'DoNotContact','Y','10/2/2022')
,(5,'SubscriptionPlanHolder','Y','5/1/2022')
,(99,'DOB','Y','11/1/2022')
,(99,'SubscriptionPlanHolder','Y','5/1/2022')

create table #exclude(CustomerID int not null
,exclusion_type varchar(32) not null
,Flag char(1) not null check (Flag LIKE '[YN]'))
INSERT #exclude VALUES(1,'BadAddress','Y')
,(1,'Optout','Y')
,(3,'NoCall','Y')
,(4,'BadEmail','Y')
,(5,'BadAddress','Y')
,(99,'Optout','Y')  -- to test inclusion, comment this out, remove it, or change type to something else 
-- end test data setup

;with cte(cID) AS (
select CustomerID from #contact where Country='Germany'  -- all customers from customer table where county is 'Germany'
intersect 
select CustomerID from #profile where profile_attribute='DOB' and Flag='Y'  -- got profile attribute 'DOB' with flag = Y 
intersect 
select CustomerID from #profile where profile_attribute='SubscriptionPlanHolder' and Flag='Y'  -- and profile attribute = 'SubscriptionPlanHolder' and with flag = Y 

-- do INTERSECT first for rows you want included, before EXCEPT to exclude rows you don't want
except 
select CustomerID from #exclude where exclusion_type in('BadAddress','Optout') and Flag='Y'  -- exclude 'BadAddress' and 'Optout'
--except select CustomerID from #profile where profile_attribute='DoNotContact' and Flag='Y'  -- find DoNotContact and exclude
)
SELECT c.* FROM #contact c INNER JOIN cte z ON c.CustomerID=z.cID

I added a single test customer and profile and exclusions, as the original sample data wouldn't provide results based on these criteria:

Regarding DOB and SubscriptionPlanHolder in the profile table, since they are an AND combination, we need 1 query for each value and INTERSECT them. Edit to add: Because either exclusion BadAddress or Optout is enough to exclude them, we can use a single EXCEPT query and use IN() to get either or both rows. EXCEPT and INTERSECT will automatically remove duplicate CustomerIDs.

If you have a dynamic set of conditions for inclusion and exclusion, I don't know how well the above solution will work for you, unless you generate it via dynamic SQL.

Hi @RogierPronk

Your 1st statement is true (my desired output)

GOAL (Desired output) -
To select all customers from customer table where
Country = 'Germany'
Profile Attribute = DOB with flag = Y AND SubscriptionPlanHolder with flag = Y**
Exclude anyone having 'BadAddress' with flag = Y OR 'Optout' with flag = Y

Here is the data set again since I am not update to update above:

| Contacts   |                                    |         |
|------------|------------------------------------|---------|
| CustomerID | CustomerName                       | Country |
| 1          | Alfreds Futterkiste                | Germany |
| 2          | Ana Trujillo Emparedados y helados | Mexico  |
| 3          | Antonio Moreno Taquería            | Mexico  |
| 4          | Around the Horn                    | UK      |
| 5          | Berglunds snabbköp                 | Sweden  |
| 6          | Blauer See Delikatessen            | Germany |
| 7          | Blondel père et fils               | France  |
| 8          | Bólido Comidas preparadas          | Spain   |
| 9          | John                               | Germany |
| CustomerExclusion |                |      |
|-------------------|----------------|------|
| CustomerID        | exclusion_type | Flag |
| 1                 | BadAddress     | Y    |
| 1                 | Optout         | Y    |
| 3                 | NoCall         | Y    |
| 4                 | BadEmail       | Y    |
| 5                 | BadAddress     | Y    |
| 6                 | BadAddress     | Y    |
| 6                 | Optout         | Y    |
| 9                 | BadAddress     | N    |
| CustomerProfile |                        |      |                |
|-----------------|------------------------|------|----------------|
| CustomerID      | profile_attribute         | Flag | Effective_Date |
| 1               | DOB                    | Y    | 11/19/2022     |
| 1               | DoNotContact           | Y    | 10/25/2022     |
| 3               | AddressOnFile          | Y    | 9/13/2022      |
| 4               | SubscriptionPlanHolder | Y    | 8/8/2022       |
| 5               | DOB                    | N    | 11/1/2022      |
| 5               | DoNotContact           | Y    | 10/2/2022      |
| 5               | SubscriptionPlanHolder | Y    | 5/1/2022       |
| 6               | DOB                    | Y    | 10/27/2021     |
| 6               | SubscriptionPlanHolder | Y    | 11/11/2022     |
| 6               | AddressOnFile          | N    | 10/1/2022      |
| 9               | DOB                    | Y    | 9/9/2022       |
| 9               | SubscriptionPlanHolder | N    | 8/19/2022      |

Expectation is to get Customer with ID#6 in the output
6 | Blauer See Delikatessen | Germany

I'm not sure what environment you are using but with t-sql you can you a table twice and give them different aliases. Based on the data you have given I can run this query and get result. I don't know if I can help you any further then I've done:

/* set up test data, can skip this part */

drop table if exists #contact,#profile,#exclude

create table #contact(CustomerID int not null primary key
,CustomerName nvarchar(128) not null unique
,Country varchar(32) not null)

insert #contact values
(6,N'Blauer See Delikatessen','Germany');

create table #profile(CustomerID int not null
,profile_attribute varchar(32) not null
,Flag char(1) not null check (Flag LIKE '[YN]')
,Effective_Date date not null)
insert #profile values(6,'DOB','Y','10/27/2021')
,(6,'SubscriptionPlanHolder','Y','11/11/2022')

select c.*, prof.profile_attribute, prof.Flag, prof2.profile_attribute as SubscriptionPlanHolder, prof2.Flag as SubscriptionPlanHolder_Flag
        from #contact c
        INNER JOIN #profile prof
        on prof.CustomerID = c.CustomerID
        INNER JOIN #profile prof2
        on prof2.CustomerID = c.CustomerID
        where c.country = 'Germany' and
        prof.profile_attribute = 'DOB' and prof.Flag = 'Y' AND
        prof2.profile_attribute = 'SubscriptionPlanHolder' and prof2.Flag = 'Y';

The result is this:

CustomerID CustomerName Country profile_attribute Flag SubscriptionPlanHolder SubscriptionPlanHolder_Flag
6 Blauer See Delikatessen Germany DOB Y SubscriptionPlanHolder Y

The only thing I can think of is that c.c_country should be c.country instead.

Hi @RogierPronk so are you suggesting in order to target same field / column twice (to check 2 values - DOB and SubscriptionPlanHolder), the idea is to join the same profile attribute table twice?

So, if I need to check not 2 values but 5, does that mean do the INNER JOIN 5 times with same table but call out the field with different alias for each condition?

yes, your correct about that, you can even join like this:

INNER JOIN #profile prof
on prof.CustomerID = c.CustomerID and
prof.profile_attribute = 'DOB' and prof.Flag = 'Y'
INNER JOIN #profile prof2
on prof2.CustomerID = c.CustomerID AND
prof2.profile_attribute = 'SubscriptionPlanHolder' and prof2.Flag = 'Y'

WITH LastProfiles
AS
(
	SELECT CustomerID, ProfileAttribute, Flag
		,ROW_NUMBER() OVER (PARTITION BY CustomerID, ProfileAttribute ORDER BY Effective_Date DESC) AS rn
	FROM dbo.CustomerProfile
)
,RequiredProfiles
AS
(
	SELECT CustomerID
	FROM LastProfiles
	WHERE Flag = 'Y'
		AND rn = 1
		AND ProfileAttribute IN ('DOB','SubscriptionPlanHolder')
	GROUP BY CustomerID
	HAVING COUNT(*) = 2
)
SELECT C.CustomerID, C.CustomerName, C.Country
FROM dbo.Contacts C
	JOIN RequiredProfiles P
		ON C.CustomerID = P.CustomerID
WHERE NOT EXISTS
(
	SELECT 1
	FROM dbo.CustomerExclusion X
	WHERE X.CustomerID = C.CustomerID
		AND X.Flag = 'Y'
		AND X.exclusion_type IN ('BadAddress','Optout')
);

Just curious - If this is going to be an efficient approach (join /per field) when I am working with 20million+records in contacts table and profile table?

Thank you @Ifor for sharing,

WITH queries (CTE) is unfortunately not supported in the environment that I am working with. Only SQL.

Do you have any recommendation without WITH approach?

SELECT C.CustomerID, C.CustomerName, C.Country
FROM dbo.Contacts C
	JOIN
	(
		SELECT L.CustomerID
		FROM
		(
			SELECT CustomerID, ProfileAttribute, Flag
				,ROW_NUMBER() OVER (PARTITION BY CustomerID, ProfileAttribute ORDER BY Effective_Date DESC) AS rn
			FROM dbo.CustomerProfile	
		) L
		WHERE L.Flag = 'Y'
			AND L.rn = 1
			AND L.ProfileAttribute IN ('DOB','SubscriptionPlanHolder')
		GROUP BY L.CustomerID
		HAVING COUNT(*) = 2
	) P
		ON C.CustomerID = P.CustomerID
WHERE NOT EXISTS
(
	SELECT 1
	FROM dbo.CustomerExclusion X
	WHERE X.CustomerID = C.CustomerID
		AND X.Flag = 'Y'
		AND X.exclusion_type IN ('BadAddress','Optout')
);