Is this strategy robust?

sqlor · February 17, 2021, 12:00pm

Hello

I plan to use a different strategy to achieve the same result but I am worried if there will be any issues, i.e. records returned or missed that shouldn't be.

So, can you advise please if the below are 100% equal?

SELECT DISTINCT [ID]
FROM TABLE
WHERE [A]=1
AND [B]=2
AND [C]=3

SELECT DISTINCT [ID]
INTO #MYTABLE
FROM TABLE
WHERE [A]=1
AND [B]=2

SELECT DISTINCT [ID]
FROM TABLE
WHERE [ID] IN #MYTABLE
AND [C]=3

Thanks!

yosiasz · February 17, 2021, 12:22pm

You wont get the same result from any of those queries. Test them locally and you will see

sqlor · February 17, 2021, 1:16pm

That's mind boggling.

#Mytable includes only IDs where A=1 and B=2. So if we add C=3, why wouldn't it be the same as A=1 and B=2 and C=3?

Thanks

robert_volk · February 17, 2021, 3:01pm

Use these as your source data:

CREATE TABLE #table(ID int, A int, B int, C int);
INSERT #table VALUES(1,1,2,null),(1,999,888,3),(1,null,null,3);

Then run your queries. You'll see the discrepancy.

Your first query is looking for 3 conditions to be true. In the sample data I provided, there is no such row.

The second query is only testing 2 of those conditions, then limiting itself to a 4th condition (ID value). All of the sample data I provided uses the same value for ID, which effectively eliminates the other 2 conditions.

There is no enforced dependency of any relationship between these 4 conditions. Your original query is the only one that will provide the results you want. Breaking it up into separate queries can fail. Additionally it adds write activity/overhead to an otherwise read-only operation by inserting into the #myTable temp table.

I added NULLs to the source data as an additional complication. Imagine (or test) if ID is NULL.

I expect you'll reply "But ID is unique in that table" and/or "None of the columns are nullable". That wasn't specified in your original question, and even if it is true, you can still add rows that meet the 2nd condition but not the 1st.

sqlor · February 17, 2021, 6:20pm

Thanks, very helpful.

The truth is that I didn't want to do the nested selections. However, I have no other alternative. To properly analyse my data, I will need to do a double join of three large tables and then check the conditions in the resulting table.

This however takes huge processing time to complete. It is not feasible.

To give you an idea, I have two tables:
A relational table:
ID1, ID3
ID2, ID4
And a master reference table:
ID1, A
ID2, B

To be able to specify criteria for both ID1 and ID3 which are contained in the master table, I will need to join the master table twice to the relational table.

Considering that each ID row has many records, the join process becomes unfeasible (need to wait more than 15 minutes).

The work-around with the nested selections is much faster.

Is there any solution to that?