Is this statement correct?

Hello

I have seen the statement below:

SELECT *
FROM Table1
LEFT JOIN Table2 on Table1.ID=Table2.ID and [Cancelled]='Y'

I am puzzled with the and [Cancelled]='Y' bit.

Is this really a valid way to introduce filtering in a selection?

Is this really the same as the below?

SELECT *
FROM Table1
LEFT JOIN Table2 on Table1.ID=Table2.ID
WHERE [Cancelled]='Y'

Thanks!

You need to understand the logical order of query processing. eg:

With your examples, if Cancelled is in Table2 (and you should really have written Table2.Cancelled or used an alias) then your second query will effectively convert the LEFT OUTER JOIN to an INNER JOIN as the NULLs from the LEFT OUTER JOIN will never be equal to 'Y'.

Create some test data and see what happens.

It doesn't matter if it will be converted into an INNER join.

My question is, are the two statements I wrote the same? You may assume Table1.Cancelled='Y' or Table2.Cancelled='Y'. As I saw it, it didn't distinguish and from my understanding, [Cancelled] exists in both Tables.

My real question is, if you do SELECT * FROM Table1 LEFT JOIN Table2 on [some equation] and [another equation], will the [another equation] act as a filter, i.e. as if you do SELECT * FROM Table1 LEFT JOIN Table2 on [some equation] WHERE [another equation] ?

Umm... You have a more fundamental problem.

It does if you want the query to function like it's a LEFT OUTER JOIN.

If you actually don't need a LEFT OUTER JOIN, then just use an INNER JOIN.

Thanks, is my exact question answerable? If yes, what is the answer please?
I need that information first, then I will be able to understand if it is converted in an INNER JOIN and how.

To say the same thing in other words in case it makes it more clear:

  1. When joining on two ON equations separated by AND, how is this treated by SQL?
  2. In continuation to the above, what happens if the second ON equation is an equation between a field and a value instead of two fields from the respective tables?

Thanks!

You can have many conditions in the ON portion of a join - whether that is to multiple columns or expressions such as your example. The ON clause creates the relationship between the tables identified in that join statement.

The WHERE clause filters the results of all of the joins - after the data has been joined.

In your example, you are looking to get all rows from Table1 - and match only those rows from Table2 that match the key and are canceled. So you would get every row from Table1 and the columns from Table2 would either be NULL (no match found) or have a value (matched on both key and canceled).

Now - if you move that condition to the where clause you will only get all rows that match the join criteria (matching on key only) - and then filter out any rows from the results that are canceled. For all rows from Table1 where there is no match (matched on key value - not canceled) would be excluded because the NULL value returned from the outer join cannot and never will equal 'Y'.

If the column [Cancelled] comes from Table1 and not Table2 then things get trickier...because you would still get every row from Table1...but then only match on rows from Table2 for those rows in Table1 that are canceled. Again - moving the criteria to the where returns different results...because those rows where no match occurs on the join criteria would be eliminated.

This isn't really a question about how this is treated by SQL Server - it is a fundamental question on how an RDBMS uses and defines relations and how those relationships are represented in code.

You are basically saying that:
LEFT JOIN Table2 on Table1.ID=Table2.ID AND [Cancelled]='Y' : returns results where [Cancelled] is 'Y' or NULL
And
LEFT JOIN Table2 on Table1.ID=Table2.ID WHERE [Cancelled]='Y' : returns results where [Cancelled] is 'Y' only
Right?

But I get results that are both [Cancelled]='Y' and 'N' when I do LEFT JOIN Table2 on Table1.ID=Table2.ID and [Cancelled]='Y'

This is different from what you are saying, right?

if [Cancelled]='Y' comes from Table1, then they will be the same, which they are. If it came from table2, then it would not return the null values

NO. That returns results from Table1 regardless of what is in the Table2.Cancelled column. If there was not a matching row from Table2, then SQL will force NULL into all the Table2 columns, including Cancelled, ID, etc..

Apologies, the FROM Table1 LEFT JOIN Table2 on Table1.ID=Table2.ID AND Table1.[Cancelled]='Y' : returns results where [Cancelled] is 'Y' or 'N', there is no NULL.

I still don't understand what is the function of the AND Table1.[Cancelled]='Y'.
I would also be interested to understand the function of the AND Table2.[Cancelled]='Y'

Any idea please?

Not from me. I give up. Good luck with this.

1 Like

It changes the relationship - and you get different results

Declare @table1 Table (KeyColumn int, Cancelled char(1), SomeOtherValue varchar(20));
Declare @table2 Table (KeyColumn int, Cancelled char(1), SomeOtherValue varchar(20));

 Insert Into @table1 (KeyColumn, Cancelled, SomeOtherValue)
 Values (1, 'Y', 't1-y')
      , (1, 'N', 't1-n');

 Insert Into @table2 (KeyColumn, Cancelled, SomeOtherValue)
 Values (1, 'Y', 't2-1-y')
      , (1, 'N', 't2-1-n');

 Select *
   From @table1         t1
   Left Join @table2    t2 On t2.KeyColumn = t1.KeyColumn
                          And t1.Cancelled = 'Y';

 Select *
   From @table1         t1
   Left Join @table2    t2 On t2.KeyColumn = t1.KeyColumn
                          And t2.Cancelled = 'Y';

As you can see - we get 3 rows when using Table1.Cancelled in the join but only for one row from Table1. The second row from table1 has NULL values because the Cancelled value of that row is 'N' (thus - no match in the join so no matching row from table2).

We get 2 rows using Table2.Cancelled in the join - and we get that one matching row in table2 for both rows in Table1.

If we remove the check for Cancelled - we get 4 rows returned. Row1 from Table1 matched to both rows in Table2 - and Row2 in Table1 matched to both rows in Table2.

1 Like