JOIN syntax

Hello

Can you tell me please the right syntax to join subqueries?

I tried:

SELECT Col1, Col2 a
FROM Table1
WHERE Condition1 and Condition2
outer left join
SELECT Col1, Col2 b
FROM Table1
WHERE Condition1
on a.Col1=b.Col2

But it does not seem to work.

Please not I want to first generate the tables I want with the appropriate filters and AFTERWARDS join the tables. I do not want the opposite as it is not efficient. If there is another syntax for the opposite, please let me know for future use :slight_smile:

Thanks!

1 Like
SELECT *
FROM (
    SELECT Col1, Col2 a
    FROM Table1
    WHERE Condition1 and Condition2
) AS a
LEFT OUTER JOIN (
    SELECT Col1, Col2 b
    FROM Table1
    WHERE Condition1
) AS b ON a.Col1=b.Col2
1 Like

That is great thanks but Col1 and Col2 are both included in the output and they contain duplicate data, is there a way to remove b.Col2?

Why do you think it is not efficient? It is often much more efficient to let SQL Server optimize the query than to try to do it yourself.

SELECT ...
  FROM Table1  t1
 WHERE condition1
   AND condition2

So now you want to join back to table1

SELECT ...
  FROM Table1  t1
  LEFT OUTER JOIN Table1  t2 ON t2.col2 = t1.col1
                            AND condition1  -- different condition for this reference
 WHERE condition1
   AND condition2

The only way to see if one is more efficient vs the other is to compare the execution plans.

As to your question about what columns are returned - that is what SELECT * does...it returns all columns. Since that would be the outer query and what is returned to the client you need to specify the columns.

1 Like

Sure. Specify the specific columns you want rather than * for the output:

SELECT a.Col1, a.Col2, B.col1
FROM ( ...

For further thought - there are times when generating the tables with the appropriate filters and then joining is appropriate. But, to do that you would create temp tables and join the temp tables - this is sometimes referenced as divide & conquer.

But - using a derived table or CTE doesn't actually accomplish that goal. Because SQL Server optimizes the query into an execution plan - the filtering in the derived table or CTE can be moved to another portion of the plan as SQL Server determines the most optimal approach.

There's no need for the added overhead of temp tables. Derived tables require less I/O.

SQL Server can alter the plan as it sees fit if/since SQL will only do so if it can guarantee that it gives the same results as it would from using the derived tables.

Besides, a LEFT OUTER JOIN will inherently reduce the options SQL has to satisfy the query. It must always return all results the LEFT table.

Scott - yes, SQL Server can alter the plan - but that doesn't mean it will process derived tables in isolation either. The derived table is incorporated into the full query and then optimized - and in this example, since the derived table is joined using an outer join to another derived table - there is no reason to believe it will be any more efficient than referencing the tables directly and joining.

The JOIN part is not about efficiency, it's about accuracy. Presumably that's why the OP wanted to separate the two tasks, the initial queries and only then a join.

A derived table will almost always be more efficient than temp tables, since you avoid having to write the data to temp tables. As you noted, SQL can design a plan for all the data inline, without having to first write it to tempdb.

This goes back to the original question:

It seems to me the OP is looking at derived tables because it is more efficient. In a basic example like this - I cannot see how a derived table is going to be any more efficient than joining the tables directly.

I question the premise that derived tables are more efficient - by their nature - than directly joining the tables. Can using a derived table/CTE construct be more efficient? Possibly...

I also pointed out that if the goal is to generate the tables - and then join - it can be more efficient to use temp tables. It may not...but the only way to know which approach is better is to test and evaluate each one.

The OP stated that derived tables are more efficient - and that is why they wanted to know how to join separate derived tables.