JOIN syntax

sqlor · January 27, 2022, 12:22pm

Hello

Can you tell me please the right syntax to join subqueries?

I tried:

SELECT Col1, Col2 a
FROM Table1
WHERE Condition1 and Condition2
outer left join
SELECT Col1, Col2 b
FROM Table1
WHERE Condition1
on a.Col1=b.Col2

But it does not seem to work.

Please note that I want to first generate the tables with the appropriate filters and AFTERWARDS join them. I do not want the opposite, as it is not efficient. If there is another syntax for the opposite, please let me know for future use.

Thanks!

Ifor · January 27, 2022, 12:54pm

ScottPletcher · January 27, 2022, 3:01pm

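SELECT *
FROM (
    -- first filtered set; no column alias on Col2, so a.Col2 stays addressable
    SELECT Col1, Col2
    FROM Table1
    WHERE Condition1 AND Condition2
) AS a
LEFT OUTER JOIN (
    -- second filtered set; b.Col2 must keep its name for the ON clause
    SELECT Col1, Col2
    FROM Table1
    WHERE Condition1
) AS b ON a.Col1 = b.Col2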

sqlor · January 29, 2022, 11:26pm

That is great, thanks, but Col1 and Col2 are both included in the output and they contain duplicate data. Is there a way to remove b.Col2?

jeffw8713 · January 30, 2022, 4:48pm

Why do you think it is not efficient? It is often much more efficient to let SQL Server optimize the query than to try to do it yourself.

SELECT ...
  FROM Table1  t1
 WHERE condition1
   AND condition2

So now you want to join back to Table1:

SELECT ...
  FROM Table1  t1
  LEFT OUTER JOIN Table1  t2 ON t2.col2 = t1.col1
                            AND condition1  -- different condition for this reference
 WHERE condition1
   AND condition2
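
A note on the query above: the filter for the second reference sits in the ON clause, not in WHERE, and with an outer join that placement matters. A minimal sketch of the difference, where the predicates (`> 0`, `> 10`) are placeholders assumed for illustration and not taken from the thread:

```sql
-- Filter on the outer-joined reference kept in ON:
-- every t1 row that passes WHERE is returned; t2 columns are NULL when
-- no t2 row both matches and passes the filter.
SELECT t1.Col1, t1.Col2, t2.Col1 AS t2_Col1
  FROM Table1 t1
  LEFT OUTER JOIN Table1 t2 ON t2.Col2 = t1.Col1
                           AND t2.Col1 > 10   -- placeholder filter for t2
 WHERE t1.Col2 > 0;                           -- placeholder filter for t1

-- Same filter moved to WHERE:
-- unmatched rows have t2.Col1 = NULL, fail the predicate and are removed,
-- so the query effectively behaves like an INNER JOIN.
SELECT t1.Col1, t1.Col2, t2.Col1 AS t2_Col1
  FROM Table1 t1
  LEFT OUTER JOIN Table1 t2 ON t2.Col2 = t1.Col1
 WHERE t1.Col2 > 0
   AND t2.Col1 > 10;
```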

The only way to see if one is more efficient vs the other is to compare the execution plans.
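
One way to make that comparison, sketched under the assumption that both forms are run in the same SSMS session with "Include Actual Execution Plan" turned on. The filter predicates are placeholders, not the OP's real conditions:

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Version 1: filter in derived tables, then join.
SELECT a.Col1, a.Col2, b.Col1 AS b_Col1
  FROM (SELECT Col1, Col2 FROM Table1 WHERE Col1 > 0 AND Col2 > 0) AS a
  LEFT OUTER JOIN (SELECT Col1, Col2 FROM Table1 WHERE Col1 > 0) AS b
    ON a.Col1 = b.Col2;

-- Version 2: join the table to itself directly, same filters.
SELECT t1.Col1, t1.Col2, t2.Col1 AS b_Col1
  FROM Table1 t1
  LEFT OUTER JOIN Table1 t2 ON t2.Col2 = t1.Col1
                           AND t2.Col1 > 0
 WHERE t1.Col1 > 0
   AND t1.Col2 > 0;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the two plans and the read counts in the Messages tab come out the same, the optimizer has treated both forms alike for this data.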

As to your question about which columns are returned - that is what SELECT * does...it returns all columns. Since that is the outer query, and its result is what is returned to the client, you need to specify the columns there.

ScottPletcher · January 31, 2022, 3:33pm

Sure. Specify the specific columns you want rather than * for the output:

SELECT a.Col1, a.Col2, b.Col1
FROM ( ...
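
Written out in full, that could look like the sketch below. The WHERE predicates are placeholders assumed for illustration, and `b_Col1` is a made-up alias so the two Col1 columns can be told apart in the output:

```sql
SELECT a.Col1,
       a.Col2,
       b.Col1 AS b_Col1      -- b.Col2 is left out: it always equals a.Col1 on matched rows
FROM (
    SELECT Col1, Col2
    FROM Table1
    WHERE Col1 > 0 AND Col2 > 0   -- placeholder for Condition1 AND Condition2
) AS a
LEFT OUTER JOIN (
    SELECT Col1, Col2
    FROM Table1
    WHERE Col1 > 0                -- placeholder for Condition1
) AS b ON a.Col1 = b.Col2;
```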

jeffw8713 · January 31, 2022, 5:51pm

For further thought - there are times when generating the tables with the appropriate filters and then joining is appropriate. But to do that you would create temp tables and join the temp tables - this is sometimes referred to as divide & conquer.
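
A rough sketch of that temp-table approach, assuming placeholder filter predicates and made-up temp table names (`#a`, `#b`). Whether it beats the single-query form would still need to be tested:

```sql
-- Step 1: materialize each filtered set in tempdb.
SELECT Col1, Col2
  INTO #a
  FROM Table1
 WHERE Col1 > 0 AND Col2 > 0;   -- placeholder for Condition1 AND Condition2

SELECT Col1, Col2
  INTO #b
  FROM Table1
 WHERE Col1 > 0;                -- placeholder for Condition1

-- Optional: index the join column of the second set.
CREATE CLUSTERED INDEX IX_b_Col2 ON #b (Col2);

-- Step 2: join the two materialized sets.
SELECT a.Col1, a.Col2, b.Col1 AS b_Col1
  FROM #a AS a
  LEFT OUTER JOIN #b AS b ON a.Col1 = b.Col2;

-- Clean up.
DROP TABLE #a;
DROP TABLE #b;
```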

But - using a derived table or CTE doesn't actually accomplish that goal. Because SQL Server optimizes the whole query into an execution plan, the filtering in the derived table or CTE can be moved to another portion of the plan as SQL Server determines the best approach.
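
For reference, the CTE spelling of the same query might look like this (placeholder predicates, assumed for illustration). It is only a different way of writing the derived tables and is optimized as part of the one statement in the same way:

```sql
-- The statement before WITH must be terminated with a semicolon.
WITH a AS (
    SELECT Col1, Col2
      FROM Table1
     WHERE Col1 > 0 AND Col2 > 0   -- placeholder for Condition1 AND Condition2
),
b AS (
    SELECT Col1, Col2
      FROM Table1
     WHERE Col1 > 0                -- placeholder for Condition1
)
SELECT a.Col1, a.Col2, b.Col1 AS b_Col1
  FROM a
  LEFT OUTER JOIN b ON a.Col1 = b.Col2;
```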

ScottPletcher · January 31, 2022, 6:23pm

There's no need for the added overhead of temp tables. Derived tables require less I/O.

SQL Server can alter the plan as it sees fit, since it will only do so if it can guarantee the same results as it would get from evaluating the derived tables as written.

Besides, a LEFT OUTER JOIN will inherently reduce the options SQL Server has to satisfy the query. It must always return all rows from the left table.

jeffw8713 · January 31, 2022, 6:46pm

Scott - yes, SQL Server can alter the plan - but that doesn't mean it will process derived tables in isolation either. The derived table is incorporated into the full query and then optimized - and in this example, since the derived table is joined using an outer join to another derived table - there is no reason to believe it will be any more efficient than referencing the tables directly and joining.

ScottPletcher · January 31, 2022, 7:01pm

The JOIN part is not about efficiency, it's about accuracy. Presumably that's why the OP wanted to separate the two tasks, the initial queries and only then a join.

A derived table will almost always be more efficient than temp tables, since you avoid having to write the data to temp tables. As you noted, SQL can design a plan for all the data inline, without having to first write it to tempdb.

jeffw8713 · January 31, 2022, 9:21pm

This goes back to the original question:

It seems to me the OP is looking at derived tables because they consider them more efficient. In a basic example like this - I cannot see how a derived table is going to be any more efficient than joining the tables directly.

I question the premise that derived tables are more efficient - by their nature - than directly joining the tables. Can using a derived table/CTE construct be more efficient? Possibly...

I also pointed out that if the goal is to generate the tables - and then join - it can be more efficient to use temp tables. It may not...but the only way to know which approach is better is to test and evaluate each one.

The OP stated that derived tables are more efficient - and that is why they wanted to know how to join separate derived tables.