It's a principles question, so I don't post an example query:
I had this query involving joins from 6 subqueries. It performed very poor so I decided to split it into temporary tables: instead of Inner Join (Select...) I first created 6 temporary tables and I populated them using exactly the same query as the according subqueries.
The result was something like Select xxx from #temp1 Inner join #temp2 ... etc...
Comparing with the first case (with subqueries), which took 15 minutes, the second one (with temporary tables) took 35 seconds.
I don't understand why the server is so much less efficient when it uses subqueries.
My question to you, experts, is: Is the usage of temporary tables a better way to optimise queries?
As always, it depends. Did you factor in the time it took to populate the temp tables? I assume the temp tables only contained the data you were looking for and not the entire table. What was the execution plan on the query when using the tables? Temp tables write to tempdb, so there would be alot of writing to tempdb to create them. So, without any execution plan or ddl/sample data, it depends. A 15 minute query sounds inefficient to me and probably needs an index or 2.
Thank you mike01
But I don't understand what do you mean:"temp tables only contain the data you were looking for".
The queries were performed against the same database but the first one was with subqueries and in the second case I first created temporary tables (#temp1, #temp2...), populated them with the same queries as the subqueries and the select was joining on the temp tables, not on subqueries.
That's the scenario: Join on subqueries vs 1.create temp tables + 2.populate them with the same data as the subqueries + 3.join these temp tables
I stress again: the same data as the subqueries (I just copied the subquery and used it for the Insert into #temp1 Select [subquery here]
Thank you for answering.
so your temp tables had the same number of rows as the main tables?
Instead of :Select * From (Select 1.....) Inner Join (Select 2...) On ...
Create Table #temp1...
Insert Into #temp1 [Select1...]
Create Table #temp2
Insert Into #temp2 [Select2...]
And Then: Select * From #temp1 Inner Join #temp2 ...
i.e. I prepare the subqueries into temp tables and then I make the main query. And it's a lot faster, even if that means it must create tables, etc...
the subquery is returning a subset of the whole table (I'm assuming). Does select count(1) from #Temp1 return the same number of records as select count(1) from table in subselect without any criteria? Also, you haven't addressed the execution plan. Have you looked at that?
Compare the query plans of both. It's likely that the first query -- the one with subqueries -- had wrong row estimates, causing SQL to use a LOOP join rather than a HASH join. That's about all I can think of that would cause that big a difference in time.
Thank you Scott, mike,
It baffles me that huge difference between the two solutions. It might be indeed a wrong row estimate leading to wrong decisions. The biggest table involved has in excess of 5 million records and the result is around 10,000 records.
I am interested in an explication to use the best approach in the future (even if the temp table wouldn't have been my first guess).
The main problem is using temp tables doesn't allow me to use Sql functions, only stored procedures that can create those tables.
Thank you again
(I didn't look to the execution plan since I'm not very good at interpreting it .... but I'll try)
Another consideration is the statistics on the source tables, they could be out of date, or the cardinality (# of rows) is affecting the row estimates, as @ScottPletcher mentioned. Putting the data into temp tables not only reduces the cardinality but generates fresh statistics.
FYI table variables do not create statistics, and unless you're on SQL Server 2017 or higher, they will always mis-estimate row counts (might even require SQL 2019) to the query optimizer.
It is also possible that the subqueries are modified by SQL and the filtering is moved to a later point in the execution plan. Breaking each subquery into a separate temp table therefore reduces the number of rows being joined.
The answer to the OP's question:
Is a qualified no - it may be better for that query but not better for a different query.
You could force SQL to not use loop joins: if the reasons for bad performance are consistent, that might clear up the problem.
SELECT ... FROM dbo.main_table INNER JOIN (...) AS subquery1 INNER JOIN (...) AS sq2 ...
OPTION ( HASH MERGE JOIN )
Thank you Jeff for the answer!
Scott, I will definitely try that syntax, thank you.