Optimize with 'select 1'?

A developer here changed several references in a Stored Proc to optimize it.
Old code:
if exists (select * from table1)...
New Code:
if exists (select 1 from table1)...

Isn't SQL Server smart enough that this kind of change would make no difference in performance?

There is a school of thought that SELECT * is less efficient because, even though the query is only checking for the existence of a row, it would still read all the metadata, and that you could avoid that extra step by using SELECT 1. However, I recall reading in one of Itzik Ben-Gan's books that SQL Server has specifically optimized its query engine to recognize SELECT * used in this context and handle it efficiently.

So, I would not care one way or the other, and would not spend time trying to change SELECT * to SELECT 1. I will see if I can find the reference to Itzik's book/article where he discusses this.


From Ben-Gan's book Microsoft SQL Server 2012 T-SQL Fundamentals (https://www.amazon.com/Microsoft-Server-Fundamentals-Developer-Reference/dp/0735658145), Chapter 4, Subqueries:

"Unlike most other cases, in this case it’s logically not a bad practice to use an asterisk (*) in the SELECT llist of the subquery in the context of the EXISTS predicate. The EXISTS predicate only cares about the existence of matching rows regardless of the attributes specified in the SELECT list, as if the whole SELECT clause were superfluous. The SQL Server database engine knows this, and in terms of optimization, ignores the subquery’s SELECT list. So in terms of optimization, specifying the column wildcard * in this case has no negative impact when compared to alternatives such as specifying a constant. However, some minor extra cost might be involved in the resolution process of expanding the wildcard against metadata info. But this extra resolution cost is so minor that you will probably barely notice it. My opinion on this matter is that queries should be natural and intuitive, unless there’s a very compelling reason to sacrifice this aspect of the code. I find the form EXISTS(SELECT * FROM …) much more intuitive than EXISTS(SELECT 1 FROM …). Saving the minor extra cost associated with the resolution of * is something that is not worth the cost of sacrificing the readability of the code."

The developer did some tests first then? ...

Here's hoping it's only one SProc ... and that there aren't a whole load of downstream processes that will "react" to source code being changed - Peer Review, SProcs selected for next rollout patch, QA testing, etc.


Spot on to this a bazillion times! I'll also add that, while it's been a while since I've tested it, you have to remember that SELECT 1 is going to create a constant where SELECT * won't. It also used to be that SELECT * was an indication to the optimizer that it had free rein to use the absolute best index (generally the narrowest one, though that really only matters for things like COUNT(*)), where something like SELECT 1 (or any constant) might not have the same implication.

Again, I'll say that it's been a very long time since I've tested such a thing, so things may have changed a bit, but 1) as Kristen stated, one should test such a thing before making such wholesale changes, and 2) if the change doesn't provide a high-ROI improvement, it may not be worth making.

Finally, remember that a "best practice" isn't necessarily the best thing to do. If enough people with the wrong idea say it enough times, people tend to label such things as a "best practice" even if the exact opposite is true.


So that's how NOLOCK got into all that code then, eh? :toilet:

'zactly. Even worse than that, they should have simply set the transaction isolation level. :wink: (just kidding but it does show how ridiculous some advice can be).
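
(For anyone following along, the table hint and the session-level setting amount to the same thing for a single statement; dbo.TableA below is just a made-up table.)

-- Table hint: dirty reads for this one table reference
SELECT a FROM dbo.TableA WITH (NOLOCK);

-- Session setting: dirty reads for every statement until it is changed back
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT a FROM dbo.TableA;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- restore the default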


Good point. I believe no performance data was collected either before or after. Basically the customer complained it was slow, and this was, as we say in American football, a Hail Mary pass. This fellow has the title 'Senior Software Architect', but that's politics, as they say.

And btw, this optimization job included littering the place with WITH RECOMPILE statements. I can't see that doing more good than harm.

Used in the right place they will help a lot.

best I don't mention Testing again !!


Thanks! These were done for the whole stored procedure (i.e., at the ALTER PROCEDURE level) and I don't think they were given much thought. These stored procedures are very long, so recompiling the whole thing, every time, adds overhead.

When is a good example of using it?

You could potentially use WITH RECOMPILE when there is a good possibility that the compiled plan may be inefficient for different combinations of input parameters. But even so, it should be used with much caution and testing because recompiling can be expensive.

Dynamic search conditions - like the ones in the example below - are one place where recompiling might help. In this example, a query plan generated for one set of parameters is likely to be inefficient for a different set of parameters. Even so, rather than recompiling at the stored procedure level, consider using OPTION (RECOMPILE) at the statement level; a sketch of that follows the procedure below.

CREATE PROC dbo.SearchForSomething
	@a INT,
	@b INT,
	@x VARCHAR(32)
AS
	SELECT
		a,
		b,
		x
	FROM
		TableA
	WHERE
		(@a IS NULL OR @a = a)
		AND (@b IS NULL OR @b = b)
		AND (@x IS NULL OR @x = x);
GO
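
For example, a statement-level sketch might look like this (same hypothetical TableA as above; the procedure name is mine):

CREATE PROC dbo.SearchForSomething_StatementRecompile
	@a INT,
	@b INT,
	@x VARCHAR(32)
AS
	SELECT
		a,
		b,
		x
	FROM
		TableA
	WHERE
		(@a IS NULL OR @a = a)
		AND (@b IS NULL OR @b = b)
		AND (@x IS NULL OR @x = x)
	OPTION (RECOMPILE); -- recompile only this statement on each execution
GO

That way only this one query gets a fresh plan each time it runs, instead of recompiling every statement in a long procedure.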

I was reading in a few places that WITH RECOMPILE is good for dynamic SQL. Here's an example:
http://stackoverflow.com/questions/20864934/option-recompile-is-always-faster-why

Is this true?

Because what I noticed is that dynamic SQL statements get compiled every time they run, unless SQL Server already has the exact query string in its execution plan cache.
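
For example (dbo.TableA here is just a made-up table), two ad hoc strings that differ only in a literal each get their own entry in the plan cache, which you can see in the DMVs:

-- Two ad hoc statements differing only in the literal value
EXEC ('SELECT a FROM dbo.TableA WHERE b = 1;');
EXEC ('SELECT a FROM dbo.TableA WHERE b = 2;');

-- Inspect the cache: expect two separate Adhoc entries, one per string
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%dbo.TableA%'
	AND st.text NOT LIKE '%dm_exec_cached_plans%'; -- exclude this query itself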

I don't think the Accepted Answer necessarily means that (and I agree with you - it doesn't make sense)

"In my experience the only time [OPTION RECOMPILE] is a viable option is when you are using dynamic SQL"

I wonder if the O/P meant dynamic SQL in the sense of "variable parameters" (which are then discussed - "if you create the query where there are 10 records in your database and then execute it when there are 100,000,000 records the cached execution plan may no longer be the most effective." )?

FWIW we use sp_ExecuteSQL in preference to EXEC in order to try to improve the likelihood of the SQL statement (not changing and) being cached.
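
Something along these lines (the table, column, and parameter names are just for illustration):

-- EXEC with a changing literal: each distinct string gets its own ad hoc plan
EXEC ('SELECT a FROM dbo.TableA WHERE b = 1;');

-- sp_ExecuteSQL with a parameter: the statement text stays identical for every
-- value, so the cached plan can be reused
DECLARE @sql NVARCHAR(MAX) = N'SELECT a FROM dbo.TableA WHERE b = @b;';
EXEC sys.sp_executesql @sql, N'@b INT', @b = 1;
EXEC sys.sp_executesql @sql, N'@b INT', @b = 2;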