I have a pretty active database that contains weather information from all around the world. The table in question has about a million rows and is constantly being updated.  I only keep data for 72 hours.  The stored procedure that is giving me trouble gets executed 10-20 times a second. After about 30 minutes my cpu spikes up to 100% and stays there. I've discovered that if I include "with recompile" in the sproc then the cpu stays down around 45%. This seems contrary to what I would expect in regards to query plan reuse. Why would forcing a recompile every time be better? The table's statistics get refreshed each evening.  Any ideas? Are my statistics getting stale that quickly?
-Jeff
How do you (mostly) query the table?
What are the clustering keys on the table, and do they match the lookup? The clustering keys are absolutely the most critical for performance and should generally match the way you most frequently look up data [that's a very broad statement which certainly has exceptions, but overall it's the proper goal].
I have a clustered index on a sequential number (integer) that gets incremented automatically. There are several non-clustered indexes that I can see in the execution plan that are being referenced when the query is run. All querying to this table is through a stored procedure. Each record has a status column that gets updated several times throughout the record's lifecycle. Unfortunately this status is one of the primary values that gets queried on. I know I'm taking a big hit in updating my indexes when the record's status value changes and that's why I didn't include it in my clustered index. If the table were not so large and volatile (and needed to be online 24x7) I'd experiment with re-engineering the clustered index. Right now I'm stuck with the clustered index I have.
Do you most-frequently query the table on that ID? or is it more common to query the table on some other column(s)? in particular do you commonly query the table on a range-value for a non-ID Column?
Scott is the best source of advice on this, but in general I would say that if your main queries are on some other columns try moving the PKey to Non-Clustered and make the common-query column the Clustered Index (include the ID to make the Clustered Index unique, if the column(s) it contains are not naturally unique. The alternative is to let SQL add its own uniquifier, but I'm not sure if including your own ID would be in that situation)
You need to do the testing on something other than the production server ... I'm sure you know that, but you need to find / beg / borrow or better still "buy"  the kit to do that on.
 the kit to do that on.
If you have space on the production server it won't be tha awful to restore a copy of the database and experiment on that.
Probably worth checking that Update Stats is ASYNC (the default is SYNC which means that if SQL decides to dynamically update the stats then all queries wait for that to happen, whereas ASYNC means all queries continue to use the old stats until the new ones are in place). But I would expect that to only manifest itself as periodical slowdowns, not a continued slowdown.
I wonder if a filtered index(es) [one or several] would help? Particularly if your normal Status queries are only done on less-commonly-used status values.
One other trick that I have seen done (but long ago, maybe optimiser would not be fooled any more!) is
IF @Param = 1
SELECT Col1, Col2, ...
FROM MyTable
WHERE Status = @Param
      AND 1=1
END
IF @Param = 2
SELECT Col1, Col2, ...
FROM MyTable
WHERE Status = @Param
      AND 2=2
END
this allows different query plans to be created (and cached) for Status=1 and Status=2 because the SQL text is not identical for each of them.
Thank you both Scott & Kristen for pointing me in the right direction. I'll be experimenting with the clustered index this weekend. I have a test database where I can adjust indexes and queries but it isn't under the same load as our production server. I think my next step is to figure out a way to get incoming request traffic to flow to both environments so that I can test in real world conditions.
So far we haven't been able to replicate the issue in our test environment but I think that once we get our test traffic levels up where the production server is operating we will have a better understanding and test bed for these types of issues.
Great ideas! Thank you!
@kayakingJeff,
What version of SQL Server are you running?
You could just check the number of table scans and logical I/O of various queries. If that indicates "less work" of some change that you try then I would think that is probably good enough. Its a cheap & cheerful way compared to more extensive checking of Query Plans.
You would need to make tests of a representative set of SQL statements though ... but that "test suite" would be a handy thing to have anyway, in the future.
If you need something more sophisticated you can get SQL to "record" SQL statements (i.e. on the Production server) and then Play Back those on the Test Server (starting from a backup you made on Production at the start of the recording period). This approach is hard work - there will almost inevitably be some statements that are Chicken-and-Egg with others, or don't coincide exactly with the start backup that you made, and they will fail (because a related record is not present / not yet updated to be appropriate) and that then leads to a cascade of errors  but once you've got a clean test data and SQL command set it is trivial to then test various scenarios. Probably "more than an afternoons work" though, sadly.
 but once you've got a clean test data and SQL command set it is trivial to then test various scenarios. Probably "more than an afternoons work" though, sadly.
-- SET SHOWPLAN_TEXT ON	-- View the Query Plan, the SQL is not run (non destructive)
GO
-- SET STATISTICS IO ON; SET STATISTICS TIME ON	-- RUN the SQL and see Logical I/O and SCANs (destructive)
-----------------------------------------------------------
-- ... put query here ...
-----------------------------------------------------------
SET STATISTICS IO OFF; SET STATISTICS TIME OFF; SET STATISTICS PROFILE OFF
GO
SET SHOWPLAN_TEXT OFF
GO
SET SHOWPLAN_ALL OFF
GO
Start with either one of the initial SET commands (just highlight from the SET command to the end to run your test)
To answer the original question, using WITH RECOMPILE in the stored procedure can be quite a bit cheaper that suffering with a bad cached plan.
Also, your table is suffering many changes but they may not be enough to trigger a rebuilding of statistics. You might want to reconsider rebuilding certain statistics on a more frequent basis. I've just recently seen where stale statistics turned a query that usually takes less than a second into a sleeping giant that did nothing for nearly an hour before we killed the query. And it was quite repeatable. It kept misbehaving until we rebuilt the stats. Even a recompile didn't fix it.