High CPU due to plan cache problems

Hi all,

Our CPU on one of our six production servers has been going to 100% some mornings. We noticed that at 4am something is causing the size of the plan cache to plummet from 7GB to about 50MB. Since the plan cache is being starved for memory, SQL Compilations/sec increases significantly, causing CPU pressure.

The server has 128GB of memory. The databases on it are quite large, so perhaps we just need more memory (which unfortunately means adding another server, because we are on Standard Edition, which caps the memory).

I have a couple of questions...

  1. Is there any way to set a minimum size for the plan cache (i.e. don't let it go below 2GB)?
  2. We currently set the maximum amount of memory that SQL can use to 80% of the server total (see the snippet below). Some sites suggest 90%. What do people here use?
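
For reference, this is roughly how we have it set today (just a sketch; 104857 MB is about 80% of 128GB):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 104857;  -- roughly 80% of 128GB
    RECONFIGURE;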

Thanks.

Sounds like there may be a job that is clearing the plan cache. Are there any maintenance jobs running at that time? Look for DBCC FREEPROCCACHE.
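
Something like this against msdb will surface any Agent job step that calls it (just a sketch; I'd check for FREESYSTEMCACHE while you're at it):

    -- Find job steps that clear caches
    SELECT j.name AS job_name, s.step_id, s.step_name, s.command
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobsteps AS s
      ON s.job_id = j.job_id
    WHERE s.command LIKE '%FREEPROCCACHE%'
       OR s.command LIKE '%FREESYSTEMCACHE%';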

I'll check for that, Mike. The strange thing, however, is that the cache is not growing again. You would think that if something cleared the cache at 4am, it would start growing again. The fact that it isn't tells me there is ongoing memory pressure. I used that command to clear the cache on another server (with less load) and the cache very quickly (in a matter of minutes) contained tens of thousands of plans. This problematic server had only about 80 plans in it, and it wasn't growing.
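
In case it helps, this is roughly what I'm using to watch the cache (a sketch):

    -- Plan cache stores by size
    SELECT name, type, pages_kb / 1024 AS size_mb
    FROM sys.dm_os_memory_clerks
    WHERE type IN ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP', 'CACHESTORE_PHDR');

    -- Total number of cached plans and their combined size
    SELECT COUNT(*) AS cached_plans,
           SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS total_mb
    FROM sys.dm_exec_cached_plans;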

Check the SQL Server error log. Typically it will state if memory trimming occurred.
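
For example, something like this searches the current log (a sketch; the first parameter is the log number, 0 = current, and 1 means the SQL Server error log rather than the Agent log):

    -- Look for working-set trim / paged-out warnings in the current error log
    EXEC xp_readerrorlog 0, 1, N'memory', N'paged out';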

Are you on SQL 2016+? If so, are you compressing your data?

Hi Scott,

Our DBA says that the error logs do not show that trimming occurred. We are on SQL Server 2017. We compress the backups but not the data.

Thanks.

Wow, you should definitely be compressing the data (although not while you're under CPU pressure; yours is an unusual case, though; usually I/O is the issue, NOT CPU).
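
If you do go down that road, the usual pattern is something like this (a sketch only; dbo.YourBigTable is a placeholder, and do the rebuild off-hours since it is CPU/IO heavy). As far as I know, data compression has been available in Standard Edition since 2016 SP1, so 2017 Standard should be fine:

    -- Estimate the savings first
    EXEC sys.sp_estimate_data_compression_savings
         @schema_name = N'dbo',
         @object_name = N'YourBigTable',
         @index_id = NULL,
         @partition_number = NULL,
         @data_compression = N'PAGE';

    -- Then rebuild with compression
    ALTER INDEX ALL ON dbo.YourBigTable
        REBUILD WITH (DATA_COMPRESSION = PAGE);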

If you're limiting SQL to ~102GB (80% of 128), that would seem to leave enough RAM for other things, including CLR, XML, JSON, and so on.

Of course you also need to tune the db indexes, particularly determining and implementing the best clustered index for each table. Hint: this is most often NOT an identity column, no matter what some experts claim. The clustered index is far too important to ever have a "default"; it should always be chosen carefully based on that specific table's usage and size.
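
A starting point for that analysis (just a sketch, run in the relevant database) is to look at how each table is actually being accessed:

    -- Seeks vs scans vs lookups per index since the last restart
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name AS index_name,
           s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.dm_db_index_usage_stats AS s
    JOIN sys.indexes AS i
      ON i.object_id = s.object_id AND i.index_id = s.index_id
    WHERE s.database_id = DB_ID()
    ORDER BY s.user_seeks + s.user_scans DESC;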

I'm not an expert on licensing, so I don't know whether you could use a separate instance with 128GB on the same CPUs and not have to get another license, but you might look into that too.

If "freeproccache" is run, you should messages in the SQL logs with roughly this text:

SQL Server has encountered 1 occurrence(s) of cachestore flush for ...

[That's for SQL 2016, I'm not sure for 2017, but presumably it didn't change much.]
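
You can search for it directly (a sketch; bump the first parameter to read older logs):

    -- Look for cachestore flush messages in the current error log
    EXEC xp_readerrorlog 0, 1, N'cachestore flush';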