SQL Server Profiler in a production environment


Since Profiler is heavy and shouldn't be used in a production environment, how can I trace SQL queries in production?


"It Depends". What are you trying to trace the SQL queries for? The reason I ask is because the reason(s) for the traces seriously dictate the alternative method(s).

P.S. And I disagree. Profiler isn't "heavy"... the things people use it for are. Again, "It Depends" on what you're looking for.

Run it on a workstation, instead of the server?

Set up "tight" filters to minimise what information is included?
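On the "tight filters" idea, here's a minimal sketch of the modern equivalent: an Extended Events session that captures only what you ask for. The session name, database name, file path, and the 500 ms threshold are all illustrative assumptions, not anyone's actual setup.

```sql
-- Sketch: a tightly filtered Extended Events session.
-- Captures only completed batches over 500 ms in a single database,
-- so the overhead on the server stays minimal.
CREATE EVENT SESSION [SlowQueries] ON SERVER
ADD EVENT sqlserver.sql_batch_completed
(
    ACTION (sqlserver.sql_text, sqlserver.client_app_name)
    WHERE sqlserver.database_name = N'MyProdDb'   -- assumed database name
      AND duration >= 500000                      -- microseconds: 500 ms
)
ADD TARGET package0.event_file
(
    SET filename = N'C:\Traces\SlowQueries.xel',  -- assumed path
        max_file_size = 50                        -- MB per rollover file
);

ALTER EVENT SESSION [SlowQueries] ON SERVER STATE = START;
```

The same filtering discipline applies whether you use Extended Events, a server-side trace, or Profiler from a workstation: the narrower the predicate, the less the observer effect.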

Our APPs are all SProc driven. Every SProc logs (to a table) its parameters and the time it starts, which is updated with the End Time and Error Number. We use that for a lot of our diagnostics, instead of SQL Profiler I suppose ...
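A minimal sketch of that "every SProc logs itself" pattern, for anyone who wants to copy it. The table and column names here are my assumptions, not the poster's actual schema.

```sql
-- Assumed logging table: one row per proc call.
CREATE TABLE dbo.SProcLog
(
    LogID       bigint IDENTITY PRIMARY KEY,
    SProcName   sysname       NOT NULL,
    Parameters  nvarchar(max) NULL,
    StartTime   datetime2(3)  NOT NULL DEFAULT SYSDATETIME(),
    EndTime     datetime2(3)  NULL,
    ErrorNumber int           NULL
);
GO
-- Hypothetical proc showing the log/update pattern.
CREATE PROCEDURE dbo.SomeProc @CustomerID int
AS
BEGIN
    DECLARE @LogID bigint;

    INSERT dbo.SProcLog (SProcName, Parameters)
    VALUES (OBJECT_NAME(@@PROCID), CONCAT('@CustomerID=', @CustomerID));
    SET @LogID = SCOPE_IDENTITY();

    BEGIN TRY
        -- ... the proc's real work goes here ...
        UPDATE dbo.SProcLog SET EndTime = SYSDATETIME()
         WHERE LogID = @LogID;
    END TRY
    BEGIN CATCH
        UPDATE dbo.SProcLog
           SET EndTime = SYSDATETIME(), ErrorNumber = ERROR_NUMBER()
         WHERE LogID = @LogID;
        THROW;
    END CATCH
END;
```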

I'm totally jealous!

To answer your question, I've found that a whole lot of people use SQL Profiler to find long-running code, which is supposedly the worst performance problem in the system. Frequently, those problems don't come close to being the worst ones, and so it's usually a total waste of time to use SQL Profiler even if it caused zero overhead.

The latest "shiny object" that people turn to is Extended Events. Extended Events is great but, like SQL Profiler, it's frequently the wrong tool for finding performance problems, and people end up wasting a lot of time and effort looking for the wrong problems.

If you really want to see what's killing performance, right-click on the instance in SSMS Object Explorer, select "Reports", and then follow your nose through the "Performance -" related reports. It's especially useful if you have something that runs overnight to clear cache, because you get a fresh look every day. I don't actually go out of my way to clear cache... there are some nightly jobs so big that they inherently do it. Yeah... I'm working on those, but those aren't actually the "big problems" that I've found.

I wrote some code that mimics the output of the reports I'm talking about, along with clickable returns on the overall proc that was running and the problem area within the proc. I run it once an hour, 24 hours a day, and save the result set. It only takes about 3 seconds to run but provides a wealth of information about what's a problem during any given time of the day.
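For reference, a rough approximation of what those SSMS "Performance - Top Queries" reports show can be pulled straight from the plan-cache DMVs. This is my sketch, not the poster's sp_ShowWorst; note the counters reset whenever a plan leaves cache or the instance restarts.

```sql
-- Top 20 statements by cumulative CPU since they entered the plan cache.
SELECT TOP (20)
        qs.execution_count,
        qs.total_worker_time / 1000    AS total_cpu_ms,
        qs.total_logical_reads,
        qs.total_elapsed_time / 1000   AS total_elapsed_ms,
        SUBSTRING(st.text,
                  qs.statement_start_offset / 2 + 1,
                  (CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM    sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_worker_time DESC;   -- or total_logical_reads, etc.
```

Ordering by a cumulative column is what surfaces the "fast but runs constantly" queries that a duration-only trace would never flag.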

What normally turns out to be the most significant performance and resource usage problems are singleton procs or ad hoc queries that are used by the GUI. I had one that only took about 450 ms per run but was also doing many thousands of reads each run. It ran so often that in an 8 hour period, it consumed 8 hours of CPU time and produced 34 TRILLION bytes of memory I/O. Even memory has its limits for performance. It took me about an hour to repair the code and another hour to retest it to make sure that it was operating the same way. I got the CPU time down to sub-second and the logical reads down to just 3. The system behaved a whole lot better after fixing that. I didn't stop there, though. I have a "Continuous Improvement Program" where I find a couple of the biggest problems each month and fix them. I got the system down from an average of 40% CPU usage to just 6-8% (depending on the time of day), resolved a ton of timeouts and deadlocks, and made it so we no longer need to water cool memory. :wink: None of these problems had a single run duration of a second or more.
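The "run it once an hour and save the result set" idea can be approximated by snapshotting the cumulative DMV counters on a schedule and diffing consecutive snapshots. This is only my sketch of the concept (table name, columns, and the Agent-job schedule are all assumed), not the poster's actual code.

```sql
-- Assumed snapshot table for hourly DMV captures.
CREATE TABLE dbo.QueryStatsSnapshot
(
    CapturedAt          datetime2(0)  NOT NULL DEFAULT SYSDATETIME(),
    sql_handle          varbinary(64) NOT NULL,
    plan_handle         varbinary(64) NOT NULL,
    execution_count     bigint        NOT NULL,
    total_worker_time   bigint        NOT NULL,
    total_logical_reads bigint        NOT NULL
);
GO
-- Run from a SQL Agent job every hour; diffing two rows for the same
-- plan_handle shows the cost incurred during that hour.
INSERT dbo.QueryStatsSnapshot
        (sql_handle, plan_handle, execution_count,
         total_worker_time, total_logical_reads)
SELECT  sql_handle, plan_handle, execution_count,
        total_worker_time, total_logical_reads
FROM    sys.dm_exec_query_stats;
```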

As for some of the alleged "Best Practices" about running server-side traces instead of client-side, "It Depends". I need the parameters for the procs and ad hoc usage, so I call up my standing RDC session into the server, fire up a very tightly constrained Profiler run, and wait for a dozen or so occurrences of the problematic code to appear, and then I'm out. It turns out, though, that my Profiler criteria are normally so tight that the run doesn't cause any problems for the server, and I could leave it running (and have, as a test).

On the RDC thing, a lot of people tout even RDC'ing in as a worst practice, never mind leaving a session running like I do, but it has saved the proverbial farm a couple of times. When I first started at the current company, the code was so bad that you couldn't even log into the system. It would time out first. Having the RDC open meant that I was already in, and I was normally the only person who could do anything to resolve the problem without actually bouncing the server to clear the guns. If you don't go crazy with open stuff on the RDC, it doesn't affect the performance of the server and it doesn't use much memory (I've got 8GB reserved for the system, anyway).

The System Health Extended Events session, which comes straight "out of the box" on SQL Server 2012 and above, is very useful. It reports items based on certain thresholds.
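To see what system_health has captured without any setup, you can pull its ring buffer target directly. A minimal sketch; the returned XML then needs shredding with XQuery to get at individual events.

```sql
-- Read the raw event data from the built-in system_health session's
-- ring_buffer target (runs out of the box; no session setup needed).
SELECT  CAST(t.target_data AS xml) AS system_health_xml
FROM    sys.dm_xe_sessions s
JOIN    sys.dm_xe_session_targets t
        ON t.event_session_address = s.address
WHERE   s.name = N'system_health'
  AND   t.target_name = N'ring_buffer';
```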

90%, at a guess, of our Tables are simple CRUD. Code lookups and the like. A lot of the "main" tables are that too - nothing "clever" about them. For all those we mechanically generate 100% of the code, the web pages, the maintenance forms, data edit archiving ... everything ... that just leaves us writing code for the pages where the users need "something unusual".

It's taken us decades to get to this point. Although it was an objective originally, what we thought we'd achieved back then was only what we thought we could do, and wanted... with the benefit of hindsight, what we wanted then is what we now have!!

It's great: visit the client, get the spec, give their IT people our spreadsheet to describe all the tables and columns they think they want... take that back to the office, tidy up any daftness (client IT is normally "pretty good"), add column names that suit our naming convention, import it into our Meta Data, and press the button to generate the application :slight_smile:

Awesome. That also means you don't have any ORM code trying to compare NVARCHAR values to VARCHAR columns. That turned out to be a HUGE problem where I work, because it makes for totally non-SARGable queries that cause a full table scan on every call... and there were thousands and thousands of such calls every 8 hours. My little "sp_ShowWorst" proc picked them all out, though. Much easier to find than to fix, because they had to do a release to get things to use procs in those areas.
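For anyone who hasn't hit this one: here's an illustration of the NVARCHAR-vs-VARCHAR trap (table and column names are made up). When the parameter is NVARCHAR and the column is VARCHAR, data-type precedence forces SQL Server to convert the column side, which defeats the index.

```sql
-- Assumed table: AccountNumber is varchar and indexed.
CREATE TABLE dbo.Account
(
    AccountID     int IDENTITY PRIMARY KEY,
    AccountNumber varchar(20) NOT NULL
);
CREATE INDEX IX_Account_AccountNumber ON dbo.Account (AccountNumber);

-- Non-SARGable: N'...' is nvarchar, so the varchar COLUMN is implicitly
-- converted on every row -> index/table scan on every call.
SELECT AccountID FROM dbo.Account WHERE AccountNumber = N'AB-1001';

-- SARGable: parameter typed to match the column -> index seek.
DECLARE @Acct varchar(20) = 'AB-1001';
SELECT AccountID FROM dbo.Account WHERE AccountNumber = @Acct;
```

This is exactly the kind of thing a cumulative-reads query surfaces, because each individual call looks cheap.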

Now you know why I'm so jealous of your auto-CRUD generators and the fact that the people you work with realize the full benefit of CRUD procs.

Don't need/have other developers working with me ... all this mechanically generated stuff makes my productivity look very good!

The mechanically generated code doesn't have any errors, of course... well... if it does, it has the same error in every single CRUD SProc, not just the occasional "hard to find" one here and there.

We just have to delay the second visit to the client by a couple of weeks, otherwise they think that one day's work can't possibly be worth the price we charge... although they come to us in the first place because our price is "tight" compared to competitors (who are probably sending the whole lot out to be hand-coded offshore...), plus our Sales Folk explain to them a) the consistency of the code and b) the cost effectiveness if (well... "when", of course :slight_smile:) they need changes in future.

Are you old enough to remember a product (early 80's??) called "The Last One"? It was a program generator written in MBASIC - very sophisticated for its time. I went to the launch for journalists... some bright spark asked "If it's The Last One, there won't be a Version Two, will there, as it can rewrite itself if it needs to" :smiley: - there was a Version Two, of course, and when it came out it was called "The Last One Plus" :smiley: :smiley: :smiley:

Blinking heck... it's got its own Wikipedia entry!!

Yep, old enough to have heard of it, but I hadn't. Heh... with a bit of a stretch of imagination, it sounds kind of like SQL (declarative rather than procedural).