Getting NOLOCK removed from 3rd party code

I want all 3rd party code used here to be free from NOLOCK hints.

I have an Alert on "Error 601 - Data movement during NOLOCK", and I get several of those a day. I presume that for every Alert I get there are many more queries that do not raise the error but ARE subject to data movement in the index and, without any error or indication, either fail to display some data rows or duplicate some rows (i.e. an Index Page Split leaves part of the page missing when the NOLOCK-hinted query runs, or the query sees all the rows from the index page AND also the rows from the split page).

I have raised this with the 3rd party vendors as an issue. Predictably I heard nothing in response! So today I told the Vendors that I require them to give me a timetable for removal of all NOLOCK from their code, or to tell me that they cannot do that (in which case there is a real possibility that we will move to a different vendor).

I mentioned the Error 601 Alerts that I am getting and they said "What systems are they originating from?". I'm very happy to help the Vendors find a solution to the problem, but I have never seen anything that will give me a handle on WHICH queries with NOLOCK hints result in Data Movement errors (and even better would be any query with NOLOCK hint that suffered from a page split but did not trigger a 601 error).

I'd appreciate any suggestions for how I might find this needle-in-a-haystack.

Background (although I doubt you need it :slight_smile: )

To me that NOLOCK is unacceptable: for every Alert I get, the program presumably aborted (hopefully cleanly!), but it also suggests that there will be a much higher number of "successful" runs, with no abort, where an index page split occurred but did not cause the program to abort - runs which DID either fail to display some rows (missing because of the index split) or include some rows more than once (present during the query, and present a second time on the freshly split index page).

It is totally unacceptable to me that our users may not see some data rows on a perfectly innocuous display / reporting APP (because an index split meant that those missing rows were in the process of being moved, and were missed from the SQL Query because it used the NOLOCK hint). A user could make a critical business decision based on the [inaccurate] information displayed to them, and because the issue cannot be tracked, logged or reproduced, it will be impossible to prove that the Vendor's negligence was the cause.

Hi Kristen,

We've only just migrated from 2005 to 2012 so I've not yet looked into using Extended Events (EE) for such a thing. Rumor has it that EE in 2008 was such a PITA that even some of the EE zealots recommend still using SQL Profiler in 2008. Equal rumors say that EE is nearly as easy as SQL Profiler as of 2012.

In SQL Profiler, you'd set up a trace to capture the following events under the "Errors and Warnings" category. From BOL...

Attention

The Attention event class indicates that an attention event, such as cancel, client-interrupt requests, or broken client connections, has occurred. Cancel operations can also be seen as part of implementing data access driver time-outs.

Exception (this is the event that will probably capture the errors caused by the NOLOCKs, IMHO)

The Exception event class indicates that an exception has occurred in SQL Server.

User Error Message (just looking for "bonus" data here)

The User Error Message event class displays the error message as seen by the user in the case of an error or exception. The error message text appears in the TextData field.

For short runs (5 minutes or less?), it may be that you can do this through the GUI with the understanding that it could slow things down a bit. These categories aren't "aggressive" in nature but I'd be remiss if I didn't bring up the possibility.

If you need longer runs, you should probably set up a "server side trace" using SQL Profiler and make sure that it's got an "end time" to make it self-cancelling. Of course, it would also be set up to write either to a trace file (the recommended method, which can be played back in SQL Profiler) or to a trace table (more convenient, but claims are that it's a bit slower).
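For what it's worth, a hand-rolled server-side trace along those lines might look something like the sketch below. The event and column IDs (and the file path) are my assumptions and are worth double-checking against sys.trace_events and sys.trace_columns before relying on them:

```sql
-- Sketch of a self-cancelling server-side trace for the three
-- "Errors and Warnings" events above.  Verify event/column IDs
-- against sys.trace_events / sys.trace_columns on your build.
DECLARE @TraceID int,
        @MaxFileSize bigint = 50,                         -- MB per rollover file
        @StopTime datetime = DATEADD(HOUR, 8, GETDATE()); -- the "end time"
EXEC sp_trace_create @TraceID OUTPUT, 2,                  -- 2 = file rollover
     N'C:\Traces\NolockErrors', @MaxFileSize, @StopTime;

-- 16 = Attention, 33 = Exception, 162 = User Error Message
-- Columns: 1 = TextData, 10 = ApplicationName, 11 = LoginName, 12 = SPID
DECLARE @On bit = 1;
EXEC sp_trace_setevent @TraceID, 16, 1,  @On;
EXEC sp_trace_setevent @TraceID, 16, 12, @On;
EXEC sp_trace_setevent @TraceID, 33, 1,  @On;
EXEC sp_trace_setevent @TraceID, 33, 10, @On;
EXEC sp_trace_setevent @TraceID, 33, 11, @On;
EXEC sp_trace_setevent @TraceID, 33, 12, @On;
EXEC sp_trace_setevent @TraceID, 162, 1,  @On;
EXEC sp_trace_setevent @TraceID, 162, 12, @On;

EXEC sp_trace_setstatus @TraceID, 1;    -- 1 = start the trace
SELECT @TraceID AS TraceID;             -- keep this to stop/close it later
```

The trace stops itself at @StopTime; the resulting .trc files can then be opened in Profiler as described.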

The cool part about these events is that they'll not only identify a lot of what the 3rd party is asking for but they'll usually also provide the actual code that caused the event to fire and can be found in the TextData column.

p.s. I agree. The use of WITH (NOLOCK) is a typical sign of a poorly designed database with poorly written code that causes too-long blocking, and people try to make up for all of that with such trickery rather than realizing that they actually need to fix the code and, perhaps, the underlying table design.

After the last decade of interviewing people for the senior positions of Sr. DBA and Sr. Developer, and having 80% of them not even know how to get the current date and time (seriously!!! I couldn't make that up on a bet), I'm no longer surprised by rampant ineptitude when it comes to database-related code.


Thanks Jeff, dunno why but Extended Events hadn't occurred to me (probably because I've never had cause to use them in anger ... as yet)

I had a go with the SSMS Wizard using a template ("Query Batch Tracking") which includes the error_reported, rpc_completed and sql_batch_completed events. I added a few Capture Fields of interest (to better identify the user and APP, and also sql_text in the hope of seeing the actual SQL code in that field). The default filters for non-system databases seemed fine.

I disabled Ring Buffer and set up a Save to File (10MB max size, file-rollover for 10 most recent files).

It's the weekend, no one is working! I did a RESTORE HEADERONLY using a wildcard filename (I have an ALERT for Failed Backup / Restore commands, and I know that will be caught ...). The Event Session ran for 45 seconds and generated 3MB of data capturing 1,288 events ...

... I modified the Extended Event Session generated by the Wizard and removed the rpc_completed and sql_batch_completed events, just leaving error_reported - seems to me that, provided that "observer overhead" is not significant, having an Extended Event Trace on all Error Events might be worthwhile ...
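For reference, the hand-rolled equivalent of that trimmed-down session might look something like this (session name, file name and sizes are placeholders; the optional 601 filter is shown commented out):

```sql
-- Sketch of an error_reported-only session with a rolling file target.
-- Name, filename, and sizes are assumptions; adjust to taste.
CREATE EVENT SESSION [ErrorReported] ON SERVER
ADD EVENT sqlserver.error_reported
(
    ACTION ( sqlserver.client_app_name, sqlserver.username,
             sqlserver.database_name, sqlserver.sql_text )
    -- Optional: only the data-movement errors
    -- WHERE ( error_number = 601 )
)
ADD TARGET package0.event_file
(
    SET filename = N'ErrorReported.xel',
        max_file_size = 10,          -- MB per file
        max_rollover_files = 10      -- keep the 10 most recent files
)
WITH ( STARTUP_STATE = OFF );
GO
ALTER EVENT SESSION [ErrorReported] ON SERVER STATE = START;
```

With a relative filename the .xel files land in the instance's LOG directory.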

... I'll run it during the working day, next week, and see if it passes the Shout Test! Hopefully that will catch details of Error 601 which, although not that frequent, will enable me to find which specific SQL Code is triggering that issue, and maybe I only care about ones capable of generating an Error 601 some-of-the-time (if they are reprogrammed then, at least, they will no longer be capable of generating index-split missing-rows / duplicate-rows).

Thanks for your help

Hmmm ... there might be another option, rather than / in addition to Error_Reported. NOLOCK is in the "deprecated features" list, so an Extended Events trace on deprecation_announcement (with a filter for "%NOLOCK%" too perhaps) might well find me all such code ...

Thanks for the feedback, Kristen.

Just another suggestion, though... keep in mind that you're not looking for the USE of NOLOCK... you're trying to find WHERE it causes the data movement ERROR. Or at least for a start. If you look for everywhere NOLOCK is used, the application will quickly overwhelm the small amount of log space you've allowed and make it much more difficult to find where the use of NOLOCK is causing the DATA MOVEMENT ERRORs.

Heh... to use an old saying, "Keep your eye upon the donut and not upon the hole".

Yup, the error_reported extended event trace will indeed find the Error 601 (if I get too much noise I will add a filter specifically for 601) but I am worried that there are occurrences of NOLOCK which are causing missing/duplicate data [in reports] but which are never triggering a 601. I can hope that all such locations in the code will, over time, trigger a 601, but they are not very common so may take a while to reveal themselves. Pity there is no error for all such data-moved scenarios - if there were "my work would be done" :slight_smile:

Agreed. What really sucks about problems like this is that the vendor is clueless even though they're the ones that actually caused the problem to begin with. Many are arrogant enough to think their software is "well tested" enough that such problems could not exist, and it becomes a nightmare to fix. Additionally, neither you nor your company is going to get paid for helping the 3rd party company figure out what's wrong with THEIR software that you paid for.

It's "in my gift" to fire them though ...

... the Extended Events trace has shown up that OUR code </ShockHorror!> has an "Incorrect syntax near 'dbo'" error, and the culprit is "SET FMTONLY ON dbo.MySprocName Param1,Param2,... SET FMTONLY OFF". Looks like I missed the EXEC command ... that code is more than 15 years old, and must be working regardless of the syntax error ... not to mention that SET FMTONLY is deprecated ...

It also turned up some use of the TEXT / NTEXT / IMAGE data types in the 3rd party code (deprecated in the NEXT SQL version, which is more serious / time-critical) which I will take up with the Vendor. Their NOLOCK syntax also falls foul of "Specifying table hints without using a WITH keyword is a deprecated feature", likewise deprecated in the NEXT SQL version, so a) they will have to edit them anyway and b) I can try insisting that they remove/fix them instead!

All in all a good result, thanks for pointing me in that direction.

How do I get the data out of the LOG into something that I can usefully review? Currently I am logging to file, opening that, then I have to highlight all the rows and do a specific "Copy - Details" (regular Control-C just gets me the headers only). That only copies the first 1,000 rows ... so Rinse & Repeat :frowning: ... and then Paste into Excel and massage there. A SQL report would be much easier ... I only want the DISTINCT values.

Still Googling ... I ought to do some real work! ... I found this :slight_smile: :

CREATE TRIGGER HAL0002
ON DATABASE
FOR CREATE_PROCEDURE, ALTER_PROCEDURE,
    CREATE_FUNCTION, ALTER_FUNCTION,
    CREATE_VIEW, ALTER_VIEW,
    CREATE_TRIGGER, ALTER_TRIGGER
AS
    IF EVENTDATA().value('(/EVENT_INSTANCE/ObjectName)[1]',  'NVARCHAR(255)') != 'HAL0002'
    BEGIN
        IF EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]','nvarchar(max)') LIKE '%NOLOCK%'
        BEGIN
            PRINT 'No, you may not use NOLOCK.  No, not even for that.'
            ROLLBACK
        END
        IF EVENTDATA().value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]','nvarchar(max)') LIKE '%UNCOMMITTED%'
        BEGIN
            PRINT 'READ UNCOMMITTED is just as bad as NOLOCK.  No, just no.'
            ROLLBACK
        END
    END
GO

https://sqlstudies.com/2015/02/16/tales-of-a-dba-fed-up-with-nolock-part-1/

I've figured that out - I think.

Output to asynchronous file target and then read with sys.fn_xe_file_target_read_file and parse the nodes. I then append the first (not yet existing) entry into a table of "interesting values" - including filtering anything matching an additional table of wildcard "ignore patterns", thus building up a useful ToDo list over time.
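For anyone following along, the shape of that query is roughly as below (the filename pattern is a placeholder; the node names follow the error_reported payload, and DISTINCT on sql_text is what builds the ToDo list):

```sql
-- Sketch: pull the distinct offending statements out of the .xel files.
;WITH ev AS
(
    SELECT CAST(event_data AS xml) AS ed
    FROM sys.fn_xe_file_target_read_file(N'ErrorReported*.xel', NULL, NULL, NULL)
)
SELECT DISTINCT
    ed.value('(event/data[@name="error_number"]/value)[1]', 'int')            AS error_number,
    ed.value('(event/data[@name="message"]/value)[1]',      'nvarchar(max)')  AS error_message,
    ed.value('(event/action[@name="sql_text"]/value)[1]',   'nvarchar(max)')  AS sql_text
FROM ev
-- plus a NOT LIKE against each row of the "ignore patterns" table
ORDER BY error_number;
```

The XML shredding is the slow part, so it pays to run this periodically and append only the not-yet-seen rows, as described.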

Glad to be of help.

Shifting gears a bit, I know squat about Extended Events simply because we just migrated from 2005 but, rumor has it, you shouldn't have to jump through all those hoops to build a table from the results in 2012 and up. If you're using 2008 of any flavor, then a server side trace using SQL Profiler would be the easy way to go.

Yes, SQL2012 here ... but I can't see anything that will output the results to a table. I could query the EE ring-buffer thingie, presumably, but my worry / assumption was that it might miss something that got flushed out of cache. That said, querying it every minute (if nothing gets lost) would allow me to just store "New / Diff Queries" rather than logging everything to file and then having to Import / Parse / DeDupe and Insert to table. One day's "tight" filtered output still managed to generate several GB of log files ...

Either way, I've still got the issue that the 3rd party queries are not parameterised (where's the emoticon for "Wankers"?!) so I've got to set up a RegEx for each "popular" one to avoid storing masses of rows which are basically from one source. I thought this was going to be a 10-minute job ... looks like I'm going to wind up using Levenshtein !!!!
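One possible shortcut, if the statements are still in the plan cache: query_hash in sys.dm_exec_query_stats is designed to bucket statements that differ only in their literal values, which might save most of the RegEx / Levenshtein work. A sketch:

```sql
-- Sketch: group non-parameterised NOLOCK statements by query_hash,
-- which buckets queries that differ only in their literal values.
SELECT TOP (50)
       qs.query_hash,
       COUNT(*)                AS cached_variants,
       SUM(qs.execution_count) AS total_executions,
       MIN(st.text)            AS sample_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%NOLOCK%'
GROUP BY qs.query_hash
ORDER BY total_executions DESC;
```

It only sees what is currently cached, so it complements rather than replaces the Extended Events log.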

Follow on question:

I've been musing about the idea of just removing all the NOLOCK from their code, and seeing what happens.

The APP would be very hard for us to test; we have no test framework, and no suitable resource to "hammer" it, so I suppose we'd go about this by making a few tests on a QA server and then put it live and "see what happens".

I'm kinda thinking that the worst that could happen is that we get slow-down and maybe some Deadlocks. Once I had experienced those (only when operating under load I expect?) I could take a view on the next course of action.
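Worth noting (on the assumption that the default system_health session is running, as it is out of the box on 2012): any deadlocks that do show up during such a trial are already being captured as xml_deadlock_report events, so something like this would let you review them afterwards:

```sql
-- Sketch: read deadlock graphs already captured by system_health.
SELECT x.ev.value('@timestamp', 'datetime2') AS deadlock_time,
       x.ev.query('.')                       AS deadlock_graph
FROM
(
    SELECT CAST(st.target_data AS xml) AS td
    FROM sys.dm_xe_sessions AS s
    JOIN sys.dm_xe_session_targets AS st
      ON st.event_session_address = s.address
    WHERE s.name = N'system_health'
      AND st.target_name = N'ring_buffer'
) AS t
CROSS APPLY t.td.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS x(ev);
```

The ring buffer cycles, so on a busy day the graphs would need to be harvested reasonably promptly.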

Maybe I would only do this for a day and then put the original code back in place ... then Rinse and Repeat.

I guess this must have come up before with companies wanting to use an APP but wanting all the NOLOCKs "gone" ??