Impossible Insert Duplicate Error (solved: caused by Read Committed Snapshot)

I'm getting some DUPs on Insert for one particular user in a piece of code that has been unchanged for a decade or more. The table is intended for very high-volume inserts, but this particular client only has light use and, at present, probably just one user most of the time.

The user is tech-savvy and I asked if they were doing a "double click" or somesuch that might trigger duplicate insert attempts in quick succession, but that seems very unlikely: that user is generating duplicates frequently, it only started happening recently, and it's implausible that they are double-clicking every time. Perhaps once or twice out of frustration that something was slow, but even that would be "two clicks" rather than a double-click.

It's possible that the stats are stale (I haven't updated them yet, in case there is something I should check first).
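A quick way to check before updating anything is STATS_DATE() against sys.stats (an ad-hoc query, not part of the application code):

-- When were the statistics objects on dbo.LogName last updated?
SELECT	s.name					AS stats_name,
	STATS_DATE(s.object_id, s.stats_id)	AS last_updated
FROM	sys.stats AS s
WHERE	s.object_id = OBJECT_ID('dbo.LogName')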

If I retry the exact same SProc call (with a deliberately new, unique LogName) I cannot reproduce the error, so it must be a subtle timing issue.

Tracing with SQL Profiler I can see the SProc call at the exact time the issue arose, but no other SProc is logged at the same time, so I don't think it is a second-user / second-thread issue.

Here's the code - could this generate a DUP on INSERT somehow?

CREATE PROCEDURE dbo.USP_LogName_Get
	-- Returns ID of Log Name, or -1 if error (e.g. @LogName IS NULL)
	@LogName	varchar(255)	-- Log Name to find (or create)
AS
SET NOCOUNT ON
DECLARE @ln_ID	int,		-- ID of LogName record
	@intErrNo	int,	-- Error No
	@intErrLocation	int,	-- (unique) Location of error
	@intRowCount	int	-- Row Count

	IF @LogName IS NOT NULL
	BEGIN
		-- Get ID for existing record, if any
		SELECT	@ln_ID = ln_ID
		FROM	dbo.LogName
		WHERE	ln_Name = @LogName
		SELECT	@intErrNo = @@ERROR, @intRowCount = @@ROWCOUNT
				, @intErrLocation = 1
		IF @intErrNo = 0 AND @intRowCount = 0
		BEGIN
			BEGIN TRANSACTION

			-- Insert new record (with safeguard that it does not already exist)
			INSERT INTO dbo.LogName
			(
				ln_Name
			)
			SELECT	[ln_Name] = @LogName
			WHERE	NOT EXISTS
				(
					SELECT	*
					FROM	dbo.LogName
					WHERE	ln_Name = @LogName
				)
			SELECT	@intErrNo = @@ERROR, @intRowCount = @@ROWCOUNT
					, @intErrLocation = 2, @ln_ID = scope_identity()
			-- The INSERT failed (no record created - just added by someone else?)
			-- ... retrieve that ID
			IF @intErrNo = 0 AND @intRowCount = 0
			BEGIN
				SELECT	@ln_ID = ln_ID
				FROM	dbo.LogName
				WHERE	ln_Name = @LogName
				SELECT	@intErrNo = @@ERROR, @intRowCount = @@ROWCOUNT
						, @intErrLocation = 3
			END

			COMMIT

		END
	END
	ELSE
	BEGIN
		SELECT	@intErrNo = -2, @intRowCount = 0, @intErrLocation = 4
	END
	IF COALESCE(@intErrNo, -1) <> 0 OR COALESCE(@intRowCount, -1) <> 1
	BEGIN
		-- Error occurred, including attempt to store NULL
DECLARE	@strErrMsg	varchar(255)	-- Late declaration: only needed on the error path
		SELECT	@strErrMsg = 'USP_LogName_Get:Error sLogName[%s:%d/%d(%d)]',
			@ln_ID = -1	-- Return -1 to indicate Logname not valid
		RAISERROR (@strErrMsg, 16, 1, @LogName, @intErrNo, @intRowCount
					, @intErrLocation)
	END

	RETURN @ln_ID

SET NOCOUNT OFF
GO

I don't think this should EVER fail [creating a DUP], is there some circumstance that I am missing?

Here's the table: (I don't know why ln_Name was not created as NOT NULL)

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

CREATE TABLE dbo.LogName
(
	ln_EditNo smallint NULL,
	ln_CreateDt datetime NULL,
	ln_CreateUser varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	ln_UpdateDt datetime NULL,
	ln_UpdateUser varchar(4) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	ln_ID int IDENTITY(10000,1) NOT NULL,
	ln_Name varchar(255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Clustered Unique Index on ln_Name, and (non-clustered) PKey on the ID (just to keep it unique):

CREATE UNIQUE CLUSTERED INDEX [IX_LogName_ln_Name] ON dbo.LogName
(
	ln_Name ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF
	, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
	 ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
	ON [PRIMARY]
GO

ALTER TABLE dbo.LogName 
	ADD CONSTRAINT [PK_LogName] PRIMARY KEY NONCLUSTERED 
(
	ln_ID ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF
	, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON
	, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) 
ON [PRIMARY]
GO

No idea why there is a Trigger, other than that we "always have one" to increment the EditNo column and update the UpdateDate and UpdateUser columns

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER	dbo.UTR_LogName
ON dbo.LogName
FOR INSERT, UPDATE -- DELETE
AS
SET NOCOUNT ON
DECLARE	@dNow	datetime,
	@sUser	varchar(4)

	SET	@dNow = GetDate()
	SET	@sUser = LEFT(user_name(), 4)

	UPDATE	U
	SET
		U.ln_EditNo   = COALESCE(U.ln_EditNo, 0)+1,	-- Increment edit counter
		U.ln_CreateDt   = COALESCE(U.ln_CreateDt, @dNow),
		U.ln_CreateUser = COALESCE(U.ln_CreateUser, @sUser),
		U.ln_UpdateDt   = @dNow,
		U.ln_UpdateUser = @sUser
	FROM	dbo.LogName AS U
		JOIN inserted AS I
			ON U.ln_ID = I.ln_ID
GO

Here's the SQL Profiler trace showing the SProc call (which generated the error) at 2017-02-17 11:05:09.800 and, just after that, another SProc call which is logging the error itself.

The LogName record that was added has the same create date of 2017-02-17 11:05:09.800, and there is only that one row with a Create Date between 2017-02-17 11:05:09 and 11:05:11 (which only tells us that nothing else was added at that time, not how many attempts were made, of course).

The immediately preceding call to that Sproc (from a similar application request) was at 2017-02-17 11:05:09.430, and was for a different LogName, so it seems unlikely that this is "double clicking" or the APP making two requests.

It might be that the outer SProc is making the request (to this LogName_Get child SProc) multiple times, but that is unlikely as this code has all been unchanged "forever" and I can't see anywhere that tries to do that ... and ... anyway ... wouldn't the code fail safe even if two such requests were made at EXACTLY the same time? One would WIN and the other would fail to insert, and just do a SELECT instead.

42,266 rows in the LogName table

Stats last updated;

2017-02-15 14:38:51.287 Clustered index
2016-12-23 13:12:19.897 PKey

Results from dm_db_index_physical_stats

Name            index_type_desc alloc_unit_type index_depth index_level frag%  FragCnt avgFragSize PageCnt avgUsed% RecordCnt ghostRecCnt VGhostRecCnt MinRecSize MaxRecSize AvgRecSize
--------------- --------------- --------------- ----------- ----------- ------ ------- ----------- ------- -------- --------- ----------- ------------ ---------- ---------- ----------
Clustered Index CLUSTERED       IN_ROW_DATA     3           0           12.37  215     4.33        930     92.60    42268     0           0            58         314        162.96
Clustered Index CLUSTERED       IN_ROW_DATA     3           1           93.75  32      1.00        32      49.92    930       0           0            10         269        137.11
Clustered Index CLUSTERED       IN_ROW_DATA     3           2           0.00   1       1.00        1       67.03    32        0           0            10         269        167.59
PKey            NONCLUSTERED    IN_ROW_DATA     3           0           2.73   97      7.18        696     98.77    42268     0           0            8          281        129.68
PKey            NONCLUSTERED    IN_ROW_DATA     3           1           0.00   2       1.00        2       55.87    696       0           0            11         11         11.00
PKey            NONCLUSTERED    IN_ROW_DATA     3           2           0.00   1       1.00        1       0.30     2         0           0            11         11         11.00

Checked the Web Logs, definitely no other pages accessed for several seconds before and after this error, so no obvious "race condition"

One possibly interesting thing: the LogName being created shares its first 59 characters with many existing rows - there are 5,092 rows with that same initial string, and of 1,332 rows added recently, 243 had it. So it is likely that that particular region of the index is fragmented.

I don't see any issue off hand, other than willy-nilly issuing a COMMIT in the middle of a called proc -- that's a bad idea!

Yeah, agreed, that COMMIT is pretty crap .... probably all I knew back then!! Appreciate you taking a look, thanks.

Is there any chance that Read Committed Snapshot (which is ON for this DB) is contributing to this problem? (Assuming there is some sort of race condition between two processes - notwithstanding that I can't find evidence that that is the case.)

Yes, definitely, because RCS gives you the data as it looked at the time the query started. RCS is extraordinarily helpful at times, but it does have lots of overhead and some quirks.

-- (1)
SELECT	ln_ID
FROM	dbo.LogName
WHERE	ln_Name = @LogName
SELECT	@intRowCount = @@ROWCOUNT

-- (2)

IF @intRowCount = 0
BEGIN
	-- Insert new record (with safeguard that it does not already exist)
-- (3)
	INSERT INTO dbo.LogName
	(
		ln_Name
	)
	SELECT	[ln_Name] = @LogName
	WHERE	NOT EXISTS
		(
			SELECT	*
			FROM	dbo.LogName
			WHERE	ln_Name = @LogName
		)
END

So if my @LogName does not exist at (1), and someone else adds it at (2), then my insert at (3) will STILL be using the snapshot from (1)?? And thus my NOT EXISTS will be true (based on the snapshot at (1)), but my INSERT will fail because someone else added the value at (2), so, actually, it does exist.

Is that what could actually happen?

Is there something I could code to force the WHERE NOT EXISTS to use current, rather than Snapshot, data?
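(One hint aimed at exactly that is READCOMMITTEDLOCK, which requests locking read-committed behaviour for a single table reference even when the database has READ_COMMITTED_SNAPSHOT ON. A sketch only, untested against this schema - and note it still would not serialize two concurrent inserters; it only avoids reading from the version store:)

	INSERT INTO dbo.LogName
	(
		ln_Name
	)
	SELECT	[ln_Name] = @LogName
	WHERE	NOT EXISTS
		(
			SELECT	*
			FROM	dbo.LogName WITH (READCOMMITTEDLOCK)	-- read current committed data, not a snapshot
			WHERE	ln_Name = @LogName
		)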

You also have a unique clustered index on ln_Name that could be causing the problem - have you confirmed that the 4 characters that you are using for this column are unique and do not already exist? Based on your comments - I would not expect this user to even get to the insert because that user should already have a defined log.

It seems a bit weird to me to have a trigger update rows that were just created by the insert, letting those columns take the default NULL value first. To me it would be much cleaner to put those values in the insert itself - and only modify the ln_EditNo, ln_UpdateDt and ln_UpdateUser columns in the trigger (you really shouldn't be updating those values for an insert - only for an update).

Clarification: The trigger should only be set as an UPDATE trigger - default values should be set in the INSERT itself and then those specific columns would be updated in the trigger any time the table is updated.
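Something along these lines, perhaps - a sketch only: DEFAULT constraints supply the insert-time values (the constraint names here are made up), and the trigger is shown as CREATE for readability although you would ALTER the existing one:

ALTER TABLE dbo.LogName
	ADD CONSTRAINT DF_LogName_EditNo DEFAULT (1) FOR ln_EditNo
ALTER TABLE dbo.LogName
	ADD CONSTRAINT DF_LogName_CreateDt DEFAULT (GETDATE()) FOR ln_CreateDt
ALTER TABLE dbo.LogName
	ADD CONSTRAINT DF_LogName_CreateUser DEFAULT (LEFT(USER_NAME(), 4)) FOR ln_CreateUser
GO
CREATE TRIGGER dbo.UTR_LogName
ON dbo.LogName
FOR UPDATE	-- no longer fires on INSERT
AS
SET NOCOUNT ON
	UPDATE	U
	SET	U.ln_EditNo     = COALESCE(U.ln_EditNo, 0) + 1,	-- increment edit counter
		U.ln_UpdateDt   = GETDATE(),
		U.ln_UpdateUser = LEFT(USER_NAME(), 4)
	FROM	dbo.LogName AS U
		JOIN inserted AS I
			ON I.ln_ID = U.ln_ID
GO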

You could try "WITH (NOLOCK)" on the NOT EXISTS query.

Well ... that's what I was hoping the SProc did:

Get the ID
Not found?
(3) YES = INSERT but only if (still!) NOT EXISTS
No rows inserted?
YES - Must have been added by someone else then! Get the ID

I presume line (3) is where the Dup Insert is occurring, even though it has a NOT EXISTS test.

I agree. I included the Trigger code because it is there, and might have a bearing. Now I've seen it I'll definitely do something about it, but it's like looking in the attic when you move house - amazing what is there that you didn't know about! (I just checked - the Trigger was created 12-Oct-2004!!)

So I think that the trigger is not relevant, but I included it in the O/P just in case it turned out to be important.

I reckon this has been running like this for over 10 years; we delete millions of rows from the log table, in each client's database, every week ... and now this springs up!

What I reckon is happening is:

User launches a Web page. That triggers a "Page Initialisation" Sproc, which (amongst other things) logs who the page referrer was.

As it happens the previous page had a unique URL. So the logging of the Referrer is going to have to insert a new row to the LogName table.

The "Page" that is being logged is actually some JavaScript in a database CMS record. (Ordinarily that would have been "hardened" into an external include-file so that it was cached, but this system is still in beta).

There must be a JavaScript error on that CMS page (I can't find it and my browser is not detecting anything; my thought is that the multiple JavaScript CMS pages load in parallel and sometimes arrive on the user's machine in a different order, which I am not reproducing on my PC).

The JavaScript error triggers a Popup Window with the details, and THAT calls the same "Page Initialisation" SProc AND ... <DrumRoll!> ... that WILL have the same Referrer name to be logged. So it is entirely possible that both the actual page and the JavaScript Error Popup page are running in the same timeframe and therefore attempt to insert their new records simultaneously (perhaps aided and abetted by stats being out of date, or the index being fragmented such that the insert is not "instant").

The only problem is that I can NOT see two "Page Initialisation" SProc executions in SQL Profiler, and I only have one entry for the page in the IIS web log file :( so although, from all my debugging, this looks like the most plausible explanation, I'm short of Evidence :(

Thanks, definitely going to give that a try. Troublesome though as I have never ever used NOLOCK in production code before!!

Perhaps I should start with an INSERT ... SELECT ... WHERE NOT EXISTS, rather than the SELECT.

Then my Read Committed Snapshot would be taken at that INSERT statement.

Trouble is, this was written starting with a SELECT because it is assumed that 99% of the time the LogName record WILL already exist - so leading with an INSERT, which will insert nothing most of the time, would be less efficient. This SProc is called very, very often.
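The insert-first shape being considered would look something like this (a sketch; the error-variable bookkeeping is omitted for brevity):

	INSERT INTO dbo.LogName
	(
		ln_Name
	)
	SELECT	[ln_Name] = @LogName
	WHERE	NOT EXISTS
		(
			SELECT	*
			FROM	dbo.LogName
			WHERE	ln_Name = @LogName
		)

	-- Whether or not a row was just inserted, fetch the ID
	SELECT	@ln_ID = ln_ID
	FROM	dbo.LogName
	WHERE	ln_Name = @LogName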

P.S. sorry, meant to mention earlier:

I can't think of a reason why a row in this table would ever be updated (it COULD be :) but there is no workflow that would cause that to happen), so the trigger ought to be completely unnecessary.

The INSERT is not attempting to provide values for Create/Update User & Date etc. I haven't checked whether there are DEFAULTs for those columns, but even if there are for Create/Update Date, we don't (as a norm) create any for Create/Update User ... so the Trigger would be the only place (in any of our code) that sets User, unless the INSERT itself does it, of course.

I'm normally a bit anti having the INSERT do it, because sooner or later inserts (for an average table!) will exist in different places, so normally we rely on the TRIGGER setting User for consistency - even though it means that every INSERT (and UPDATE) is then immediately re-updated by the trigger. We've never noticed a performance problem because of that, but it's definitely not ideal. It is crucial to us that EditNo is "changed" on each save, as it is used for optimistic record locking, hence using a Trigger to ensure that happens. Again, all moot in this case, because I don't believe that rows will ever be updated in the LogName table.
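For context, the EditNo optimistic-locking check on a normal update looks something like this (a sketch; @ID and @ExpectedEditNo are hypothetical values that the client read originally):

DECLARE	@ID			int = 10001,	-- hypothetical row ID
	@ExpectedEditNo		smallint = 3	-- EditNo as originally read by the client

UPDATE	dbo.LogName
SET	ln_Name = ln_Name			-- whatever columns are being changed
WHERE	ln_ID = @ID
	AND ln_EditNo = @ExpectedEditNo		-- no match = someone else saved first
IF @@ROWCOUNT = 0
	RAISERROR (N'Row was changed by another user', 16, 1)

The trigger then increments EditNo on the successful save, so the next writer must present the new value.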

I ran an experiment:

SELECT	ln_ID
FROM	dbo.LogName
WHERE	ln_Name = @LogName
SELECT	@intRowCount = @@ROWCOUNT

IF @intRowCount = 0
BEGIN
	-- Insert new record (with safeguard that it does not already exist)

	PRINT 'INSERT (' + @LogName + ') using a separate process'	-- <<<
	RAISERROR (N'Waitfor start ...', 10, 1) WITH NOWAIT			-- <<<
	WAITFOR DELAY '00:00:10'									-- <<<

	INSERT INTO dbo.LogName
	(
		ln_Name
	)
	SELECT	[ln_Name] = @LogName
	WHERE	NOT EXISTS
		(
			SELECT	*
			FROM	dbo.LogName --  WITH (NOLOCK)				-- <<<
			WHERE	ln_Name = @LogName
		)
END

During the WAITFOR interval I inserted the same @LogName value using a separate process, and did indeed get the "Duplicate Insert Error". So I guess the Read Committed Snapshot must be causing the problem.

I tried again, this time with the NOLOCK included, and it worked just fine.

I foresee two issues:

  1. the "other" process rolles back for some reason leaving me with a non-existent [ln_ID] result - seems like long odds, so I can live with that

  2. Index Page Splitting, during the NOT EXISTS, causing the index key to be (incorrectly) NOT found? I suppose that could happen, and it would result in a Dup Error too ...

Perhaps there is a way to disable Read Committed Snapshot just for this SProc ?

To change the isolation level for just that proc, issue a:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
before any data activity in the proc.
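In this SProc that would look something like this (a sketch; the rest of the body unchanged):

ALTER PROCEDURE dbo.USP_LogName_Get
	@LogName	varchar(255)
AS
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ	-- before any data access: escapes RCSI's versioned reads for this proc
-- ... rest of the procedure body as before ...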

Scenario issues:

  1. rollback causes false match = definitely possible
  2. page split results in NOT FOUND = I don't see how this could happen, the keyed lookup should still work

I would always serialize this type of insert.
From memory this should also work with RCSI.

-- Insert new record (with safeguard that it does not already exist)
INSERT INTO dbo.LogName
(
	ln_Name
)
SELECT	[ln_Name] = @LogName
WHERE	NOT EXISTS
	(
		SELECT	*
		FROM	dbo.LogName WITH (UPDLOCK, SERIALIZABLE)
		WHERE	ln_Name = @LogName
	);

My thought was:

Get the ID
Not found?
(3) YES = INSERT but only if (still!) NOT EXISTS (WITH NOLOCK)
Insert fails with DUP

My thought is that at (3) another user inserts a row that happens to cause an index page split. My NOT EXISTS (WITH NOLOCK) does NOT see the new index entry, because it is in the second half of the page split which is in the process of being written (or was written earlier in the chain), but that index page extension is then written before my INSERT makes its own write ... at which time mine is a DUP.

Either way, I'm not really keen on NOLOCK under any circumstances (apart from deliberate read reports by a SysAdmin, where we want no interference with front-end processes and can tolerate Dirty Reads and missing data).

There is THIS article by Aaron Bertrand:

Index corruption – surely you are not using NOLOCK in INSERT/UPDATE/DELETE statements, but if you are, you should be aware that this syntax is deprecated and that it can cause corruption, even in SQL Server 2014 RTM – see this tip for more information. Note that you should check for the hint in any views that you are trying to update, too.

so all the less keen to use NOLOCK in Insert/Updates!

That works, thanks.

Thanks. That works too.

If a coding team is in the habit of coding that, how would you ever test that it had been included in all insert statements where it was needed? The "collision" is as rare as hens' teeth, and hard to deliberately force during testing. Perhaps peer review of code would be the only way?

I think I prefer Scott's code to use REPEATABLE READ because putting it at the top of the SProc safeguards whatever code change I might make in the future, but ...

... am I deluding myself and ought I to be providing the appropriate hints on the INSERT to force the behaviour I need?

I expect this is the only "high insert rate, potential collisions" INSERT in the whole of the code base. But it raises the question whether, given that we use RCSI on all our DBs, we should be using a different isolation level in ALL our UpSert SProcs. (Our SProcs have a flag for "Create new", "Update existing" and "Don't know; update, or create if not found". That latter setting is very rarely used, because we have an EditNo column, incremented on every update, which achieves optimistic record locking - so normal updates provide the expected EditNo and thus are sure that the action required is Update, and a process creating a new record supplies a "Create Row" parameter to the UpSert SProc. That only leaves the few "don't care" usages, and they definitely do a similar "Is it there? No = Insert" type action (with no hints at present) which could also fall foul of this RCSI issue.)

Presumably I need to consider what the SCOPE of this is. Should I perhaps use a Transaction for the (UPDLOCK, SERIALIZABLE) section of the code so that it is discrete, and short? This "Get Log Name" SProc could be called by a monster SProc.
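Something like this keeps it discrete and short (a sketch, combining the earlier suggestions): the key-range locks taken by the hints are held to the end of the transaction, so a tight transaction around just this section minimizes blocking - with the caveat that if a caller already has a transaction open, BEGIN TRANSACTION only nests and the locks last until the outermost COMMIT regardless:

BEGIN TRANSACTION

	INSERT INTO dbo.LogName
	(
		ln_Name
	)
	SELECT	[ln_Name] = @LogName
	WHERE	NOT EXISTS
		(
			SELECT	*
			FROM	dbo.LogName WITH (UPDLOCK, SERIALIZABLE)
			WHERE	ln_Name = @LogName
		)

	IF @@ROWCOUNT = 1
		SET @ln_ID = SCOPE_IDENTITY()
	ELSE
		SELECT	@ln_ID = ln_ID		-- someone else got there first
		FROM	dbo.LogName
		WHERE	ln_Name = @LogName

COMMIT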

Is there a similar issue with SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, where I need to "reset" it before exiting this SProc, or is its scope limited to the current SProc only?

I guess I can test those scenarios :) ...

Scope is limited to the SProc:

SET NOCOUNT ON
GO
CREATE PROCEDURE dbo.USP_IsoTest
AS
	SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

	SELECT	'In SProc', transaction_isolation_level 
	FROM	sys.dm_exec_sessions
	WHERE	session_id = @@SPID

	SELECT TOP 1 object_id FROM sys.objects
GO

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
GO
SELECT 'Before', transaction_isolation_level FROM sys.dm_exec_sessions WHERE session_id = @@SPID
EXEC dbo.USP_IsoTest
SELECT 'After', transaction_isolation_level FROM sys.dm_exec_sessions WHERE session_id = @@SPID
GO
DROP PROCEDURE dbo.USP_IsoTest
GO
SET NOCOUNT OFF
GO
       transaction_isolation_level 
------ --------------------------- 
Before 2

         transaction_isolation_level 
-------- --------------------------- 
In SProc 4

object_id   
----------- 
3

      transaction_isolation_level 
----- --------------------------- 
After 2

This article suggests that using

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

at the top of the SProc, whilst fixing the problem, will allow deadlocks, and that @Ifor's solution of:

FROM	dbo.LogName WITH (UPDLOCK, SERIALIZABLE)

will avoid that.

I also came across a suggestion to create the Unique Index with IGNORE_DUP_KEY - that would let duplicate inserts fail silently (well, "with a warning message only"). Have to be careful using SCOPE_IDENTITY() though, as that would hold the ID of the most recent successful insert in the current scope. Re-selecting the ID after the INSERT might be a cure for that.
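A sketch of that variant - the index rebuild is one-off DDL, and the INSERT/SELECT is the proc fragment, re-selecting the ID rather than trusting SCOPE_IDENTITY():

CREATE UNIQUE CLUSTERED INDEX [IX_LogName_ln_Name] ON dbo.LogName
(
	ln_Name ASC
) WITH (IGNORE_DUP_KEY = ON, DROP_EXISTING = ON)	-- duplicate key inserts raise a warning, not an error
GO

INSERT INTO dbo.LogName
(
	ln_Name
)
SELECT	[ln_Name] = @LogName
WHERE	NOT EXISTS
	(
		SELECT	*
		FROM	dbo.LogName
		WHERE	ln_Name = @LogName
	)

-- Re-select rather than using SCOPE_IDENTITY(), which would still hold the ID
-- of the most recent successful insert in this scope if this insert was skipped
SELECT	@ln_ID = ln_ID
FROM	dbo.LogName
WHERE	ln_Name = @LogName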