Is this insert statement thread-safe?

Hi, I'm wondering if this insert statement is safe in terms of multiple clients running the same query at the same time. The query inserts a random "PIN" number which, along with the ConsultantID, needs to be unique (there is also a unique index which covers both columns).

  IF NOT EXISTS  
    ( SELECT * FROM Appointments 
      WHERE ConsultantID = @ConsultantID AND ApptPIN = @RandomApptPIN )
    BEGIN
      INSERT INTO Appointments (ConsultantID, ApptPIN, ... etc)
      OUTPUT INSERTED.AppointmentID
      VALUES (@ConsultantID, @RandomApptPIN, ... etc)
    END
  ELSE SELECT 0 AS AppointmentID;

The query checks for the existence of a matching pair first, before doing the insert if a match isn't found. It will return zero (0) if the ConsultantID + PIN already exists. If it returns 0, then I generate a new random PIN and run it again, until it returns the successfully inserted AppointmentID.

My question is about thread safety. If several clients run this query simultaneously, could one thread sneak in the same values as another thread, in between the IF NOT EXISTS clause and the INSERT clause? Or does the query "block" other threads until both the IF NOT EXISTS and the INSERT are both evaluated?

ed: fixed typo in query

I would do the whole thing in one DML

;with cteAppointments
as
(
select @AppointmentID, @RandomApptPIN, ... etc
)
INSERT INTO Appointments (ConsultantID, ApptPIN, ... etc)
      OUTPUT INSERTED.AppointmentID
select @AppointmentID, @RandomApptPIN, ... etc
from cteAppointments src
where not exists (select 1
                    from Appointments tgt 
                    where src.AppointmentID = tgt.AppointmentID
and src.RandomApptPIN = tgtRandomApptPIN)

Thanks @yosiasz, wasn't aware of that method. However is my example ok to use? Or are you saying it's not thread-safe?

ed: Just looked up DML and it sounds like it applies to SQL 2012-2016. If one was still using SQL 2008, or even SQL Express, what method would be best if it didn't support DML?

Also, can't the above be simplified into this? Is there a reason to use a CTE?

INSERT INTO Appointments (ConsultantID, ApptPIN, ... etc)
    OUTPUT INSERTED.AppointmentID
SELECT @ConsultantID, @RandomApptPIN, ... etc
WHERE NOT EXISTS (SELECT 1
                  FROM Appointments 
                  WHERE ConsultantID = @ConsultantID AND ApptPIN = @RandomApptPIN)

(np. just noticed I put "AppointmentID" where "ConsultantID" should have been. Edited original post to fix)

I would say No.

The NOT EXISTS may return TRUE, another user inserts a row with that PKey etc. and then this INSERT would fail.

That exact-timing collision is very hard to reproduce in testing of course ... (but you can force it, to prove the case, but using a WAITFOR delay between the NO EXISTS test and the INSERT and then, in another session, create the collision-row)

We tend to do this:

INSERT INTO Appointments
(
    ConsultantID, ApptPIN, 
    ... etc
)
SELECT @ConsultantID, @RandomApptPIN, 
    ... etc
WHERE NOT EXISTS
(
    SELECT *
    FROM Appointments 
    WHERE ConsultantID = @ConsultantID AND ApptPIN = @RandomApptPIN
)
SELECT @intRowCount = @@ROWCOUNT, @AppointmentID = scope_identity()

SELECT CASE WHEN @intRowCount = 1 THEN @AppointmentID ELSE 0 END AS AppointmentID

but using OUTPUT is probably the more sensible way for modern code ... we don't have much of that!!

Thanks @Kristen! So basically the WHERE NOT EXISTS needs to be at the end of the query, which then actions the INSERT immediately upon not finding anything.

I assume you're saying there's no chance that another thread will perform the WHERE NOT EXISTS simultaneously? So no chance both threads will come up with nothing at the same time, then both try to perform the insert?

Or is there still a chance of that, but it's a lot less likely?

I would rephrase that as:

"The NOT EXISTS needs to be part of the SAME statement as the INSERT so that it is atomic".

If you have to do two statements then you'll need to use something like WITH (UPDLOCK, SERIALIZABLE)

With two statements, as per your O/P, then that will DEFINITELY occur. When both tests for NOT EXISTS (run at very-close-to-simultaneous) return true there is then a "Race" for the INSERT, and only the first one will win, the other(s) will terminate with error.

With a single statement it will not happen because SQL will perform the action as a single task - i.e. atomically.

Thanks again, @Kristen, much appreciated. I did some research and discovered one can do a MERGE without updating, which I wasn't aware of. I only need to insert a new row and return the new Identity, or return 0 if it couldn't insert the unique values.

I came up with the following, which does seem to accomplish that, but I'm wondering if there is a simpler way, as this seems very verbose:

DECLARE @id INT;
DECLARE @MergeOutput TABLE (
  ACTION VARCHAR(10) DEFAULT '',
  ID INT DEFAULT 0
);
MERGE Appointments AS appt
USING ( 
  SELECT @ConsultantID, @RandomApptPIN, ... etc
  ) AS src ( ConsultantID, RandomApptPIN, ... etc )
ON appt.ConsultantID = src.ConsultantID AND appt.ApptPIN = src.RandomApptPIN
WHEN MATCHED
  THEN UPDATE
    SET @id = 0 -- doesn't do anything, but if WHEN MATCHED is omitted, OUTPUT does not return a resultset.
WHEN NOT MATCHED
  THEN INSERT ( ConsultantID, ApptPIN, ... etc )
      VALUES ( src.ConsultantID, src.RandomApptPIN, ... etc )
OUTPUT $ACTION, INSERTED.AppointmentId INTO @MergeOutput;
-- Note that @MergeOutput.ID will also contain the MATCHED row ID, so we need
-- to check @MergeOutput.ACTION and return zero (0) if it's an "UPDATE".
SELECT CASE (
    SELECT ACTION
    FROM @MergeOutput
  )
    WHEN 'UPDATE' THEN 0 -- Return 0 if no insert occurred.
    ELSE (
        SELECT ID -- Otherwise return the new row ID
        FROM @MergeOutput
      )
  END;

This seems to work fine, but am I right in thinking it could be written better?

MERGE usually involves too much code for my liking ... I find it hard to see what-is-what-and-where (which might just be my familiarity with the older ways) and we put a lot of effort into coding-style as a means of defensive-programming so that developer-induced bugs are more likely to be detected. I particularly dislike where Column Lists are not immediately adjacent - so a CURSOR where the definition (column list) is nowhere near the Load column-list. Might be your code could be tightened, but you've got Column/@Parameter lists in there 4x, plus some odds-and-sods of AppointmentId

'Fraid MERGE doesn't do it for me

Thanks @Kristen, I agree, there is a lot of repetition of column names there. With a long list of columns it would be awful - although in that scenario I'd probably use MERGE just to determine whether INSERT or UPDATE is necessary, assign just the unique keys, then follow it with an UPDATE for the rest of the columns.

What attracted me to it was the "atomicity" you mentioned, which it does seem to have, making conditional inserts easier to normalise. That is, if every "upsert" and "conditional insert" uses MERGE, then one doesn't have to worry about remembering to use the same transaction & locking structure each time, which is where things can go wrong if I'm not mistaken.

So I might stick to MERGE for these tricky, race-prone situations, just in case. A bit of clunky-looking code seems worth the safety and standardisation it brings.

FYI, I did find an (oldish) article which shows that MERGE, even though atomic as a DML statement, is still prone to race conditions: http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx

yes it can be simplified as such. I like using cte because I can do a lot of cleanup in that block of cte instead of doing transformation dml right in the select statement of the insert. comes in handy when you have to do rtrim ltrim, cast etc

I was going to mention that .... whatever you do it needs careful thought

I think it worth considering how often you will get a collision. Usually such things are incredibly rare (and if not, like Pop Concert Seat tickets perhaps ...) then it needs pretty special handling.

If it is "rare enough" then it might be enough to do:

INSERT INTO MyTable(Col1, ...
SELECT @Col1, ...
WHERE NOT EXISTS ...

IF @@ROWCOUNT = 0 -- There was NO Insert, Update existing record
BEGIN
    UPDATE MyTable
    SET Col1 = @Col1, ...
    WHERE ...
   IF @@ROWCOUNT = 0 *** THE ROW WAS DELETED !! ***
END

That works most efficiently if, usually, the row does not exist. i.e. it favours attempting INSERT first. I think it also works best if a race-condition is somewhat likely, and provided that deletions are either rare or never happen.

If the row is more likely to already exist then do the UPDATE first, and if that fails then do an INSERT instead.

UPDATE MyTable
SET Col1 = @Col1, ...
WHERE ...

IF @@ROWCOUNT = 0 -- There was no existing record to Update, do Insert instead
BEGIN
    INSERT INTO MyTable(Col1, ...
    SELECT @Col1, ...
    WHERE NOT EXISTS ...
    IF @@ROWCOUNT = 0 *** SOMEONE ELSE INSERTED THE ROW !! ***
END

But you may want to use MERGE as a replacement for those UpSert coding methods.

I have to ask - in this whole process it appears that the only thing you are concerned about is generating some 'random' value to assign for the appointment PIN. First - does it have to be random or just unique? If just unique - why not use IDENTITY which is thread-safe and much easier to code?

If you truly want something random and unique - you could use NEWID() to generate a GUID that would be both random and unique for every insert. There are conditions that may cause duplicates - but it really is a very small chance.

Now...if you want a GUID but want something that is sequential you can use NEWSEQUENTIALID.

Any of these would be much better than building something that you generate on your own - especially since generating something on your own requires you to test for a value that already exists and resubmit the request until you find a value not currently utilized.

Since you appear to be on 2008 still - you cannot take advantage of SEQUENCE which would work just as well for your situation. If you can upgrade to 2012 or higher then I would recommend looking into using a SEQUENCE for this requirement. It would satisfy all of your requirements - but would not be random.