This is part of an ETL Data Import
Step 1 [does a TRUNCATE and then] pulls a BCP file into an IMPORT table.
Step 2 does a row-by-row comparison of IMPORT table and STAGING table to find any rows that have a change in any column and then deleted the rows. Then an INSERT for any "non existent" rows (which also get current date/time in a LogTimestamp column). (Subsequent ETL SProc then processes "new since last time" rows
The SProc for Step 2 sometimes (maybe 30% of the time) takes a long time. That Sproc is one of many that do the same type of operation. There are about 4 or 5 SProcs that basically have very similar sized IMPORT and STAGING tables and also the number of "changed rows" is very similar. All were mechanically generated so should, to all intents and purposes, be identical. Its only the one SProc that is ever slow.
Normal running of this SProc:
About 2,000,000 rows in the IMPORT table
Typically deletes 1000-2000 rows
Typically inserts 1000-5000 rows
On a good day 200 seconds, on a bad day 100,000 !! seconds. There is no "middle ground", either it takes a couple of minutes or it takes more than a day.
I changed the SProc to REORGANISE the indexes on IMPORT before it starts, and to REORGANISE on STAGING after it finishes (ready for next time, and so that downstream ETL have a better time )
Ran in the short time for three days and the last two nights has gone back to 100,000 seconds
Stupidly I forgot the add an Update Statistics after the INDEX REORG , so I'll add that - but can the IMPORT table get so badly out of shape from a TRUNCATE TABLE and then BCP in replacement data (and then REORG the indexes)?
None of the other, similar, processes that handle similar amounts of data has ever run slow, and I can't see that there is anything special about this one.
The process is:
INSERT all rows from IMPORT into ##TEMP
DELETE from ##TEMP any row that is same (all column comparison) as STAGING based on a PKEY JOIN
INSERT INTO STAGING from IMPORT where NOT EXISTS in STAGING
Apart from also freshening up the Stats I could also add a RECOMPILE to the SProc ... although the whole thing is just PKey (clustered) based
Only other thought I have had is to put some "elapsed" time logging in so I can then figure out which sub-query is the slow part.
But is there anything obvious that would explain the change from 200 seconds to 100,000 seconds runtime?
I wonder if something else is going on on the server at that time, its around 02:00, but it seems improbably that THIS Sproc is ALWAYS the slow one - a recurring server-load would be as likely to hit a different Sproc if there was a bit more data one night, and thus each Sproc ran a few seconds later / earlier than normal.