Merge Fun

JAdauto · June 3, 2019, 12:11am

Im looking for confirmation I think mostly. Im trying to prototype a Merge of data from a Temp staging table to our final table. This data could change a couple of times a day. We do this today in varying ways, 1 we delete the dataset on target completely and re-insert. This is quick, but terrible for replication performance. 2. We use the binary_checksum feature to check for updated records and Update, Insert or delete as necessary. This is great for replication, but terrible performance and Binary_checksum is greatly flawed.

I am vetting out a change to MERGE, which appears to allow me to check each and every column instead of a checksum value. My preliminary tests are very fast as well. However, I have not solidified my ON criteria correctly I think because I get more records with the second import than I should have (which I think means Im getting more inserts than I should have). I know this example is missing the table, etc... Im mostly looking if someone see's what I am missing in general.

MERGE FinalTender With (HOLDLOCK) AS TARGET
Using Select (all columns) from TempTender Group by (all columns)

------ HERE on the ON is where I think I get confused I have tried listing all columns or only a select number, but Im still not getting the right results. Close, but not right.

ON (Target.DOB = Source.DOB
Target.EmpID = Source.EmpID
Target.CheckID = Source.CheckID
Target.StoreID = Source.StoreID
Target.TenderType = Source.TenderType

WHEN MATCHED
----- HERE on the matches, I could update all remaining fields, but the fact is, a tendered check could have more than one tender amount, maybe even two Visa payments of different or same amounts payment type, etc, so that ON list above is not enough. The data can be so varied
------- If I list all columns instead of just a few, I think I would need to Delete from Source then because there is nothing to update as all data matches. Sounds logical to me except I dont understand when an UPDATE occurs.

WHEN NOT MATCHED ON TARGET:
------- This must be a new tender record from source, so I want to Insert into Target. No confusion here

mike01 · June 5, 2019, 8:06pm

The ON clause should contain your primary key (or whatever deems a row unique). If you are getting more records, then you are probably missing a column or 2. One thing to be careful of with merge, is it will fail if it tries to update more than 1 row.

jeffw8713 · June 5, 2019, 8:28pm

Just a clarification - it isn't a problem for MERGE to update multiple rows, the problem is when MERGE attempts to update the same row more than once. This can be caused when the source data contains duplicates - or when the match clauses identify the same row with different criteria.

In this case - the ON clause needs to uniquely identify the row in the target. The MATCH portion can then do something like:

WHEN MATCHED
AND (
tgt.othercol1 <> src.othercol1
OR tgt.othercol2 <> src.othercol2
)
THEN UPDATE

WHEN NOT MATCHED
THEN
INSERT

WHEN NOT MATCH BY SOURCE
AND ( additional criteria if needed)
DELETE

JAdauto · October 7, 2019, 3:23pm

I wanted to follow up on this and I didn't know if I should start a new thread or continue this one. None the less, I have the Merge working great, until we get to load testing when we experience a ton of Deadlocks. I have tried adding sp_getAppLock, adding it to a transaction, changing the Lock to HoldLock, RowLock, Tablock, changing Isolation Level. Im not sure if Im not getting the right combination or what. Im reading others have had these issues, but I cannot identify what the solution is to resolve. I feel like I am very close to the solution.. but sometimes this process takes about .2 seconds and sometimes it holds the lock for 8 min, which is totally unacceptable. Any ideas are appreciated. Thanks!
JAdauto