SQLTeam.com | Weblogs | Forums

Merge Fun

Im looking for confirmation I think mostly. Im trying to prototype a Merge of data from a Temp staging table to our final table. This data could change a couple of times a day. We do this today in varying ways, 1 we delete the dataset on target completely and re-insert. This is quick, but terrible for replication performance. 2. We use the binary_checksum feature to check for updated records and Update, Insert or delete as necessary. This is great for replication, but terrible performance and Binary_checksum is greatly flawed.

I am vetting out a change to MERGE, which appears to allow me to check each and every column instead of a checksum value. My preliminary tests are very fast as well. However, I have not solidified my ON criteria correctly I think because I get more records with the second import than I should have (which I think means Im getting more inserts than I should have). I know this example is missing the table, etc... Im mostly looking if someone see's what I am missing in general.

MERGE FinalTender With (HOLDLOCK) AS TARGET
Using Select (all columns) from TempTender Group by (all columns)

------ HERE on the ON is where I think I get confused I have tried listing all columns or only a select number, but Im still not getting the right results. Close, but not right.

ON (Target.DOB = Source.DOB
Target.EmpID = Source.EmpID
Target.CheckID = Source.CheckID
Target.StoreID = Source.StoreID
Target.TenderType = Source.TenderType

WHEN MATCHED
----- HERE on the matches, I could update all remaining fields, but the fact is, a tendered check could have more than one tender amount, maybe even two Visa payments of different or same amounts payment type, etc, so that ON list above is not enough. The data can be so varied
------- If I list all columns instead of just a few, I think I would need to Delete from Source then because there is nothing to update as all data matches. Sounds logical to me except I dont understand when an UPDATE occurs.

WHEN NOT MATCHED ON TARGET:
------- This must be a new tender record from source, so I want to Insert into Target. No confusion here

The ON clause should contain your primary key (or whatever deems a row unique). If you are getting more records, then you are probably missing a column or 2. One thing to be careful of with merge, is it will fail if it tries to update more than 1 row.

Just a clarification - it isn't a problem for MERGE to update multiple rows, the problem is when MERGE attempts to update the same row more than once. This can be caused when the source data contains duplicates - or when the match clauses identify the same row with different criteria.

In this case - the ON clause needs to uniquely identify the row in the target. The MATCH portion can then do something like:

WHEN MATCHED
AND (
tgt.othercol1 <> src.othercol1
OR tgt.othercol2 <> src.othercol2
)
THEN UPDATE

WHEN NOT MATCHED
THEN
INSERT

WHEN NOT MATCH BY SOURCE
AND ( additional criteria if needed)
DELETE