Not sure I am following what you are trying to accomplish - and whether or not your 'staging' table is actually being utilized the way I think.
If the goal is to extract data from your source system - and only insert/update changed data into your destination - then there are a couple of options:
- Extract from source - load to staging table on destination - MERGE to target
- Extract from source - load to staging table on destination - UPSERT to target
- Extract from source - use SCD transformation in SSIS to INSERT/UPDATE to target
For Items 1 and 2 - truncate the staging table - use a simple data flow from the source system to the destination system into a staging table. Use a SQL Execution task to execute a stored procedure on the destination system that performs either a MERGE or UPSERT (INSERT/UPDATE) to the final target table.
For Item 3 - use the SCD transformation, define the source table/query - the destination table - the defined business keys and SSIS will build the data flow as insert/update operations based on the existence of those business keys in the destination.
For any of these - you need to evaluate whether or not you want to update the data that already exists - and what determines that the data should be updated. For example - for MERGE you would add criteria on the WHEN MATCHED:
WHEN MATCHED
AND (
target.column1 <> source.column1
OR target.column2 <> source.column2
)
THEN
UPDATE ...
For an UPSERT:
BEGIN TRANSACTION;
UPDATE t
SET ...
FROM target t
JOIN source s ON s.key = t.key
WHERE t.column1 <> s.column1
OR t.column2 <> t.column2
INSERT INTO target
SELECT ...
FROM source s
WHERE NOT EXISTS (SELECT * FROM target t WHERE t.key = s.key)
COMMIT TRANSACTION;
For the UPSERT method - I would recommend putting the whole thing inside a try/catch block with error checking to about the transaction if either the update or insert fails.
Extracting a million rows from the source shouldn't be an issue - unless there is a performance problem pulling that much data across the network. If that is the case, it doesn't matter what method - you are going to have the same performance issue.