Hi, I’m working on an ETL process to pull data from a production database. Some of the source tables I’m using to populate dimension tables in a data warehouse don’t have primary keys (existing purchased product). I’d like to create a merge statement to insert/update/delete based on changes, but I’m a bit perplexed on how to approach this. I have two basic ideas, in idea 1), I’d just set it up as a normal merge, but the ON clause would essentially have just about every column in it, which would guarantee that any changes would be inserts, not updates. In idea 2) I would create a middle table to archive all the source data and add a primary key. Here are a couple table variables and process to illustrate the issue:
--Setup the sample source data
DECLARE @FeedbackCategorySource TABLE (FeedbackCategory VARCHAR(30), ReportGroup VARCHAR(30), FeedbackSubCategory VARCHAR(30))
INSERT INTO @FeedbackCategorySource (FeedbackCategory, ReportGroup, FeedbackSubCategory)
VALUES
('Complaint', 'Civil Rights', 'ADA')
,('Complaint', 'Civil Rights', 'Title VI')
,('Complaint', 'Civil Rights', 'General Discrimination')
,('Complaint', 'Daily Operations', 'Employee')
,('Complaint', 'Daily Operations', 'Vehicle')
,('Complaint', 'Daily Operations', 'Facility')
,('Complaint', 'Daily Operations', 'Bus Schedule')
--mock DIM table
DECLARE @dim_FeedbackCategory TABLE (dim_FeedbackCategoryKey INT PRIMARY KEY IDENTITY NOT NULL, FeedbackCategory VARCHAR(30), ReportGroup VARCHAR(30), FeedbackSubCategory VARCHAR(30),
update_time DATETIME)
--MERGE with sort of useless update?? Delete and insert seem to be the only value
MERGE @dim_FeedbackCategory TARGET
USING @FeedbackCategorySource SOURCE
ON target.FeedbackCategory=source.FeedbackCategory AND target.ReportGroup=source.ReportGroup AND target.FeedbackSubCategory=source.FeedbackSubCategory
WHEN MATCHED --This seems useless since we've already joined on all the columns in the SET clause
THEN UPDATE
SET target.FeedbackCategory=source.FeedbackCategory, target.ReportGroup=source.ReportGroup, target.FeedbackSubCategory=source.FeedbackSubCategory, target.update_time=GETDATE()
WHEN NOT MATCHED BY TARGET
THEN INSERT (FeedbackCategory, ReportGroup, FeedbackSubCategory, update_time)
VALUES (source.FeedbackCategory, source.ReportGroup, source.FeedbackSubCategory, GETDATE())
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
Anyone have any thoughts on which method to go with or any tips on how to make method 1 work?