Hi there,
I've got a situation I want to try to model and wanted to see what other folks have done in this situation. I've run into this before but the data set I'm working with has a lot of user error issues that I'm going to have to deal with.
My understanding with Kimball method dimensional modeling is that you don't want to update fact records, which I have been able to avoid thus far. And what I'm working with has to be atomic grain, so periodic snapshots won't really make sense.
Basically, in the source system, users may need to correct a date field or other fields that are not additive. I'll be getting updates not based on order date but on insert or update date of the source system.
I'd like to know what other people are doing when they run into a similar issue. So for example, let's say (modeled below) a user originally put the order date (there are about 7 date fields they work with, but simplified to 1 here), in as 1/1/2020. The data gets loaded into the fact table that night, and all is good in the world.
The next day, a manager makes the employee correct the order date to 12/31/2019 because the employee failed to do the data entry. The measurements may or may not have changed. I'm focused on the date for this example.
Would you logically delete? Delete? Add a new record to reverse it? Or something else?
I've writen an extremey oversimplified example below. Forgive me for not normalizing the source data. Hopefully folks can see what I'm trying to point out.
Thanks in advance!
--load a date dimension with the 2 dates that will be in question
declare @dim_date table (dim_date_key int, cal_date datetime)
insert into @dim_date (dim_date_key, cal_date)
values
(20191231, '2019-12-31 00:00'),
(20200101, '2020-01-01 00:00')
--assume the fact staging table was loaded with source data
declare @stagetable table (order_id int, order_date datetime, amount float(2), insert_datetime datetime, update_datetime datetime)
insert into @stagetable (order_id, order_date, amount, insert_datetime, update_datetime)
values
(1, '2020-01-01', 15.35, '2020-01-01 15:30', null)
--check values of staging table
select *
from @stagetable
--set up simple fact table
declare @facttable table (dim_date_key int, amount float(2))
--insert fact records from staging that don't exist in the fact table
insert into @facttable (dim_date_key, amount)
select d.dim_date_key, s.amount
from @stagetable s
join @dim_date d on d.cal_date=s.order_date
except
select dim_date_key, amount
from @facttable
--check values of fact table
select *
from @facttable
--user fixed a date field in the source system. Consequently, there was a new version of the fact record
insert into @stagetable (order_id, order_date, amount, insert_datetime, update_datetime)
values
(1, '2019-12-31', 15.35, '2020-01-01 15:30', '2020-01-02 16:00')
--insert new records that are not already in the fact table
insert into @facttable (dim_date_key, amount)
select d.dim_date_key, s.amount
from @stagetable s
join @dim_date d on d.cal_date=s.order_date
except
select dim_date_key, amount
from @facttable
--what would you do with the original record? It's not a measure issue and not a late arriving dimension issue. The dimension was wrong.
--Do you logically delete the original? Do you throw it out then reload? Do you update it?
select *
from @facttable