Surrogate keys v business keys ssas

dean · August 15, 2016, 9:16pm

hi there

Hi if I use primary surrogate keys in dimension tables and the dimension tables have a different amount of rows to each other, how can the surrogate keys map to the correct record? also when brought into the fact table surely they will be out of alignment say if one table has only unique records of 10 customers and sales table has a millions of rows...are the surrogate keys for row numbering only and we still use the business key for the joins?
thanks
sans

TaraKizer · August 15, 2016, 10:39pm

It sounds like you are referring to an identity column. The column is only an IDENTITY in the parent table. In the child table, it is NOT an identity column, so it doesn't matter how many rows are in either table. The child table may have its own identity column, but it's not the same column as the one mapped to the parent table. These are two different columns in the child table. The mapping between the tables is the same value: 1 maps to 1, 10 maps to 10.

dean · August 16, 2016, 6:56am

Hi Tara,

Are you saying each dimension can have its own identity column and it does NOT matter how many rows are in each one? and the fact table has no identity column but pulls in all the identity surrogates keys from each dimension? if this is so, and if the dimension tables have a different number of rows they would be out of sync on the fact table? for example one dimension only has a list of customers. Or does it not matter about the order of the identity columns in the fact table as linking the rows by the business key sorts the order of the records?

TaraKizer · August 16, 2016, 4:51pm

I can't specifically answer for dimension tables or SSAS. My answer is just in regards to how identity columns work in the parent and child tables. The order of the columns in a table doesn't matter.

Here's an example, I didn't take the time to add PK or FK constraints:

create table Parent (ParentID int identity(1, 1), name varchar(50));
insert into Parent (name) values ('Tara'), ('Dean'), ('Samuel');

create table Child (ChildID int identity(1, 1), column2 int, column4 varchar(10), column3 int, ParentID int);
insert into Child (column2, column4, column3, ParentID) values (24, 'test1', 5, 1), ( 999, 'test2', 0, 3), (8765, 'test3', 1999, 3), (77777, 'test4', 22, 3);

select * from Parent;
select * from Child;
select * from Parent join Child on Parent.ParentID = Child.ParentID;

drop table Parent, Child

ParentID is how these two tables are linked. Doesn't matter how many rows are in the Child table, ParentID still links up to a row in the Parent table via the ParentID column in both tables.

dean · August 20, 2016, 12:34pm

Thanks for taking the time to explain, I understand that.

Does this makes sense to you?

Is it better to load the fact table from the dimension tables, so populate the dimensions with all records from DW and load fact with same records from those dimensions using joins.

2.Or is it better to have the dimensions only contain the main attributes (not actual records) in a list format, so say 10 rows for customer dimension and 20 rows for products in product dimension, then load the fact table from the DW and not the dimension(s).

From what I can see both ways will work although point 1 would mean adding all the records to the dimensions first then select across to the fact table which would be overkill??

And point 2 would be in effect just lists of unique customers, products...with a business key in each to point back to the fact.