Hi there, I'm wondering if dimensional modelers have a recommendation on how to approach a particular dimension that I want to conform. The main issue is that the dimension (let's call it dim_BusRoute) has attributes that may or may not be applicable from one business transaction to another. In my mind, if that's not handled, it would lead to false duplicates.
So for example, in the following sample table variables (PK's are implied for brevity), the Routes table holds the main route information about the bus route. The pattern table gives the version of the route (northbound, southbound, completes a full trip, cuts short, etc). Typically in transit, route and pattern are very closely related. There are 3 use cases for how the routes are used in the following queries:
- the business transaction always has both the route and the pattern (version of) the route
- the business transaction always only has the route and no pattern
- the business transaction sometimes has the route and sometimes has the pattern, and sometimes has neither.
GIven these use cases, can anyone recommend how I would create a conformed dimension that could drill across the various facts or business transactions without creating duplication?
DECLARE @Routes TABLE (RouteId INT, RouteName VARCHAR(10)) DECLARE @Patterns TABLE (PatternId INT, RouteId INT, PatternCode VARCHAR(10), Direction CHAR(1)) DECLARE @Ridership TABLE (RouteId INT, ServiceDate DATE, NumTrips INT) DECLARE @Trips TABLE (TripId INT, RouteId INT, PatternId INT, StartTime TIME, EndTime TIME) DECLARE @Feedback TABLE (FeedbackId INT, ClientId INT, FeedbackType VARCHAR(10), RouteId INT, PatternId INT) INSERT INTO @Routes (RouteId, RouteName) VALUES (1, 'Max') ,(2, 'Flex') INSERT INTO @Patterns (PatternId, RouteId, PatternCode, Direction) VALUES (1, 1, 'Full', 'N') ,(2, 1, 'Full', 'S') ,(3, 1, 'Short', 'N') ,(4, 1, 'Relief','N') ,(5, 1, 'Relief', 'S') ,(6, 2, 'Full', 'N') ,(7, 2, 'Full', 'S') ,(8, 2, 'Relief', 'N') INSERT INTO @Ridership (RouteId, ServiceDate, NumTrips) VALUES (1, '2018-01-01', 40) ,(1, '2018-01-02', 48) ,(2, '2018-01-01', 50) ,(2, '2018-01-02', 30) INSERT INTO @Trips (TripId, RouteId, PatternId, StartTime, EndTIme) VALUES (100, 1, 1, '07:00', '07:30') ,(101, 1, 2, '07:30', '08:00') ,(102, 1, 3, '07:30', '07:45') ,(103, 1, 4, '07:45', '08:00') ,(104, 1, 5, '08:00', '08:15') ,(105, 2, 6, '09:45', '10:30') ,(106, 2, 7, '10:30', '11:45') ,(107, 2, 8, '11:45', '12:00') INSERT INTO @Feedback (FeedbackId, ClientId, FeedbackType, RouteId, PatternId) VALUES (1, 20, 'Complaint', 1, NULL) ,(2, 28, 'Complaint', 2, NULL) ,(3, 40, 'Complaint', 1, 4) ,(4, 43, 'Compliment', 2, 8) ,(5, 87, 'Complaint', 2, NULL) ,(6, 99, 'Complaint', NULL, NULL)
EDIT: I believe I figured out a solution. Though it's not in the source data already, create a dim_RoutePattern table, but create a variant record that has "N/A" for the pattern and direction information. I'd have to create a secondary data source outside the source applications to create these master versions of the route. This way, the dimension should still be conformed along business processes.