Question about database to support ad-hoc reporting

Hi all,

We currently have an OLTP database that supports our application, which has some canned reporting built in. We are building an ad-hoc reporting tool (by integrating with Dundas BI) to give our clients more reporting flexibility. I realize that the best solution here is to create an ETL to push data into an OLAP database, and we are exploring that (using Redshift). But that is more of a medium-term solution for us, and we would like to get a proof of concept out sooner.

So my questions are...

(1) Would it be better to use an ETL to denormalize some of the schema into flatter tables that resemble cubes, or to leave the OLTP structure as is? By flattening out the structure we save on joins, but we are also reducing the flexibility of SQL Server's query optimizer (I think).

(2) Should we create very narrow indexes on every column in the tables to address the ad-hoc nature of the tool? This would prevent table scans but would likely result in many key lookups. I presume key lookups are better than table scans in most cases?

(3) If we do use an ETL to flatten out the table structure, what is the best way to move over only the data that has changed since the last push?

Of course, any other advice on how best to proceed both in the short term and medium term would be appreciated.

  1. Generally, I'd advise standing up a new DW server and ETL'ing the data from the OLTP server into a standard star or snowflake schema. From there, you can build an OLAP cube for lightning-fast query results. Also, by building a cube, you can stop worrying about the query optimizer.

Not that I'd worry about the optimizer anyway. Later versions of SQL Server are good at recognizing star-schema queries and optimizing for that type specifically.
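
To make that concrete, here is a minimal star-schema sketch in T-SQL. The table and column names (DimDate, DimCustomer, FactSales, etc.) are purely illustrative, not taken from your actual schema:

```sql
-- Illustrative star schema: one fact table joined to narrow dimension tables.
CREATE TABLE DimDate (
    DateKey       int      NOT NULL PRIMARY KEY,  -- e.g. 20240131
    CalendarDate  date     NOT NULL,
    CalendarYear  smallint NOT NULL,
    CalendarMonth tinyint  NOT NULL
);

CREATE TABLE DimCustomer (
    CustomerKey  int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- narrow surrogate key
    CustomerBK   nvarchar(50)      NOT NULL,              -- business key from the OLTP source
    CustomerName nvarchar(200)     NOT NULL
);

CREATE TABLE FactSales (
    DateKey     int           NOT NULL REFERENCES DimDate (DateKey),
    CustomerKey int           NOT NULL REFERENCES DimCustomer (CustomerKey),
    Quantity    int           NOT NULL,
    SalesAmount decimal(18,2) NOT NULL
);

-- A typical ad-hoc query then becomes a simple star join:
SELECT d.CalendarYear, c.CustomerName, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimDate d     ON d.DateKey = f.DateKey
JOIN DimCustomer c ON c.CustomerKey = f.CustomerKey
GROUP BY d.CalendarYear, c.CustomerName;
```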

  2. Narrow indices are indeed advised in your DW. Ints or Bigints are good choices. Your ETL can take care of looking up the PKs from the business keys in your Dim tables.
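
As a rough sketch of that surrogate-key lookup, assuming a hypothetical StageSales staging table plus the illustrative DimCustomer and FactSales tables above:

```sql
-- Resolve the OLTP business key to the narrow int surrogate key while loading the fact table.
INSERT INTO FactSales (DateKey, CustomerKey, Quantity, SalesAmount)
SELECT
    CONVERT(int, CONVERT(char(8), s.SaleDate, 112)) AS DateKey,  -- yyyymmdd date key
    c.CustomerKey,                                               -- surrogate key, not the business key
    s.Quantity,
    s.SalesAmount
FROM StageSales s
JOIN DimCustomer c
  ON c.CustomerBK = s.CustomerId;  -- business key carried over from the OLTP source
```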

  3. Two steps at least:
    a. Use slowly changing dimensions for your Dim tables (SSIS has an SCD component just for that, or you can use a MERGE statement, as sketched below).
    b. For Fact tables, keep track (perhaps in a special table or database) of your last run's max datetime (or whatever) and start each ETL looking for rows that are greater than that value.
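
A minimal sketch of both steps, assuming a Type 1 (overwrite) dimension, the illustrative tables above, a hypothetical StageCustomer staging table, and a hypothetical EtlWatermark control table:

```sql
-- (a) Upsert the dimension from staging with MERGE (Type 1: overwrite changed attributes).
MERGE DimCustomer AS tgt
USING StageCustomer AS src
    ON tgt.CustomerBK = src.CustomerId
WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerBK, CustomerName)
    VALUES (src.CustomerId, src.CustomerName);

-- (b) Pull only the source rows changed since the last run, then advance the watermark.
DECLARE @LastLoaded   datetime2 =
    (SELECT LastLoadedAt FROM EtlWatermark WHERE TableName = 'FactSales');
DECLARE @NewWatermark datetime2 =
    (SELECT ISNULL(MAX(ModifiedAt), @LastLoaded) FROM Sales);  -- 'Sales' = OLTP source (illustrative)

SELECT *
FROM Sales
WHERE ModifiedAt > @LastLoaded
  AND ModifiedAt <= @NewWatermark;  -- requires a reliable last-modified column on the source

UPDATE EtlWatermark
SET LastLoadedAt = @NewWatermark
WHERE TableName = 'FactSales';
```

Note that this watermark approach only catches inserts and updates; deletes on the source would need something extra (e.g. soft deletes or Change Tracking).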

Note: this is a general approach for large DWs. How big is your data? If not that big, this might be a lot of work for not so much benefit.

OTOH, it can be a big plus to know when the reporting data was loaded, rather than querying the ever-changing OLTP data. E.g. if it is loaded overnight, the reporting team knows the data is as of close of business on the previous day (or some other cutoff point).

Awesome info. Thanks! The database is approaching 1TB in size. Sounds like I have some SSIS learning to do :wink:

Perhaps something like "Power Pivot" would be more suitable.