How to optimally set up database

WebKill · July 6, 2015, 8:44pm

I have a bunch of data sources that I need to get into a database, there are about 15 common fields that they all share, but then there are also about 15 fields that are different (one data source may have a couple extra fields, another two sources may have 4 extra fields and so on). Furthermore I want to set this up with the future in mind, as additional data sources are identified there will be more unique fields added in. The other consideration is the amount of data, there will be an estimated million rows added per week, with that number to increase over time as well.

The way I see in there are a few options:

One table with a lot of columns and add new columns to it as needed
Two tables, one with common data and one with extra field data linked together through a PK unique ID #
Four or Five tables based on data source

I don't know enough about optimization to really speak to this, I just thought up those different ways, and there may be more, I just don't know. I know how to tailor the indexing to the need, and setting up partitioning perhaps by month would be good, but I am looking for suggestions on optimal structure given the variable data fields and size of import.

Ultimately this will be used as a workflow and reporting tool, I will put together a little program to pull data, or display records that they can comment on.

ScottPletcher · July 6, 2015, 11:00pm

My inclination is option (2). I wouldn't want the maintenance overhead of all the different tables in (3).

You probably won't need to partition if you cluster by date. If this data is not read a whole lot, you should also consider compression: if you very rarely read the data, page compression, otherwise row compression.

WebKill · July 6, 2015, 11:10pm

Thanks, that is kind of what I was leaning toward, splits the data load into two tables, while keeping common format.

gbritton · July 7, 2015, 12:32pm

Given your scenario, I would consider a COLUMNSTORE index. Use partitioning to add new data and switch it into the main table.