I have a bunch of data sources that I need to get into a database, there are about 15 common fields that they all share, but then there are also about 15 fields that are different (one data source may have a couple extra fields, another two sources may have 4 extra fields and so on). Furthermore I want to set this up with the future in mind, as additional data sources are identified there will be more unique fields added in. The other consideration is the amount of data, there will be an estimated million rows added per week, with that number to increase over time as well.
The way I see in there are a few options:
- One table with a lot of columns and add new columns to it as needed
- Two tables, one with common data and one with extra field data linked together through a PK unique ID #
- Four or Five tables based on data source
I don't know enough about optimization to really speak to this, I just thought up those different ways, and there may be more, I just don't know. I know how to tailor the indexing to the need, and setting up partitioning perhaps by month would be good, but I am looking for suggestions on optimal structure given the variable data fields and size of import.
Ultimately this will be used as a workflow and reporting tool, I will put together a little program to pull data, or display records that they can comment on.