Could someone provide me advice on what is the best way to design a growing database.
Should I create Index on every table.
And what is your advice on some of the best practice and standard to go by ? For example after I design it and create it, how does an expert know or not know if someone good or expert has design it. What would be the tell tale signs and how do I avoid those pitfalls ? Please advise !
Thanks, that is a lot of information in there.
Will take a look at it.
The single most important thing, by far, is to do a data design before you do a database design.
That is, a true logical data design before a physical db design.
You do that by:
- Thinking of "entities", not "tables"
- NOT automatically slapping an identity on every entity
- Identifying the data key(s) for each entity
- Going thru the normalization steps, in order: 1NF, 2NF, 3NF
- Do a thorough analysis of the data requirements (only), without worrying about the physical side at all initially
- Only after you've finalized the logical design, convert it to a physical design. That's when you add add the compromises necessary to get a well-performing physical design, such as identities, indexes, etc..
Great thanks for the information.
Do you have good documentation on 1st to 3rd normal form that I could look at ?
This is pretty good. The first few times you do this, it will take a while, but you'll get better at it over time.
Thanks for the information but why entities and not tables, and what identities and data key are you referring to, and how does the data requirement affects the design ?
I thought we had to do it with related tables, for example an entity could be related or categories by the same table but their entity are slightly different.
If you think in terms of tables you have entered into actual design.
If you think in terms of entities then you are not forcing yourself into a design corner
An entity is a business "thing" about which you will keep information. It's very similar to a table, except that a single entity can sometimes become more than one table.
For example, Customer would be an entity, Order, Line Item, and so on.
An identity column is an autonumbering column in SQL Server that automatically increases (typically) for every row you (attempt to) INSERT. It's an easy mechanism to provide a unique number for each row of a table, that's all really.
Thanks all for the valuable information.
It’s great. I also really thought database design is really two things.
1- You make sure you can join and answer any queries you have with the data.
2- When you update or upload data you are not doing it redundantly.
If you have a good design you should be able to do these two things. I don’t know if you agree or not.
If not how else can you tell your design is great, what other signs ?
Database design is mostly following the normalization process. That is how you get to #2 you listed.
#1 will take care of itself with a proper design. You will always be able to join the data you need to join if it is related and properly normalized in Entities (and subsequently into tables).
Developers, for one, always want to skip the (logical) design phase and go straight to tables. That's always a bad idea. Your current design should go thru normalization too, period. No, there are no easy and automatic short-cuts. Then again, for most systems, it doesn't take that long anyway. And the process will reveal a lot to you about the data itself.
During requirements gathering you have to talk to the stakeholders in plain language in detail what is this system you want to build. Think of construction analogy, building a house. What kind, how big, where bathroom, where dining room, how many bedrooms, 1 floor 2 floor? Garage. Or a simple tiny house etc
You just dont jump and build. You could but you will pay dearly when you find out your bathroom in wrong place and you have to rip it out etc
Another good one that has helped me explain it is think of a database you want to capture music industry data.artist music albums titles tracks , CD, vinyl, mp3 etc. Try to capture details of such a system. You will learn a lot as you talk it out and discuss it
You "know" your design is "great" when you define what great is and when you fully test it.
I know you want a short cut, everyone wants a short cut, but it just doesn't exist here. Even this requirement:
1- You make sure you can join and answer any queries you have with the data.
doesn't give you a short cut to check the design, since you can't possibly know what new query needs might come up in the next weeks, months and years.
The standard normalization process is the best way to do this, period. With practice, it gets faster. It's like learning anything new, it just takes some time to get used to.
Thank you to Scottpletcher, and all for the very valuable information. I gain a lot from you guys. I will be looking into more of this but do you have case of database examples (links) where I can practice 1st to 3rd normal forms ?
Also to create Index on tables. I know we do so on Primary and Foreign Keys, how about composite keys ? And I read somewhere that we need to create them on filter columns, aggregated columns, and ordered columns. How do we do this and what does this really mean ?
Google.com is your friend
The most important index is the clustered index. That should be chosen very carefully and not just defaulted to an identity only. The clustered index should never be defaulted to, ever, for any table. It's far too important to performance. Always base it on that table's specific needs.
SQL does every strongly favor unique keys. You may need to add an identity column to the end of the index to insure uniqueness. That's perfectly fine.
Some example leading clus index keys are transaction_date, customer_id or sales_region. Again, it depends on how the table is queried and how well the chosen key(s) avoid too much fragmentation.
Great, thanks for the information.
So can a table not have a primary key but a composite key and we would need to index both composite key ? If I am not wrong, I think primary key is automatically cluster index if that is so do I still need to manually index it.
Can we have primary key as index and then also create cluster index on other attributes like transaction date, customer id or sales region on the table.
Why is customer id and sales region need cluster index. I thought is only on attributes that requires search on a range of value, so I think any attributes with numbers, like age, price, etc.
What if a table has many attributes with numbers, like price changes at different periods, does these need cluster index.
Sorry for my Questions, thanks for the valuable information.
Yes, you can have a nonclustered primary key. And, yes, by default SQL makes it clustered.
You should always create the clustered index first. All non-clus indexes have to rebuilt when you add or change a clus key, so it's more efficient to build the clus index first. Also, when you later create the primary key, SQL won't make it clustered, obviously, since the table already has its one allowed clus index. Btw, when dropping indexes, drop all non-clus index first and the clus index last.
CREATE UNIQUE CLUSTERED INDEX <index_name> ON <table_name> ( <col_name(s)> ) WITH ( <index_with_options> ) ON [<filegroup_name>];
ALTER TABLE <table_name> ADD CONSTRAINT <primary_key_constraint_name> PRIMARY KEY NONCLUSTERED /just for clarity and/or in case the pk ever gets build first/ ( <col_name(s)> ) WITH ( <index_with_options> ) ON [<filegroup_name>];
Alright thanks for the information and SQL example. In a table I have no primary key or unique attributes can I create index, not cluster index on the both or 3 or 4, etc, composite attributes ?
And I can think a table doesn’t need primary key, but I read that is good to have, I feel why create a primary key when is really adding attributes that isn’t there creating more work. I don’t think I want to create primary key when there isn’t one. I’ll just have the composite key as primary key and index all the composite key. What’s your advise ?
You don't have to have a primary key. You can also wait and add one later.
The big thing is to identity the data key to cluster the data on.
Look at the table columns and how you query the table. Make the clus key the best column(s) that identify the data and give good query performance.
To get more specific than that, I'd have to look at actual data. Again, there's no easy cookie-cutter answer here that works for (almost) any table. Most tables have their own unique considerations.