Which of these two table structures is more efficient?

SergioM · May 13, 2016, 10:25pm

I need to create a table to record many different types of fees for a given order. The issue is that there are about 10 types of fees (and these types may increase outside of my ability to control them) however they are mostly 0.00.

The linear table structure would look like this:

A column is created for each fee & fee currency & the table would just extend outwards. There are no joins & is simple but I store a lot of data which is meaningless.

OrderID	Date	Buyer	Total	TotalCurrency	Fee	FeeCurrency	Giftwrap	GiftwrapCurrency	ShippingFee	ShippingFeeCurrency
123456	42501	John Smith	500	USD	20	USD	0	USD	2	USD
987654	42501	Joe Blow	499	GBP	19	GBP	1	GBP	0	GBP

The dynamic table structure would look like this:

It has two tables: Order & OrderCharge. The order charge only stores the values that are greater than 0 and all missing fields are queried as $0. New charges can be added without an issue.

OrderID	Date	Buyer
123456	42501	John Smith
987654	42501	Joe Blow

OrderID	FeeType	FeeValue	FeeCurrency
123456	Total	500	USD
123456	Fee	20	USD
123456	ShippingFee	2	USD
987654	Total	499	GBP
987654	Fee	19	GBP
987654	Giftwrap	1	GBP

The part that I'm worried about - which I don't know - is if the extra workload added by joining/pivoting a table is greater than the amount offset by not including useless data.
Is this actually better? Or am I making it both more complicated & worse?

stephen_hendricks · May 13, 2016, 10:54pm

Let me ask you this, What is the plan when the number of different Fees increases dramatically? Add additional columns?
Conceptually, you have two types of entities: Orders and Fees. You would do well to create the appropriate relations and join the two together. The current layout isn't even in first normal form. I'm not above denormalizing when it can benefit things but with the proper indexing and DRI, joining two tables together shouldn't impose a burden. If you don't do it now, you'll regret it later when things are more complex and the data sizes are larger. Do it now!!!

scarela · May 14, 2016, 7:32am

I would recommend you to create a sepate table only for the fee types that way you don't have to worry about new types being created.

Maybe something like this

ID FeeDrecrip
1 Fee
2 ShippingFee

SergioM · May 17, 2016, 10:27pm

Let me ask you this, What is the plan when the number of different Fees increases dramatically? Add additional columns?

Pretty much. It's the quick & dirty approach and I'm not even so opposed in terms of general ugliness. However it's going to be a large table & I'm concerned about performance as I continue to muck it up.

If you don't do it now, you'll regret it later when things are more complex and the data sizes are larger. Do it now!!!

Makes sense to me. I am concerned about it's impact on performance, but the impact may potentially be worse if I don't do it. Thanks

Kristen · May 18, 2016, 4:54am

Doesn't feel that way to me. With a sub table new Fee Types can be added at will, by the user. For multiple columns the APP / Report / etc. have to be changed with each addition.

I am doubting that multiple columns will make a difference to performance, compared to a sub-table, but a Test would be good to prove the point, one way or the other

When you get to the point of having lots of columns, i.e. lots of Fee Types, will most of the rows of data use most of the columns, or will the "average" row only have data in a few Fee columns? You say:

so I assume that most of the Fee Types will be unused.

If all Fee Type columns will be used on All rows then that is definitely a scenario I would want to test. If the column usage will be sparse I can't see multiple columns winning.

Indexing anything is going to be a nightmare too. What if you want all rows that have ShippingFee > 2 ... you could have an Index on the ShippingFee column - but then you need an index on every Fee column, and the performance of inserting / updating a row will be seriously adversely effected.

SqlNightOwl · May 19, 2016, 2:25pm

Kristen has the correct solution - it depends. A hybrid approach may be appropriate. If all orders have common fees (like shipping, handling etc.) then I'd add those as non-nullable columns with a default of 0. Then you could have a calculated column for the total standard fees. Then you can add an auxiliary table for OrderFee that follows your 1:n pattern of storing Fee Type & Fee > 0. you'd need a left join for Orders that don't have additional fees. As always, test - test - test.

SergioM · May 31, 2016, 6:42pm

I never did answer this, but I wanted to come back to say thanks. These answers were really helpful not only in figuring out how to address the issue, but also in figuring out what points should be considered/weighed.