Issues with DB Design - Still struggling with it

simflex1 · November 21, 2022, 9:58pm

Greetings again experts,

The following tables with exception of the lookup table were designed by some employee who I am told has left company. I designed the lookup table.

I have attached sample data for each table:

UserData
	WaterAccountNo decimal(15, 1) NOT NULL,
	UserID int PK NOT NULL, 
	OwnerID varchar(8) NOT NULL,
	Applicant varchar(250) NOT NULL,
	GPFPerToilet varchar(30) NOT NULL, 
	RebatePerToilet varchar(30) NOT NULL,
	TotalRebate money NOT NULL


AddressData
	Address varchar(50) NOT NULL,
	City varchar(35) NOT NULL,
	State varchar(3) NOT NULL,
	Zip varchar(10) NOT NULL,
	YearBuilt smallint NULL,
	Toilets tinyint NULL,
	Replaced128 tinyint NOT NULL,
	Replaced160 tinyint NOT NULL,
	WaterAcctNo decimal(15, 1) NULL
	Pin varchar(18) NULL

Lookup table
        ToiletID int PK NOT NULL,  
	GPFPerToilet varchar(30) NOT NULL, 
	RebatePerToilet varchar(30) NOT NULL


Sample data for UserData table:
	GPFPerToilet	RebatePerToilet	   TotalRebate
	1.60,1.60,1.60	50.00,50.00,50.00	150.00

Sample data for AddressData table:
  Address              Toilets    Replaced128            Replaced160
  123 Jones Dr           4           2                     0


Sample data for Lookup table:

ToiletID              GPFPerToilet               RebatePerToilet
1                        1.28                       60.00
2                        1.60                       50.00
3                        1.80                       50.00

Here are our concerns.

1, The userData table has three fields that we have some doubts about their designs:

GPFPerToilet ,
RebatePerToilet,
TotalRebate|

I have also posted sample data for each of these fields.

I have also created Lookup table with three fieldnames.
ToiletID
GPFPerToilet ,
RebatePerToilet,

Sample data also posted.

How do I fix the userData table with those fieldnames such that Toilet information stays with lookup table and ToiletID becomes foreign key to UserData table?

Our hope is that on UserData table, there will be one row of records for each GPFPerToilet as opposed to storing them on same row with comma delimitation.

Then AddressData table has three fieldnames that we believe can be designed better.

Those are Toilets (this field contains the total number of toilets in a particular address)
Replaced128 (Those shows how many of those toilets have been replaced and are of 1.28 GPFPerToilet)

Replaced160 (Those shows how many of those toilets have been replaced and are of 1.60 GPFPerToilet)

This got up stumped. Any assistance is greatly appreciated.

Please, please let me know if you need additional information.

ScottPletcher · November 21, 2022, 10:51pm

I'm confused by their "design" myself.

All tables contain "data", thus "data" in the table name is redundant.

You really have three tables (best practice is plural table names):
Users
Addresses
Toilets

In Users, I'm unsure of the difference between Owner and Applicant: would the Owner be the applicant? The UserID, ok. But why is OwnerID varchar? If you really need varchar, call it OwnerCode, say, rather than ID. Yes, you are correct, you should absolutely not have comma-delimited data here, period.

In Addresses, I'd use smallint for ToiletCount, even if a tinyint currently works. It's not impossible to imagine an address with 300 toilets (hotel, etc.), likewise for Replaced then, for consistency.

In the Toilets ("Lookup") table, "PerToilet" doesn't apply, right? It's a specific toilet (if I understand correctly). So, just GPF, decimal(3, 2) and Rebate decimal(6, 2).

simflex1 · November 21, 2022, 11:33pm

Thank you so much Scott for your prompt response. I really appreciate it.

Let me start from the top. You said:
You really have three tables (best practice is plural table names):
Users
Addresses
Toilets

I completely agree and will make the changes as recommended.
I need to point out here that there are several tables these three are the ones with toilet stuff that has been confusing us a lot.

would the Owner be the applicant? The UserID, ok. But why is Owner ID varchar?

Great question. I asked the users exactly same question since we can't ask the individual that designed this and there is no documentation.

We were told yes because you would have bought the property and would have decided to replace your current toilets and therefore need rebate to do so.

So, you, the property owner is applying for rebates (applicant) to replace the toilets you currently have that may be making you lose more water than necessary.

I am not exactly sure why varchar was used as data type.

Even worse, OwnerID was also used as primary key which doesn't make sense to me because once you sell your house, a new owner, as I understand it, does not inherit the current OwnerID and our queries proved this.

This is why we are seeking help from experts like you.

So, you are saying having Toilets (ToiletCount), Replaced128 (Replaced128Count) and Replaced160 (Replaced160Count) should be left as they are on the Addresses table - just change the name?

If so, let's assume from the sample data that I posted for Addresses table, that an owner has 4 toiletCounts, replaced 2 and those two are for 1.28 (Replaced128), what happens if the owner decides to come back and replace the remaining two, do we just update the Addresses table and update Replaced160 from 0 to 2?

Finally, still not sure how to fix these:

GPFPerToilet (this should be getting its values from lookup table going forward)

RebatePerToilet (This should also get the value from lookup table. I am assuming both should have the value of ToiletID. This way, a joined query between lookup table and Users table will display the GPTPerToilets per owner.

Just not sure how to address the comma delimited character values.

Thanks again for your help sir.

ScottPletcher · November 21, 2022, 11:55pm

Really you should number multiple qs so one can answer w/o repeating the whole q.

So, you are saying having Toilets (ToiletCount), Replaced128 (Replaced128Count) and Replaced160 (Replaced160Count) should be left as they are on the Addresses table - just change the name? <<

Correct.

what happens if the owner decides to come back and replace the remaining two, do we just update the Addresses table and update Replaced160 from 0 to 2? <<

Actually you'll update the Toilets table for the specific one(s) they are replacing. Then you'll SUM up the GPFs from the Toilets table to UPDATE the addresses table. This is what is technically called "derived data", since it's derived from an existing table and not updated separately.

Finally, still not sure how to fix these:
GPFPerToilet (this should be getting its values from lookup table going forward)
RebatePerToilet (This should also get the value from lookup table. I am assuming both should have the value of ToiletID. This way, a joined query between lookup table and Users table will display the GPTPerToilets per owner.
Just not sure how to address the comma delimited character values. <<

The GPFPerToilet goes away; it is replaced by GPF in the Toilets table. The RebatePerToilet as well. If you need to get comma-delimited data, you can generate it from the Toilets table.

simflex1 · November 22, 2022, 1:15am

Really you should number multiple qs so one can answer w/o repeating the whole q.

Sorry about that.

I am feeling much better about this now.

Please bear with me a bit longer to get be through sorting out these confusions.

You said GPFPerToilet, RebatePerToilet and I suppose TotalRebate field names all go away along with their data (those comma delimited values)?

2, Still confused about the Replaced128 and Replaced128. Probably due to my incomplete explanations. You said:

Actually you'll update the Toilets table for the specific one(s) they are replacing. Then you'll SUM up the GPFs from the Toilets table to UPDATE the addresses table. This is what is technically called "derived data", since it's derived from an existing table and not updated separately.

The Toilets (lookup) table that I created but not yet used due to the confusions with this DB design, currently has two GPFs (1.28 and 1.60).

These values are the the values the applicants would have to select one or both to replace their current toilets.

So, I am not sure how the lookup table would be updated. The applicant or home owner will indicate how many toilets s/he has and which GPF(s) to upgrade to. The GPF the owner upgrades to is what is recorded in Users table. That's why I am a bit concerned about the name Replaced128 and Replace160. If another GPF is selected say 1.80 from the lookup table, do we then create another field name called Replaced180Count in Users table and can that be updated in lookup table?

Many thanks again for your patience and assistance.

ScottPletcher · November 22, 2022, 2:57pm

I would say, no, don't add hard-coded column names.

In the Toilets table, the GPF are recorded for each Toilet at that address. So, for example:

Toilets ( WaterAcctNo, ToiletID, GPF decimal(3, 2), Rebate decimal(9, 2) )
12345678901234.1, 1, 1.6, 50
12345678901234.1, 2, 1.6, 50
12345678901234.1, 3, 1.8, 40

SELECT GPF, COUNT(*) AS TolietCount, SUM(Rebate) AS TotalRebate
WHERE WaterAcctNo = 12345678901234.1,

The specific GPF(s) in use at an acct, and only those GPFs, will list for each acct.

simflex1 · November 22, 2022, 3:30pm

But you suggested the name of Toilets for Lookup table which per your design below, has ToietID int PK autoincrement, GPF and Rebate.

So, this design -> Toilets ( WaterAcctNo, ToiletID, GPF decimal(3, 2), Rebate decimal(9, 2) ) is a bit confusing unless you wanted me to change Users table to Toilets table.

ScottPletcher · November 22, 2022, 3:45pm

I never specifically said anything about autoincrement, although it could be.

The Toilets need to be linked back to a specific address / location. I thought WaterAcctNo was how all these tables were linked. When you have repeating values in a design for a table, you separate those into a separate table and use a key to link it back to the original table (part of a process called "normalization"). That's the situation you had, to that's what I'm doing here.

I did change Rebate to decimal(9, 2) rather than (6, 2), since they both take up the amount of disk space, but naturally you can change that back if you like.

simflex1 · November 22, 2022, 5:08pm

You are right. I added that; although it is probably not needed.

I did change Rebate to decimal(9, 2) rather than (6, 2), since they both take up the amount of disk space,

I get this as well. Thank you.

I thought WaterAcctNo was how all these tables were linked

Oooh ok; I think I am finally with you now.

In this case, there is no need for ToiletID.

So, my lookup table (Toilets) would look like this:

CREATE TABLE Toilets (
WaterAcctNo decimal(9, 2),
GPF decimal(3, 2),
Rebate decimal(9, 2))

When a user receives rebate applications, the user selects WaterAccountNo, GPF and rebate from Toilets table.
Assuming that the owner (applicant) wants to replace two toilets, one 1.28GPF and the other 1.80GPF, the user entering this information (INTO USERS table), will first select 1.28 GPF and associated values like waterAcctNo and Rebate.

Then does it again for the second GPF?

This means that the applicant's information will be entered twice to the USERS table?

Is this an accurate assumption?

Finally, sir, how would you tweak the Users and Addresses tables?

Many, many thanks for your patience and assistance.

ScottPletcher · November 22, 2022, 6:11pm

No. You'll still want a unique ID for each Toilet.

And the individual toilet info never goes into Users, only into Toilets. Only summary data goes into Addresses, and, if you want to do it, summary data into Users, although that could be trickier to maintain if the same User had multiple Addresses.

simflex1 · November 22, 2022, 10:15pm

Ok, I think you have given me enough to get started with redesigning the DB.

I will post a new thread if I get stuck somewhere.

You have been very kind with your expertise and generous with your time.

Thank you very much.