Hello,
We need to split the value of a field (a production cost per item) according to a value key (hours / phase from total hours). We use rounded values and there is an inherent difference between the total value and the sum of the calculated "sub-values".
Example:
Costs/Items
+=====+========+==========+
| ItemId | Total Cost | Total Hours |
+---------+---------------+----------------+
| 7 | 3729 | 147 |
+---------+---------------+----------------+
Hours/Phases
+=====+=======+======+=========+
| Item Id | Phase Id | Hours | Phase cost |
+=====+=======+======+=========+
| 7 | 1 | 23 | 583 |
| 7 | 2 | 31 | 786 |
| 7 | 3 | 4 | 101 |
| 7 | 4 | 52 | 1319 |
| 7 | 5 | 37 | 939 |
+=====+=======+======+=========+
| "recomposed" value | 3728 |
+=====+=======+======+=========+
The Difference=1 and it needs to be added to one of the "sub-values" to balance the cost split.
Is there a way to distribute that difference to one (the first, the last or the one with the greatest value) of the "sub-values" ?
Thank you, Daniel
(sorry, I don't know hot to insert aligned tables )
Errrr, thank you, I think ....
I don’t understand what this answer is about.
We need an idea on how to distribute the difference over the calculated values.
Thank you, Daniel
Here's code to add the difference(s) to the highest existing value (but if there's a tie for highest, the diff would add to both, causing totals to be off: if that's possible, let me know).
Note that I would not have had time to work on this without usable sample data. I'm not trying to be difficult or harsh, but I help up to a dozen people a day online with SQL qs, I just don't have time to create sample data for everyone. I can write the SQL far more quickly than I can do all the code to create sample data.
;WITH cte_totals AS (
SELECT HP.ItemId, SUM(HP.Hours) AS HP_TotalHours, SUM(HP.[Phasecost]) AS Total_Phasecost,
MAX(HP.Hours) AS Max_Hours,
MAX(CI.[TotalCost]) AS CI_TotalCost, MAX(CI.[TotalHours]) AS CI_TotalHours,
MAX(CI.[TotalCost]) - SUM(HP.Phasecost) AS Phase_Diff,
MAX(CI.[TotalHours]) - SUM(HP.Hours) AS Hours_Diff
FROM [Hours/Phases] HP
CROSS APPLY (
SELECT [TotalCost], [TotalHours]
FROM [Costs/Items]
) AS CI
GROUP BY HP.ItemId
)
SELECT HP.ItemId, HP.PhaseId,
CASE WHEN HP.Hours = CT.Max_Hours THEN HP.Hours + Hours_Diff
ELSE HP.Hours END AS Hours,
CASE WHEN HP.Hours = CT.Max_Hours THEN HP.[Phasecost] + Phase_Diff
ELSE HP.Phasecost END AS [Phase cost]
, HP.Phasecost AS original_phase_cost /*just for verification, remove after testing*/
FROM [Hours/Phases] HP
CROSS JOIN cte_totals CT
Hi Scott,
Thank you for the elegant solution.
Actually it IS possible to have a tie for the highest values, or have the values be evenly distributed ( like 10 hours = 2+2+2+2+2 hrs... ). In can be done somehow programatically in three steps (like splitting values, reassembling the results by summing them and then distribute the difference) ... a brute-force approach though
Sorry for the lack of data, I really didn't understand the helping hand @harish gave .... (thank you again harish).
Daniel
Probably the easiest way is to combine hours and phase id for the MAX(), and then split them back out afterward. The code below does that. It will not perform as well as the first query but should perform than methods that require any additional I/O / table queries.
/* first, let's set the the last phase to the same highest hours as the 4th phase. */
UPDATE [Hours/Phases] SET Hours = 52 WHERE PhaseId = 5; SELECT * FROM [Hours/Phases];
/* then run the new query: note that only phase 5 values are adjusted,
and that the hours are adjusted too, since I didn't adjust the CI row
when I increased the total hours in HP */
/* note also you can run the CTE query by itself so you can see the preliminary query result */
;WITH Cte_Totals AS (
SELECT HP.ItemId, SUM(HP.Hours) AS HP_TotalHours,
SUM(HP.[Phasecost]) AS Total_Phasecost,
MAX(CAST(HP.Hours AS bigint) * 10000000000 + HP.PhaseId) AS Max_Hours_with_PhaseId,
MAX(CI.[TotalCost]) AS CI_TotalCost, MAX(CI.[TotalHours]) AS CI_TotalHours,
MAX(CI.[TotalCost]) - SUM(HP.Phasecost) AS Phase_Diff,
MAX(CI.[TotalHours]) - SUM(HP.Hours) AS Hours_Diff
FROM [Hours/Phases] HP
CROSS APPLY (
SELECT CI.[TotalCost], CI.[TotalHours]
FROM [Costs/Items] CI
WHERE CI.ItemId = HP.ItemId
) AS CI
GROUP BY HP.ItemId
)
SELECT HP.ItemId, HP.PhaseId,
CASE WHEN HP.Hours = HP_calcs1.Max_Hours AND HP.PhaseId = HP_calcs1.Max_PhaseId
THEN HP.Hours + Hours_Diff ELSE HP.Hours END AS Hours,
CASE WHEN HP.Hours = HP_calcs1.Max_Hours AND HP.PhaseId = HP_calcs1.Max_PhaseId
THEN HP.[Phasecost] + Phase_Diff ELSE HP.Phasecost END AS [Phase cost]
, HP.Phasecost AS original_phase_cost /*just for verification, remove after testing*/
, HP.Hours AS original_hours /*just for verification, remove after testing*/
FROM [Hours/Phases] HP
CROSS JOIN cte_totals CT
CROSS APPLY (
SELECT Max_Hours_with_PhaseId / 10000000000 AS Max_Hours,
Max_Hours_with_PhaseId % 10000000000 AS Max_PhaseId
) AS HP_calcs1
Assuming you wanted to adjust the existing Phasecost value. You can make use of the window function to calculate total Phasecost "SUM(hp.Phasecost) OVER(PARTITION BY hp.ItemId)" and compare with your original TotalCost, if there is a difference, adjust the different to the line with highest PhaseCost value "ROW_NUMBER() OVER (PARTITION BY hp.ItemId ORDER BY Hours DESC) = 1"
SELECT
hp.ItemId,
hp.PhaseId,
hp.Hours,
hp.Phasecost,
hp.Phasecost
+ CASE WHEN SUM(hp.Phasecost) OVER(PARTITION BY hp.ItemId) <> ci.TotalCost
AND ROW_NUMBER() OVER (PARTITION BY hp.ItemId ORDER BY Hours DESC) = 1
THEN ci.TotalCost - SUM(hp.Phasecost) OVER(PARTITION BY hp.ItemId)
ELSE 0
END AS PhasecostAdjusted
FROM [Costs/Items] AS ci
INNER JOIN [Hours/Phases] AS hp
ON ci.ItemId = hp.ItemId
ORDER BY hp.ItemId, hp.PhaseId
Or you can apply similar technic when you are inserting to the [Hours/Phases] table. Then you are skip this step as you are getting the data correct in the first place.
And here it is.
SELECT hp.ItemId, hp.PhaseId, hp.Hours,
CONVERT(int, ci.TotalCost * 1.0 * hp.Hours / ci.TotalHours)
+
CASE WHEN SUM(CONVERT(INT, ci.TotalCost * 1.0 * hp.Hours / ci.TotalHours)) OVER(PARTITION BY hp.ItemId) <> ci.TotalCost
AND ROW_NUMBER() OVER (PARTITION BY hp.ItemId ORDER BY Hours desc) = 1
THEN ci.TotalCost - SUM(hp.Phasecost) OVER(PARTITION BY hp.ItemId)
ELSE 0
END AS PhasecostCorrected
FROM [Costs/Items] AS ci
INNER JOIN [Hours/Phases] AS hp
ON ci.ItemId = hp.ItemId
ORDER BY hp.ItemId, hp.PhaseId