SQLTeam.com | Weblogs | Forums

Using SELECT OVER provides results > 100%


#1

I am attempting to show the percentage of total by groups of data. I am using the following query, and the results have some of the groups at 100% but others show a variance of +- .05%. I have attempted to use a variety of casts (see the commented lines), but the results for commented and uncommented lines come out the same; some of the groups come out to more than 100%. I need all the groups to come out to 100% with a variance of no mor than +- .000009. Is there a way to cast the values to make this work to that level of precision, or is there a different syntax that could be used? Here is the query:

SELECT obsvts, environment, farmname, servername, instname, dbname, maxcpu, maxpio, maxmem, totcpu, totpio, avgmem, potcpu, potpio, potmem, summedweights
-- ,cast(summedweights as decimal(25,13)) / sum(cast(summedweights as decimal(25,13))) over(PARTITION BY obsvts,farmname) AS potattribution
-- ,cast(summedweights as decimal(25,13)) / SUM(summedweights) over(PARTITION BY obsvts,farmname) AS wrcpot
-- ,cast(summedweights as decimal(25,13)) / sum(cast(summedweights as decimal(25,13))) over(PARTITION BY obsvts,farmname) AS wrcpot
,cast(summedweights as float) / sum(cast(summedweights as a)) over(PARTITION BY obsvts,farmname) AS wrcpot


#2

Apologies, typo in that original query. Here is the corrected query:

SELECT obsvts, environment, farmname, servername, instname, dbname, maxcpu, maxpio, maxmem, totcpu, totpio, avgmem, potcpu, potpio, potmem, summedweights
-- ,cast(summedweights as decimal(25,13)) / sum(cast(summedweights as decimal(25,13))) over(PARTITION BY obsvts,farmname) AS potattribution
-- ,cast(summedweights as decimal(25,13)) / SUM(summedweights) over(PARTITION BY obsvts,farmname) AS wrcpot
-- ,cast(summedweights as decimal(25,13)) / sum(cast(summedweights as decimal(25,13))) over(PARTITION BY obsvts,farmname) AS wrcpot
,cast(summedweights as float) / sum(cast(summedweights as float)) over(PARTITION BY obsvts,farmname) AS wrcpot
FROM consumedMAXPOT


#3

FLOAT is not an exact representation, so I would expect some difference, but not of 0.05%. If you use decimal (25,13), which is exact representation (subject to the scale), the numerator and denominator would be exact, but the division may be rounded off. Still I would not expect 0.05% difference.

So, I don't know why you are seeing such a large difference.

I am guessing that you have a large number of rows making it difficult to narrow down the problem. I don't know of another way of debugging it other than narrowing it down to a reasonably sized test set. You might try to narrow down the sets by adding some type of WHERE clauses.

One thought is to do the casting after the summation

cast(summedweights as float) / cast(sum(summedweights) as float) 
	over(PARTITION BY obsvts,farmname) AS wrcpot

What is the original data type?


#4

Here are the exact numbers I am getting, perhaps I did not convert them correctly in my head. I get the exact same percentage values if I use the commented decimal line or float. (1 = 100%). You are correct, there are a lot of rows that make up the raw data. Not sure it helps, but here are the actual values I get, which add up to the 1.053455 seen in the list below:

This is one group - this adds up to 1.053455
0.000154
0.011025
0.120052
0.000135
0.061616
0.0855
0.005689
0.261304
0.274239
0.047767
0.001174
0.047767
0.137033

These are all the groups - Some are exactly 100% (1), others are within tolerance, but 3 are not (marked with **)
1.000001
1.000001
0.999999
0.999999
1
1
1
1
1.053455
1.053455
1.000001
1.000001
1.01009
0.999999
0.999999
1
1
1
1
1
1
1
1
1
1
0.999999
1
1


#5

Can you post the actual values of summedweights (rather than ratios) for the group that adds up to 1.053455


#6

Here are the values that are being used for the group that is 1.053455 pct of total.

0.003511881
0.184788148
0.822442137
0.004116138
0.000462302
0.044534575
0.766782875
0.017060228
0.000029964
0.033274378
0.359980116
0.362633483
0.255815687
0.000405123
0.144162966


#7

I am suspecting that some of the data types you are using are truncating the results and that is causing the round-offs. Using your data, I created a test table (with FLOAT as the data type), and then calculated the percentages. Then added them up, and they add up to exactly 1. See below; you can copy this to an SSMS window and run it to see what it does.

CREATE TABLE #tmp (val FLOAT);
GO
INSERT INTO #tmp VALUES
(0.003511881),
(0.184788148),
(0.822442137),
(0.004116138),
(0.000462302),
(0.044534575),
(0.766782875),
(0.017060228),
(0.000029964),
(0.033274378),
(0.359980116),
(0.362633483),
(0.255815687),
(0.000405123),
(0.144162966);
GO

-- fractional values
SELECT val/SUM(val) OVER() FROM #tmp

-- the sum adds up to exactly one.
SELECT SUM(r) FROM
(
	SELECT val/SUM(val) OVER() AS r FROM #tmp
)s;

#8

Thanks for doing that test. It's good to know that it is possible for it to evaluate to 1. I will try changing the types earlier in the process to see if the rounding error goes away. Might take some time to test (a day), but will report back if I was successful (and why). If you think of anything else in the meantime, feel free to post any suggestions!


#9

Based on your test, I kept walking back through my SQL code. I eventually was able to determine that the over() function was providing all percent of totals (pot) sum = 1. I had thought that I was not manipulating the percent of total, but the pot was used later in my procedure within a group by function. That is where I lost the precision. I am not certain how I am going to fix that issue, but once identified, I have a shot. Thank you for all the help, without the proof (test) you provided, I am certain I would still be spinning my wheels on this one.