SQLTeam.com | Weblogs | Forums

Aggregating Data with a Maximum Limit on an aggregated value


#1

I need to create an export process that aggregates our records based on common characteristics, but the aggregation also much have a total cost that is less than an arbitrary amount assigned by a variable. For example, suppose that you have:

1 Blue $1
2 Blue $1
3 Red $1
4 Red $2

And the limit is $2. The desired output is:

1 Blue $2
2 Red $2
3 Red $1

Using the standard COLOR, MAX(Cost) GROUP BY COLOR type of query will yield the first line, but the second line of the output above will be $3 and the last line will be absent.

Suppose further that the totalcost is actually a function of unitcost and quantity. So, the supplied data will really be:

1 Blue 2 $2
2 Blue 3 $2
3 Red 5 $4
4 Red 6 $4

Suppose the limit is $10. The desired output is:
1 Blue 5 $2 $10
2 Red 2 $4 $8
3 Red 2 $4 $8
4 Red 2 $4 $8
5 Red 2 $4 $8
6 Red 2 $4 $8
7 Red 1 $4 $4

Any suggestions on an elegant way to automate this? I need my data "lumpy" but not too "lumpy".

Thanks!


#2

I get the part, where you sum the quantities on the blue lines, but please explain how you determine that the 2 red lines
3 Red 5 $4
4 Red 6 $4
would become 6 lines:
2 Red 2 $4 $8
3 Red 2 $4 $8
4 Red 2 $4 $8
5 Red 2 $4 $8
6 Red 2 $4 $8
7 Red 1 $4 $4


#3

--Note: dbo.tally is a "standard" tally table with just seq. numbers from 0 to 100,000 or 1M (typically).

DECLARE @cost_limit smallmoney
SET @cost_limit = $10

SELECT color, quantity, unitcost, quantity * unitcost AS extended_cost
FROM (
    SELECT color, unitcost, SUM(quantity) AS total_quantity
    FROM (
        SELECT 1 AS id, 'Blue' AS color, 2 AS quantity, CAST($2 AS smallmoney) AS unitcost UNION ALL
        SELECT 2, 'Blue', 3, $2 UNION ALL
        SELECT 3, 'Red', 5, $4 UNION ALL
        SELECT 4, 'Red', 6, $4
    ) AS test_data
    GROUP BY color, unitcost
) AS totals
CROSS APPLY (
    SELECT FLOOR(@cost_limit / unitcost) AS max_qty_per_row
) AS aan1 --assign_alias_names
INNER JOIN dbo.tally t ON
    t.tally BETWEEN 1 AND CEILING(total_quantity / max_qty_per_row)
CROSS APPLY (
    SELECT CASE WHEN total_quantity - (t.tally * max_qty_per_row) >= 0 THEN max_qty_per_row 
         ELSE total_quantity - ((t.tally -1) * max_qty_per_row) END AS quantity
) AS aan2
ORDER BY color, unitcost, t.tally

#4

Hi bitsmed,

The idea is that the rule is that no single returned result can have a total value over $10. In this case, we have a total of 11 units at $4 each, and 2*$4 < 10 < 3*$4. So, the desired result will figure out that math and then return a number of lines sufficient to capture all of the units reports. In this case, (2 * $4) * 5 + (1 * $4) = 11 * $4. The $8 is now a new calculated value of the product of the unit cost and quantity.

The challenge that I am facing is the tool I need to use gets weird when records get too large. It's a model that does projections based on budget and when people report a great many things as a line item (a practice I have tried to put a stop to with varying levels of success), it can result in the model never having enough money to replace the items. On the flip side, there is a minimum order number as well, but we are less concerns about meeting that than having EVERYTHING in one line item with a total cost that exceeds the budget dramatically. So, for reporting, I am trying to push for itemized data, but for modeling I have to smartly roll it up -but not too big for any given item. I also don't want to do this process manually, hence... :slight_smile:

Hope that clarifies. :slight_smile:


#5

Thank you, Scott! I'll have to take a moment to digest that.


#6

Thank you for the explanation.


#7

Hi Scott,

I think I am starting to see what is going on here. This is the type of SQL that I am almost at, but still working into. Thank you for a example to learn from!

For the tally table, currently the database that I am in does not have one. Overall, is it better to create a fixed standard table (e.g. with one the many methods floating around the net) or generate it dynamically for the query when needed? I think that is the only piece that I am missing from this.

Thanks!

-Edit: Yeow! That does exactly what I am looking for. I did notice that the unitcost can only be as small as the maximum unit cost across all of the distinct types. If it is less than this (for example, $3 then it throws an error: Divide by zero error encountered). Any recommendations on how to handle this? We have a very wide range of unit costs. Run multiple times where records are fed into it with a range of unit costs? Some other magic?


#8

The inline tally table, using CTEs, typically performs slightly better. I can't post an example because my filter at work thinks it's some type of "sql injection", but Itzik Ben-Gan first demonstrated the technique, and there should be examples on the web of a CTE tally table.


#9

Hi Scott,

I have been trying to implement this with my dataset and was wondering if you had recommendations. I have a couple of extremely large records that I need to include that have unitcosts above the cost_limit value and instead of ended up with a divide by zero failure, I tried to enhance it so that I get the output that I am looking for by applying this to a part of the record set and then unioning an aggregation query for the remainder of the records. Unfortunately, when I get to the end my numbers are off for the total tally and total cost. What is going sideways?

Thank you for all your help so far.

Here's the code:

DECLARE @cost_limit money
SET @cost_limit = $50000000

--CREDIT: ScottPletcher of forums.sqlteam.com
--http://forums.sqlteam.com/users/scottpletcher/activity

SELECT Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
	, quantity, quantity * unitcost * (1 + SoftCost) AS extended_cost
FROM (
	SELECT Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
	, SUM(Quantity) as total_quantity
	from dbo.Inventory
	WHERE UnitCost <= @cost_limit
	group by Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
) AS totals
CROSS APPLY (
    SELECT FLOOR(@cost_limit / unitcost) AS max_qty_per_row
) AS aan1 --assign_alias_names
INNER JOIN dbo.tally t ON
    t.N BETWEEN 1 AND CEILING(total_quantity / max_qty_per_row)
CROSS APPLY (
    SELECT CASE WHEN total_quantity - (t.n * max_qty_per_row) >= 0 THEN max_qty_per_row 
         ELSE total_quantity - ((t.n -1) * max_qty_per_row) END AS quantity
) AS aan2
UNION
	SELECT Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
	, Quantity, Quantity * UnitCost * (1 + SoftCost)as extended_cost
	from dbo.Inventory
	WHERE UnitCost > @cost_limit
ORDER BY Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost

#10

Hard without sample data but maybe this:

DECLARE @cost_limit money
SET @cost_limit = $50000000

--CREDIT: ScottPletcher of forums.sqlteam.com
--http://forums.sqlteam.com/users/scottpletcher/activity

SELECT Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
	, quantity, quantity * unitcost * (1 + SoftCost) AS extended_cost
FROM (
	SELECT Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
	, SUM(Quantity) as total_quantity
	from dbo.Inventory
	WHERE UnitCost <= @cost_limit
	group by Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
) AS totals
CROSS APPLY (
    SELECT FLOOR(@cost_limit / unitcost) AS max_qty_per_row
) AS aan1 --assign_alias_names
INNER JOIN dbo.tally t ON
    t.N BETWEEN 1 AND CASE WHEN max_qty_per_row = 0 THEN 1 ELSE CEILING(total_quantity / max_qty_per_row) END
CROSS APPLY (
    SELECT CASE 
        WHEN max_qty_per_row = 0 THEN unitcost
        WHEN total_quantity - (t.n * max_qty_per_row) >= 0 THEN max_qty_per_row 
        ELSE total_quantity - ((t.n -1) * max_qty_per_row) END AS quantity
) AS aan2
UNION
	SELECT Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost
	, Quantity, Quantity * UnitCost * (1 + SoftCost)as extended_cost
	from dbo.Inventory
	WHERE UnitCost > @cost_limit
ORDER BY Team, ProdType, YearIssued, UnitCost, SoftCost, YRCost

#11

Hi Scott,

Thanks again for taking a second look. It is working better now, with the extra case, but I think that you meant to have THEN quantity instead of unitcost? That was coming up with some errors.

I haven't been able to get it to stop dropping records in the total quantity. As for sample data, here is an example of the set: https://www.hightail.com/download/bXBhb2VDSWVvQnRwdmNUQw

Datatypes for the columns are: Team - nvarchar(5), ProType - numeric(18, 6), YearIssued - date, UnitCost - money, YrCost - smallint, and Quantity - float.

Thank you!