SQLTeam.com | Weblogs | Forums

My SUM field is duplicating!


#1

For some reason I am unable to figure out - my "ip.Maintenance_Revenue_USD__c" field is duplicating, i.e. if there is one record with $630 it will multiply it by two or three times - no idea why!

I am trying to SUM up all the values for Maintenance Revenue in the field where they are Active & Customer for the Product Line.

CODE:-

select distinct

[Softrax ID],
[Account ID],
[Owner],
[Owner Role],
[Account Owner LOB],
[Account Name],
[Account Record Type],
[Class],
[Primary Address Line 1],
[Primary City],
[Primary State],
[Primary County],
[Primary Postal Code],
[Primary Country],
[Region],
[Sub-Region],
[D&B SIC Code 1],
[D&B SIC Code Description],
Ann_Rev_USD as [D&B Annual Revenue],
[D&B Employees],
[SIC Code 1],
[SIC Description],
[DUNS],
[DUDUNS],
[Parent/HQ DUNS],
[GUDUNS],
[Maintenance Status],
[Industry],
[Sub-Industry],
[Revenue Range],
[Employee Range],
[Product Line],
--[SKU],
--[SKU Description],
--[SKU Class],
[Maintenance Revenue USD],
--[Version],
[# of Licenses] -- max
--[Active Flag]--,
--[MBM],
--[Maintenance Start Date],
--[Maintenance Expiry Date]

from (

select
a.SofTrax_ID__c as [Softrax ID],
a.Unique_Id__c as [Account ID],
uo.name [Owner],
uo.Role_Name__c as [Owner Role],
uo.user_lob__c as [Account Owner LOB],
a.Name as [Account Name],
rt.name [Account Record Type],
a.Class__c as [Class],
a.Primary_Address_Line_1__c as [Primary Address Line 1],
a.Primary_City__c as [Primary City],
a.Primary_State__c as [Primary State],
a.Primary_County__c as [Primary County],
a.Primary_Postal_Code__c as [Primary Postal Code],
a.Primary_Country__c as [Primary Country],
a.Region__c as [Region],
a.Sub_Region__c as [Sub-Region],
a.D_B_SIC_1__c as [D&B SIC Code 1],
a.D_B_SIC_Desc_1__c as [D&B SIC Code Description],
a.D_B_of_Employees__c as [D&B Employees],
s.Name as [SIC Code 1],
s.Description__c as [SIC Description],
a.DUNS__c as [DUNS],
a.DU_DUNS__c as [DUDUNS],
a.Parent_DUNS__c as [Parent/HQ DUNS],
a.GU_DUNS__c as [GUDUNS],
a.Maintenance_Status__c as [Maintenance Status],
a.Main_Industry__c as [Industry],
a.Sub_Industry__c as [Sub-Industry],

vrr.range_desc as [Revenue Range],
vre.range_desc as [Employee Range],
ip.Product_Line__c as [Product Line],
--prod.SKU__c as [SKU],
--ip.Product_Class__c as [SKU Class],
sum(ip.Maintenance_Revenue_USD__c) as [Maintenance Revenue USD], -- sum at product level maintenance
max(ip.Quantity_Licenses__c) as [# of Licenses],
max(a.d_b_annual_revenue__c / isnull(c2.conversionrate,1)) as Ann_Rev_USD

from dbo.vwAccount a
JOIN vwInstalled_Product_Line__c IP on a.id = ip.Account__c
and ip.Active_Flag__c in ('true')
join dbo.RecordType rt on a.RecordtypeID = rt.ID
and rt.name in ('Customer')
LEFT Join gdw.dbo.vwUser UO (Nolock) ON a.ownerid = uo.id
left join dbo.AccountTerritory at (nolock) on a.id = at.accountid
left join vwproduct2 prod (nolock) on prod.id = ip.product__c
LEFT JOIN dbo.CurrencyType C2 (NoLock) ON c2.IsoCode = a.CurrencyIsoCode

   left join vwinfor_sic_code__c s (nolock) 
       on s.id=a.Local_Industry_Lookup__c

--rev range
LEFT JOIN salesforce_prod.dbo.CurrencyType C (NoLock)
ON c.IsoCode = a.CurrencyIsoCode
LEFT JOIN gdw.dbo.valueranges vrr
ON vrr.range_code = 'revenue'
AND COALESCE(CASE
WHEN a.d_b_annual_revenue__c < 1 THEN NULL
ELSE a.d_b_annual_revenue__c / c.conversionrate
END, a.site_revenue__c / c.conversionrate , 0) BETWEEN vrr.low_range AND vrr.high_range
--emp range
left JOIN gdw.dbo.valueranges vre
ON vre.range_code = 'employee'
AND COALESCE(CASE
WHEN a.D_B_of_Employees__c < 1 THEN NULL
ELSE a.D_B_of_Employees__c
END, a.employee_count__c, 0) BETWEEN vre.low_range AND vre.high_range

Group by
a.SofTrax_ID__c,
a.Unique_Id__c,
uo.name,
uo.Role_Name__c,
uo.user_lob__c ,
a.Name,

rt.name,
a.Class__c,
a.Primary_Address_Line_1__c ,
a.Primary_City__c,
a.Primary_State__c ,
a.Primary_County__c,
a.Primary_Postal_Code__c ,
a.Primary_Country__c,
a.Region__c ,
a.Sub_Region__c,
a.D_B_SIC_1__c,
a.D_B_SIC_Desc_1__c ,
--[D&B Annual Revenue],
a.D_B_of_Employees__c,
s.Name ,
s.Description__c ,
a.DUNS__c ,
a.DU_DUNS__c ,
a.Parent_DUNS__c ,
a.GU_DUNS__c ,
a.Maintenance_Status__c ,
a.Main_Industry__c ,
a.Sub_Industry__c ,
ip.Product_Line__c,
vrr.range_desc ,
vre.range_desc) temp
order by 2


#2

One of your JOINs is connecting to multiple rows, for each of those ANY matching row in vwInstalled_Product_Line__c will be duplicated however many times there are multiple matching rows in the other table(s), and thus SUM will add them all up.

You could try this:

SELECT TOP 100 *
FROM dbo.vwAccount a
JOIN vwInstalled_Product_Line__c IP on a.id = ip.Account__c
...
left JOIN gdw.dbo.valueranges vre
    ON vre.range_code = 'employee'
    AND COALESCE(CASE
    WHEN a.D_B_of_Employees__c < 1 THEN NULL
    ELSE a.D_B_of_Employees__c
    END, a.employee_count__c, 0) BETWEEN vre.low_range AND vre.high_range

WHERE SomeKeyField = 'XXX'

run that JUST for the criteria for the single key that you know is wrong and see which tables are giving you duplicate rows. The row(s) from ALL table will be listed multiple times, so you are looking for the values from a table that are different, in some columns at least, on each row in the resultset

Please note that if this is a report for End Users, rather than DBAs, you almost certainly should not include NOLOCK hints. That WILL, sooner order later, cause your report to exclude some rows altogether or include some rows twice, and that effect will be unpredictable, rare and totally unrepeatable, and I imagine would be an issue if users are making business decisions based on the data.

Personally I would have written this query differently. I would gather the KEY fields first, and perhaps the SUM/aggregate value(s), and then join that back to the table(s) to get the detail columns. That would avoid having to have a massive GROUP BY list (which I expect makes the job hard for SQL) and would either avoid the DUPs you are getting or, I hope!!, make it easier to manage them "out" of the process.

It would be appreciated if you could format any code here please - highlight the code block and press the [</>] button, thanks :smile:


#4

I have tried fixing my code, but it is still not working.

select distinct 
a.SofTrax_ID__c as [Softrax ID],
a.Unique_Id__c as [Salesforce Account ID],
uo.name [Owner],
uo.Role_Name__c as [Owner Role],
uo.user_lob__c as [Account Owner LOB],
a.Name as [Account Name],
rt.name [Account Record Type],
a.Class__c as [Class],
a.Primary_Address_Line_1__c as [Primary Address Line 1],
a.Primary_City__c as [Primary City],
a.Primary_State__c as [Primary State],
a.Primary_County__c as [Primary County],
a.Primary_Postal_Code__c as [Primary Postal Code],
a.Primary_Country__c as [Primary Country],
a.Region__c as [Region],
a.Sub_Region__c as [Sub-Region],
a.D_B_SIC_1__c as [D&B SIC Code 1],
a.D_B_SIC_Desc_1__c as [D&B SIC Code Description],
(a.d_b_annual_revenue__c / isnull(conv.conversionrate,1)) as [D&B Annual Revenue],
a.D_B_of_Employees__c as [D&B Employees],
s.Name as [Infor SIC Code 1],
s.Description__c as [Infor SIC Description],
a.DUNS__c as [DUNS],
a.DU_DUNS__c as [DUDUNS],
a.Parent_DUNS__c as [Parent/HQ DUNS],
a.GU_DUNS__c as [GUDUNS],
a.Maintenance_Status__c as [Maintenance Status],
a.Main_Industry__c as [Industry],
a.Sub_Industry__c as [Sub-Industry],
vrr.range_desc as [Revenue Range], 
vre.range_desc as [Employee Range],
ipm.Product_Line__c as [Product Line],
isnull(ipm.Maint_USD,0) Maintenance_USD,
isnull(ipm.Max_License,0) as [# of License]

from dbo.vwAccount a

left JOIN vwInstalled_Product_Line__c Install_P on a.id = Install_P.Account__c

-- Creates SUM Of Maintenance at Product Line level for Active Products
left join (
		Select
			ip.Account__c,
			ip.Product_Line__c,
			ip.Active_Flag__c,

			isnull (sum(ip.Maintenance_Revenue_USD__c),0) as Maint_USD,
			isnull (max(ip.Quantity_Licenses__c),0) as Max_License
			from vwInstalled_Product_Line__c IP
						left join ( 
									select distinct
											prod.id,
											prod.SKU__c,
											prod.Product_Class__c
														from vwproduct2 prod
															group by
																prod.id,
																prod.SKU__c,
																prod.Product_Class__c) sku_prod -- now uses SKU_PROD for querying SKU level for Cloud and Hosting
																on sku_prod.id = ip.product__c
			where ip.Active_Flag__c in ('true')
			group by
			ip.Product_Line__c,
			ip.Account__c,
			ip.Active_Flag__c
) ipm
		ON a.id = ipm.Account__c

-- Connects for the currency conversion for DnB Rev
left join (
		Select
			c2.IsoCode,
			c2.conversionrate
		from salesforce_prod.dbo.CurrencyType c2) conv
		on conv.IsoCode = a.CurrencyIsoCode

-- Connects Products to SKU level Join
--left join vwproduct2 prod (nolock) on prod.id = Install_P.product__c

/*left join ( 
		select distinct
		prod.id,
		prod.SKU__c,
		prod.Product_Class__c
		from vwproduct2 prod
		group by
	    prod.id,
		prod.SKU__c,
		prod.Product_Class__c) sku_prod -- now uses SKU_PROD for querying SKU level for Cloud and Hosting
		on sku_prod.id = Install_P.product__c*/


-- Account Type Join
left JOIN salesforce_prod.dbo.RecordType rt
		on a.RecordtypeID = rt.ID	

-- Account Owner Join
LEFT Join gdw.dbo.vwUser UO 
		ON  a.ownerid = uo.id

-- INFOR SIC Code Join
left join vwinfor_sic_code__c s
        on s.id=a.Local_Industry_Lookup__c

-- rev range
       LEFT JOIN salesforce_prod.dbo.CurrencyType C (NoLock)
         ON c.IsoCode = a.CurrencyIsoCode
       LEFT JOIN gdw.dbo.valueranges vrr
         ON vrr.range_code = 'revenue'
            AND COALESCE(CASE
                           WHEN a.d_b_annual_revenue__c < 1 THEN NULL
                           ELSE a.d_b_annual_revenue__c / c.conversionrate
                         END,  a.site_revenue__c / c.conversionrate , 0) BETWEEN vrr.low_range AND vrr.high_range
-- emp range
       left JOIN gdw.dbo.valueranges vre
         ON vre.range_code = 'employee'
            AND COALESCE(CASE
                           WHEN a.D_B_of_Employees__c < 1 THEN NULL
                           ELSE a.D_B_of_Employees__c
                         END, a.employee_count__c, 0) BETWEEN vre.low_range AND vre.high_range

			
where 
rt.name in ('Customer')

and
sku_prod.Product_class__c in ('hs')

order by 2

#5

Which table did my test show you was showing multiple rows, for a "single record" WHERE clause?


#6

the vwInstalled_Product_Line__c table i think


#7

You'd need to be sure :slight_smile: but you could then try replacing the

JOIN vwInstalled_Product_Line__c IP on a.id = ip.Account__c
and ip.Active_Flag__c in ('true')
join dbo.RecordType rt on a.RecordtypeID = rt.ID
and rt.name in ('Customer')

with

OUTER APPLY
(
    SELECT TOP 1 Col1, Col2, ...
    FROM vwInstalled_Product_Line__c IP 
    WHERE a.id = ip.Account__c
    and ip.Active_Flag__c in ('true')
    ORDER BY SortCol1, SortCol2, ...
) AS IP

Your "Col1, Col2, ..." list needs to be all the columns used in the query (which is probably just everything with the "ip." alias prefix) and the "SortCol1, SortCol2, ..." list to make sure that whichever record is sorted first will be the most appropriate record to use. I would recommend that you include some unique column(s) in that list - e.g. the Primary Key columns for vwInstalled_Product_Line__c - so that the ORDER BY is guaranteed repeatable so you get the exact same record selected every time you re-run the query.


#8

Just noticed that you have a SUM() on some IP. columns, and there is an IP. column used in another join, you might need to handle that differently ... but perhaps just see if it looks like replacing the that JOIN with my OUTER APPLY will give you a single row, instead of multiple rows, back in my original "Find the duplicates" query