SQLTeam.com | Weblogs | Forums

Sum/Count w/o duplicates


#1

Hello.

I have been spoiled with the ease of Access, but have outgrown it and am using SQL Server Management Studio.

So, please excuse my ignorance on the topic of ignoring duplicates when counting or summing.

I have figured out how to eliminate duplicates by using Select Distinct for a detailed query. However, I am struggling when trying to add or count (the duplicates are being included in the totals).

Here is an abbreviated version of what I have thus far. It provides me the results grouped by the desired columns, but it is including the duplicates. What am I missing to eliminate duplicates in the totals? Thank you!

Select

Merchant Category = Case When cct.MerchantSICCode='5542' Then 'Automated Fuel Dispensers'
Else 'No MCC Assigned'
End,

ProductName=Case When cct.PricingStrategy IN ('XRGA','XRGB','XRGC') Then 'Signature'
Else 'Other Product'
End,

SalesTransactionCount=SUM(CASE When cct.TransactionCode='253' Then 1 Else 0 End),
SalesTransactionDollar=Sum(CASE When cct.TransactionCode='253' Then cct.TransactionAmount Else 0 End),
InterchangeGross=Sum(CASE When cct.TransactionCode='253' Then cct.Interchange Else 0 End)-1,
SalesReturnCount=SUM(CASE When cct.TransactionCode='255' Then 1 Else 0 End),
SalesReturnDollar=Sum(CASE When cct.TransactionCode='255' Then cct.TransactionAmount Else 0 End)
-1,
InterchangeReverse=Sum(CASE When cct.TransactionCode='255' Then cct.Interchange Else 0 End)*-1

From
TransactionTable cct

Where
cct.TransactionDate Between '2015-01-01' AND '2016-12-31'
AND cct.MerchantSICCode Not In ('11','12','0','3','4','7')

Group By

Case When cct.MerchantSICCode='5542' Then 'Automated Fuel Dispensers'
Else 'No MCC Assigned'
End,

Case When cct.PricingStrategy IN ('XRGA','XRGB','XRGC') Then 'Signature'
Else 'Other Product'
End


#2

Please show us sample data and expected output.


#3

I don't see how you'd get duplicate SUMs with no table joins.


#4

I don't see anything in the query that would cause duplicates to be created - so that must mean there are duplicates in the data. Before you can begin to separate out duplicates you have to identify why they are duplicates - then you will have a better idea on how to eliminate them.

My guess - without any supporting information - is that there are transaction codes included in the results that you don't want to count.

If there are actually duplicates in the TransactionTable then you really need to identify why the application is even allowing those duplicates to be added in the first place.


#5

Thanks for all of the responses. Sorry this is just one table and there are duplicate rows within the table.

This is historical data and no duplicates and currently no duplicates are being introduced.


#6

I would store the "category" value of those two columns in a separate, lookup, table (and OUTER JOIN to that) so that as/when new analysis codes are needed they can be added to the Lookup Table, rather than having to change the source code (which will be an error-prone process as there will be multiple places in the code that need to be changed, and some of them will probably be overlooked - and, importantly, they will fail UNsafe to 'No MCC Assigned' / 'Other Product' and mislead the report's user.

it also makes your GROUP BY a bit easier as you won't have to maintain duplicate code there (if that gets out of Sync then SQL will give you an error, so its not a code-quality related issue, more of a convenience).

One option would be to create a VIEW that encapsulated this:

Of course I don't know your data / APP, so this may not be relevant.

CREATE VIEW TransactionTable_View
AS
SELECT MerchantSICCode,
       PricingStrategy,
       ... all other columns ...
       [V_MerchantCategory] = Case When cct.MerchantSICCode='5542'
                              Then 'Automated Fuel Dispensers'
                              Else 'No MCC Assigned'
                              End,

       [V_ProductName] = Case When cct.PricingStrategy IN ('XRGA','XRGB','XRGC')
                         Then 'Signature'
                         Else 'Other Product'
                         End,
       [V_IsTransaction] = CASE When cct.TransactionCode='253' Then 1 Else 0 End,
       ... any other "calculated" columns ...

one benefit of this approach is that if you decide to add OUTER JOIN to lookup table(s) in the future then none of the associated code needs to change, it just uses FROM TransactionTable_View instead of FROM TransactionTable and doesn't care how the data is retrieved.

If I've understood your duplicates problem correctly you might be able to replace:

From TransactionTable cct

with

From
(
    SELECT DISTINCT MerchantSICCode, PricingStrategy,
                    ... list of all columns required to ensure unique ...
    FROM TransactionTable
) AS cct

but personally I would fix the underlying data (delete the duplicates, or add a "flag" column [NotRequiredInReports] and exclude those in reports. The DISTINCT may very well remove a transaction that is a duplicate, but just happens to have the same Value or whatever.


#7

Okay - there are duplicates in the table, so how do you know they are duplicates? What is causing those rows to be duplicate to other rows?