 # COUNT DISTINCT and GROUP BY question

Hi,

I have what is probably a simple question.

I'm working with credit card transaction data and each transaction has a fraud field that is populated with an 'N' or 'Y' depending on whether or not the transaction has been identified as fraud. Each record also has the account number.

I'm needing the number of accounts that have at least one transaction marked as 'Y' to get the total number of accounts that have had a fraudulent transactions.

I'm actually having to deal with this particular data in SAS as opposed to SQL server but it's a simple question and simple code so I figure you guys can assist.

I can write:

proc sql;
select
fraud_flag,
count(distinct account_number)
from transaction_table
group by fraud_flag;
quit;

My questions pertains to how each distinct account number is assigned to a group in the group by. An account can have transactions with a fraud flag that is 'Y' and transactions with a fraud flag that is 'N'. Obviously distinct account numbers with all 'Y' transactions or all 'N' transactions would be counted in 'Y' or 'N' respectively. But what about accounts that have a mix of transactions. What determines if that distinct account is counted in the 'Y' or 'N' group?

Hopefully my question makes sense. If not, I'm happy to clarify further.

Thanks in advance for your assistance Sorry, misundrstood the question.
Try this:

``````select count(*)
from (select account_number
from transaction_table
where fraud_flag='Y'
group by account_number
) as a
;
``````

or

``````select count(distinct account_number)
from transaction_table
where fraud_flag='Y'
;
``````

I tried to recreate your idea and I think using case inside the count you can get the result that you want

create table ##trans
(
account int,
flag nvarchar(2)
)

insert into ##trans
values

(1,'N'),
(2,'Y'),
(3,'Y'),
(1,'Y'),
(4,'Y'),
(4,'N')

SELECT account, COUNT(CASE when flag= 'Y' then '1' else null end)as countYes, COUNT(CASE when flag= 'N' then '1' else null end)as countNo FROM ##trans
GROUP BY account

It will be counted in both YES and NO totals. If you are wanting the number of accounts that only have Fraud, only do NOT have Fraud, and those that have a mixture of both then this perhaps:

``````SELECT SUM(CASE WHEN YesCount >= 1 AND NoCount = 0 THEN 1 ELSE 0 END) AS JustYes,
SUM(CASE WHEN YesCount = 0 AND NoCount >= 1 THEN 1 ELSE 0 END) AS JustNo,
SUM(CASE WHEN YesCount >= 1 AND NoCount >= 1 THEN 1 ELSE 0 END) AS BothYesNo,
SUM(TotalCount) AS GrandTotal
FROM
(
select account_number,
COUNT(*) AS TotalCount,
SUM(CASE WHEN fraud_flag='Y' THEN 1 ELSE 0 END)  AS YesCount,
SUM(CASE WHEN fraud_flag='N' THEN 1 ELSE 0 END)  AS NoCount
from transaction_table
group by account_number
) AS X
``````
1 Like

Thank you all for the help. A big thank you to Kristen for clarifying that it would be counted in both. If you want to see only account_numbers with fraud:

``````select account_number
from transaction_table
group by account_number
having max(case when fraud_flag = 'Y' then 1 else 0 end) = 1
--[or]: having sum(case when fraud_flag = 'Y' then 1 else 0 end) > 0``````