Producing a count of distinct values from a column with comma separated values

bruchar · May 17, 2020, 5:00am

Hello,

Problem Statement:
I am trying create an query (Redshift) that will count the number of times each value appears in a column that contains comma separated values.

Details:

The possible values in the CSV column are predefined.
There are 14 possible values, I know these without needing to refer to the data
Each of the 14 values can only appear in the CSV column at most once
Each record has at least one of the 14 values in the CSV column, but could contain many (up to 14).

I tried adding tally/counter columns to my view that would record: 1 (CSV column has value) or 0 (CSV column doesn't have value)

case when [column] like '%value1%' then 1 else 0 as value_1,
case when [column] like '%value2%' then 1 else 0 as value_2,
etc.

This worked in the sense that now I have columns that indicate how many true or false whether each value is in the CSV column, however I'm struggling to summarize these tally columns in a way that gives me a count for each distinct value. Any ideas? (or maybe I'm going about this all wrong?)

as an example, the column below is represents my CSV column for a veterinarian table. the column 'pets' represent the type of pet each vet clinic works with.

pets
cat, fish, dog
cat, dog
dog
dog, bird, fish
fish, rat, cat

I need to produce a count of vet clinics that will see each pet type.

Desired output
bird | 1
cat | 3
dog | 4
fish | 3
rat | 1

Thank you in advance.

yosiasz · May 17, 2020, 5:21am

Welcome. This is a forum for Microsoft SQL Server. That said maybe you should look at https://docs.aws.amazon.com/redshift/latest/dg/SPLIT_PART.html

in sql, one of the many possible ways

if OBJECT_ID('tempdb..#pets') is not null
	drop table #pets

create table #pets(mishmash nvarchar(150))

insert into #pets
select 'cat, fish, dog' union
select 'cat, dog' union
select 'dog' union
select 'dog, bird, fish' union
select 'fish, rat, cat' 

select rtrim(ltrim(mm.value)), count(1) 
 from #pets p
 cross apply string_split(mishmash, ',') mm
 group by rtrim(ltrim(mm.value))

bruchar · May 17, 2020, 6:22pm

Thank you for the suggestion and for the clarification. I wasn't able to get the cross apply / string split to work with redshift, but I was able to make progress using a different strategy.

I am now able to get a sum for each pet type however the sum values each appear in their own column (i need them to have them transposed -- columns <-> rows). I tried using the UNPIVOT function but it doesn't seem to be compatible with Redshift. Any other ideas? (I will post the question on a Redshift specific forum if not, just wanted to check in case there was something obvious I could do here)

Note that categories are static, these 14 will never change or have new values added.

this is my query so far:

    SELECT 
    count(case when vet.pets like '%dog%' THEN 1 ELSE NULL END) AS dog
    ,  count(case when vet.pets like '%cat%' THEN 1 ELSE NULL END) as cat
    ,  count(case when vet.pets like '%rat%' THEN 1 ELSE NULL END) as rat
    ,  count(case when vet.pets like '%bird%' THEN 1 ELSE NULL END) as bird
    ,  count(case when vet.pets like '%fish%' THEN 1 ELSE NULL END) as fish
    ,  count(case when vet.pets like '%hamster%' THEN 1 ELSE NULL END) as hamster
    ,  count(case when vet.pets like '%ferret%' THEN 1 ELSE NULL END) as ferret
    ,  count(case when vet.pets like '%horse%' THEN 1 ELSE NULL END) as horse
    ,  count(case when vet.pets like '%zebra%' THEN 1 ELSE NULL END) as zebra
    ,  count(case when vet.pets like '%bear%' THEN 1 ELSE NULL END) as bear
    ,  count(case when vet.pets like '%alligator%' THEN 1 ELSE NULL END) as alligator
    ,  count(case when vet.pets like '%sugar_glider%' THEN 1 ELSE NULL END) as sugar_glider
    ,  count(case when vet.pets like '%tiger%' THEN 1 ELSE NULL END)  as tiger
    ,  count(case when vet.pets like '%rock%' THEN 1 ELSE NULL END) as rock
    FROM vet

This produces:

    dog | cat | rat | bird | fish | hamster | ferret | horse | zebra | bear | alligator | sugar_glider | tiger | rock  
    3 | 5 | 2 | 6 | 8 | 1 | 8 | 5 | 1 | 3 | 2 | 2 | 4 | 2

I need it in the format:

    dog | 3
    cat | 5
    rat | 2
    bird | 6
    fish | 8
    hamster | 1
    ferret | 8
    horse | 5
    zebra | 1
    bear | 3
    alligator | 2
    sugar_glider | 2
    tiger | 4
    rock | 2

yosiasz · May 17, 2020, 7:12pm

if tomorrow there is a new animal discovered your approach will not work, it is not dynamic. data is not static. I think you should post in a redshift forum

bruchar · May 17, 2020, 7:28pm

I understand. The actual query does not have to do with Animals. This is just a sample. In this case the data is static and there will never be new values added to it. I will post to a redshift forum for additional guidance.

Thank you!

JeffModen · May 17, 2020, 8:39pm

@yosiasz

AWS RedShift uses a subset of Postgre behind the scenes. Of course, it's missing a lot of the big pieces because RedShift is designed for an entirely different purpose than what most people are accustomed to. This is NOT a normal RDBMS Engine.

https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html

harishgg1 · May 18, 2020, 8:38am

hi

you can do it like .. this

SELECT 
    count(case when vet.pets like '%dog%' THEN 1 ELSE NULL END) AS dog
    FROM vet
union all
select   count(case when vet.pets like '%cat%' THEN 1 ELSE NULL END) as cat
    FROM vet
union all
select   count(case when vet.pets like '%rat%' THEN 1 ELSE NULL END) as rat
from vet

union all
......
....
....

bruchar · May 18, 2020, 3:35pm

yes! That did it. Thank you Harishgg1. just a slight modification: i added a column to call out which animal instance was being counted:

SELECT 
    'dog' AS animal, 
    count(case when vet.pets like '%dog%' THEN 1 ELSE NULL END) AS count
    FROM vet
Union all
SELECT etc....

Thank you for the suggestion!