Producing a count of distinct values from a column with comma separated values

Hello,

Problem Statement:
I am trying create an query (Redshift) that will count the number of times each value appears in a column that contains comma separated values.

Details:

  • The possible values in the CSV column are predefined.
  • There are 14 possible values, I know these without needing to refer to the data
  • Each of the 14 values can only appear in the CSV column at most once
  • Each record has at least one of the 14 values in the CSV column, but could contain many (up to 14).

I tried adding tally/counter columns to my view that would record: 1 (CSV column has value) or 0 (CSV column doesn't have value)

case when [column] like '%value1%' then 1 else 0 as value_1,
case when [column] like '%value2%' then 1 else 0 as value_2,
etc.

This worked in the sense that now I have columns that indicate how many true or false whether each value is in the CSV column, however I'm struggling to summarize these tally columns in a way that gives me a count for each distinct value. Any ideas? (or maybe I'm going about this all wrong?)

as an example, the column below is represents my CSV column for a veterinarian table. the column 'pets' represent the type of pet each vet clinic works with.

pets
cat, fish, dog
cat, dog
dog
dog, bird, fish
fish, rat, cat

I need to produce a count of vet clinics that will see each pet type.

Desired output
bird | 1
cat | 3
dog | 4
fish | 3
rat | 1

Thank you in advance.

Welcome. This is a forum for Microsoft SQL Server. That said maybe you should look at https://docs.aws.amazon.com/redshift/latest/dg/SPLIT_PART.html

in sql, one of the many possible ways

image

if OBJECT_ID('tempdb..#pets') is not null
	drop table #pets

create table #pets(mishmash nvarchar(150))

insert into #pets
select 'cat, fish, dog' union
select 'cat, dog' union
select 'dog' union
select 'dog, bird, fish' union
select 'fish, rat, cat' 

select rtrim(ltrim(mm.value)), count(1) 
 from #pets p
 cross apply string_split(mishmash, ',') mm
 group by rtrim(ltrim(mm.value))

Thank you for the suggestion and for the clarification. I wasn't able to get the cross apply / string split to work with redshift, but I was able to make progress using a different strategy.

I am now able to get a sum for each pet type however the sum values each appear in their own column (i need them to have them transposed -- columns <-> rows). I tried using the UNPIVOT function but it doesn't seem to be compatible with Redshift. Any other ideas? (I will post the question on a Redshift specific forum if not, just wanted to check in case there was something obvious I could do here)

Note that categories are static, these 14 will never change or have new values added.

this is my query so far:

    SELECT 
    count(case when vet.pets like '%dog%' THEN 1 ELSE NULL END) AS dog
    ,  count(case when vet.pets like '%cat%' THEN 1 ELSE NULL END) as cat
    ,  count(case when vet.pets like '%rat%' THEN 1 ELSE NULL END) as rat
    ,  count(case when vet.pets like '%bird%' THEN 1 ELSE NULL END) as bird
    ,  count(case when vet.pets like '%fish%' THEN 1 ELSE NULL END) as fish
    ,  count(case when vet.pets like '%hamster%' THEN 1 ELSE NULL END) as hamster
    ,  count(case when vet.pets like '%ferret%' THEN 1 ELSE NULL END) as ferret
    ,  count(case when vet.pets like '%horse%' THEN 1 ELSE NULL END) as horse
    ,  count(case when vet.pets like '%zebra%' THEN 1 ELSE NULL END) as zebra
    ,  count(case when vet.pets like '%bear%' THEN 1 ELSE NULL END) as bear
    ,  count(case when vet.pets like '%alligator%' THEN 1 ELSE NULL END) as alligator
    ,  count(case when vet.pets like '%sugar_glider%' THEN 1 ELSE NULL END) as sugar_glider
    ,  count(case when vet.pets like '%tiger%' THEN 1 ELSE NULL END)  as tiger
    ,  count(case when vet.pets like '%rock%' THEN 1 ELSE NULL END) as rock
    FROM vet

This produces:

    dog | cat | rat | bird | fish | hamster | ferret | horse | zebra | bear | alligator | sugar_glider | tiger | rock  
    3 | 5 | 2 | 6 | 8 | 1 | 8 | 5 | 1 | 3 | 2 | 2 | 4 | 2 

I need it in the format:

    dog | 3
    cat | 5
    rat | 2
    bird | 6
    fish | 8
    hamster | 1
    ferret | 8
    horse | 5
    zebra | 1
    bear | 3
    alligator | 2
    sugar_glider | 2
    tiger | 4
    rock | 2

if tomorrow there is a new animal discovered your approach will not work, it is not dynamic. data is not static. I think you should post in a redshift forum

I understand. The actual query does not have to do with Animals. This is just a sample. In this case the data is static and there will never be new values added to it. I will post to a redshift forum for additional guidance.

Thank you!

@yosiasz

AWS RedShift uses a subset of Postgre behind the scenes. Of course, it's missing a lot of the big pieces because RedShift is designed for an entirely different purpose than what most people are accustomed to. This is NOT a normal RDBMS Engine.

https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html

2 Likes

hi

you can do it like .. this

SELECT 
    count(case when vet.pets like '%dog%' THEN 1 ELSE NULL END) AS dog
    FROM vet
union all
select   count(case when vet.pets like '%cat%' THEN 1 ELSE NULL END) as cat
    FROM vet
union all
select   count(case when vet.pets like '%rat%' THEN 1 ELSE NULL END) as rat
from vet 

union all
......
....
....

1 Like

yes! That did it. Thank you Harishgg1. just a slight modification: i added a column to call out which animal instance was being counted:

SELECT 
    'dog' AS animal, 
    count(case when vet.pets like '%dog%' THEN 1 ELSE NULL END) AS count
    FROM vet
Union all
SELECT etc....

Thank you for the suggestion! :slight_smile: