How to count # of IDs that have reached each step in funnel in DataGrip or AWS QuickSigh?

zw_wh91 · May 13, 2022, 9:20pm

Hello,

I've been working on this question for several days and can't seem to get the answer, even though it seems straight forward.

I have a table in this format, which is essentially a funnel (the numbers indicate whether they got to that step):

ID	step.a	step.b	step.c	step.d
5467894	a	b	c	0
2465763	0	0	0	0
3438431	a	b	c	d
4574321	a	b	0	0
5498732	0	0	0	0
4543354	a	b	c	0
6543245	a	0	0	0
4651349	a	0	0	0

I then have a statement that says GREATEST(step.a,step.b,step.c,step.d), that gives me the highest step for each ID. This is what that output looks like:

ID	greatest
5467894	c
2465763	0
3438431	d
4574321	b
5498732	0
4543354	c
6543245	a
4651349	a

So the highest step ID 5467894 gets to is step.c, 3438431 gets to step.d, 4574321 gets to step.b, etc.

What I need to show is the # of IDs that got to each step, because if an ID got to step.c, then they got to step.a and step.b as well. If an ID got to step.d, then they went through step.a, step.b, and step.c.

Looking at my 'greatest' column above, I should have the following:

step	# of IDs
step.a	6
step.b	4
step.c	2
step.d	1

For my end result, I just need to show how many IDs are in each step. I am using DataGrip and QuickSight, and I can use either to calculate the column. I don't need to use the GREATEST() command, just an idea I had. Any help would be greatly appreciated. Thank you so much.

jeffw8713 · May 14, 2022, 2:20pm

This is a forum for Microsoft SQL Server - so any solutions may not work for your flavor of SQL. Would this work:

SELECT stepa = sum(case when step.a > 0 then 1 end)
     , stepb = sum(case when step.b > 0 then 1 end)
     , stepc = sum(case when step.c > 0 then 1 end)
     , stepd = sum(case when step.d > 0 then 1 end)
  FROM yourOriginalTable

zw_wh91 · May 16, 2022, 8:48pm

jeffw8713:

SELECT stepa = sum(case when step.a > 0 then 1 end)
     , stepb = sum(case when step.b > 0 then 1 end)
     , stepc = sum(case when step.c > 0 then 1 end)
     , stepd = sum(case when step.d > 0 then 1 end)
  FROM yourOriginalTable

Hi Jeff,

Thank you so much for your response -- that is sort of where I need to get to, only I need it in the opposite format unfortunately. I need the sums showing vertically instead of horizontally - so with your statement, I get the column titles as 'stepa', 'stepb', 'stepc' etc. with only one row with the sum of each step:

step.a	step.b	step.c	step.d
6	4	2	1

I was hoping to be able to get a column for the step and a column for the # of IDs, with each sum as a row (4 rows total with step.a, step.b, step.c, step.d):

It is incredibly annoying as AWS QuickSight is unable to accept any other format for a funnel visual. Do you know if this is at all possible?

jeffw8713 · May 16, 2022, 9:34pm

I have no idea what is available - but if you can unpivot that is really what is needed here.

If doing this in T-SQL I would use CROSS APPLY to unpivot:

Declare @sampleTable table (ID int, [step.a] char(1), [step.b] char(1), [step.c] char(1), [step.d] char(1)) 
 Insert Into @sampleTable (ID, [step.a], [step.b], [step.c], [step.d])
 Values (5467894, 'a', 'b', 'c', '0')
      , (2465763, '0', '0', '0', '0')
      , (3438431, 'a', 'b', 'c', 'd')
      , (4574321, 'a', 'b', '0', '0')
      , (5498732, '0', '0', '0', '0')
      , (4543354, 'a', 'b', 'c', '0')
      , (6543245, 'a', '0', '0', '0')
      , (4651349, 'a', '0', '0', '0');

 Select step_name
      , total_count = count(*)
   From @sampleTable st
  Cross Apply (Values ('step.a', st.[step.a])
                    , ('step.b', st.[step.b])
                    , ('step.c', st.[step.c])
                    , ('step.d', st.[step.d])) As t(step_name, step_value)
  Where t.step_value <> '0'
  Group By
        t.step_name;

Then it is just a matter of eliminating the rows = 0 and then counting. If you do have an unpivot operator - then you can do essentially the same thing as the CROSS APPLY - and again, it is just a matter of grouping and totaling at that point.

zw_wh91 · May 19, 2022, 6:38pm

Hi Jeff,

Thank you so much for you response -- I am using PostgreSQL, and I have tried to translate the above T-SQL but am having some trouble as the statements available are not all 1-to-1. I've done some research and it looks like UNION ALL is the equivalent to an unpivot, I'll keep digging in there.
I really appreciate your help, many thanks!

zw_wh91 · May 19, 2022, 7:27pm

Hi Jeff,

Wanted to send this along and thank you again for your assistance -- I looked into the unpivot (UNION ALL in PostgreSQL) and the solution was actually very simple:

select
count(distinct ID) as count_IDs,
'step.a' step

from table t

where step.a > 0

UNION ALL

select
count(distinct ID) as count_IDs,
'step.b' step

from table t

where step.b > 0

UNION ALL

select
count(distinct ID) as count_IDs,
'step.c' step

from table t

where step.c > 0

UNION ALL

select
count(distinct ID) as count_IDs,
'step.d' step

from table t

where step.d > 0

It was a great help that you suggested an unpivot, I really appreciate it. I am used to very simple SQL Server statements (i.e. select data from table), but as part of a new team I am now learning to code via DataGrip (using PostgreSQL) and visualize data in AWS QuickSight. Thank you!

jeffw8713 · May 30, 2022, 2:04pm

Sorry for the late response - was on vacation. I did a quick search and PostgreSQL has a LATERAL JOIN that is the equivalent of a CROSS/OUTER APPLY in T-SQL.

Looks like you were able to figure out a solution - but do check out LATERAL JOIN as it may be a simpler query than using multiple UNIONS.