Hi
As an assignment, I had to write 3 queries that answer 3 different business questions with an emphasis on optimization (Tables consist billions of rows). I'm not working with any actual DB so ignore lack of information about the actual DBMS, indexing, etc
Query 1:
Find the top 3 products(product name) by the highest number of clients purchasing them
> select p.product_name, e.purchases From ( > Select top 3 > e.product_id , count(distinct e.client_id) as purchases > From events as e > Where event_type = 3 > Order by 2 DESC ) as e left join products as p on e.product_id=p.product_id
Query 2:
find the amount of clients which have seen an impression of a product before purchasing it.
> ;WITH cteProdsClients AS (
> SELECT product_id, client_id,
> MIN(CASE WHEN event_type = '1' THEN event_ts ELSE NULL END) AS first_impression,
> MIN(CASE WHEN event_type = '3' THEN event_ts ELSE NULL END) AS first_purchase
> FROM events
> WHERE event_type IN ('1', '3')
> GROUP BY product_id, client_id
> HAVING MAX(CASE WHEN event_type = '1' THEN 1 ELSE 0 END) = 1 AND
> MAX(CASE WHEN event_type = '3' THEN 1 ELSE 0 END) = 1 )
> SELECT COUNT(DISTINCT client_id) AS client_count FROM cteProdsClients WHERE
> > first_impression < first_purchase
Query 3:
clients which have seen an impression of a product at least 3 times before purchasing it.
> ;WITH cteProdsClients AS (
> SELECT product_id, client_id,
> MIN(CASE WHEN event_type = '1' THEN event_ts ELSE NULL END) AS
> first_impression,
> MIN(CASE WHEN event_type = '3' THEN event_ts ELSE NULL END) AS first_purchase
> FROM events
> WHERE event_type IN ('1', '3')
> GROUP BY product_id, client_id
> HAVING MIN(event_type) = '1' AND MAX(event_type) = '3' AND SUM(CASE WHEN event_type =
'1' THEN 1 END)>=3
> ) SELECT DISTINCT client_id FROM cteProdsClients WHERE
> first_impression < first_purchase
I added the filter index for the 3rd query (I guess it helps for the second as well):
> CREATE UNIQUE INDEX index_name ON events( event_type, product_id,
> client_id, event_ts ) WHERE event_type IN ('1', '3')
My question is- are there other indexes I can add to improve performance? or is this one enough?
I'm open for advices for any other improvements in any query.
Thanks