Concurrency of items

GilbertoH · December 13, 2018, 10:02pm

Hello,

I have a table like:

Invoice Item
123 sugar
123 rice
124 sugar
125 pampers
125 castup

basic one, where an invoice can have more than one item.

Now I need to make some kind of pivot where I have

              Sugar          rice                pampers       catsup

sugar 2 1 0 0
rice 0 1 0 0
pampers 0 0 1 1
catsup 0 0 1 0

I can so it with a lot of CASE for each Sugar, rice, etc, but that is too expensive on query for big database, so I assume it have no be a better way.

yosiasz · December 13, 2018, 10:29pm

--sample
create table #kampalalo(Invoice int, Item varchar(50))
insert into #kampalalo 
select '123', 'sugar' union 
select '123', 'rice' union 
select '124', 'sugar' union 
select '125', 'pampers' union 
select '125', 'castup'  

--main query
select sugar, rice, pampers, castup
from
(
  select   Item
  from #kampalalo
) d
pivot
(
  count(Item)
  for Item in (sugar, rice, pampers, castup)
) piv;

drop table #kampalalo

GilbertoH · December 13, 2018, 10:36pm

It does not provide mix of both, only one line is shown:

|sugar|rice|pampers|castup|
|2|1|1|1|

mannesravya · December 14, 2018, 12:06am

could you please let me know based on what you got those binary numbers

mannesravya · December 14, 2018, 12:08am

--drop table kampalalo
--create table kampalalo(Invoice int, Item varchar(50))
--insert into kampalalo
--select '123', 'sugar' union
--select '123', 'rice' union
--select '124', 'sugar' union
--select '125', 'pampers' union
--select '125', 'castup'

--select Item,[sugar],[rice],[pampers],[castup]
--from #kampalalo
--pivot
--(
-- count(Item)
-- for Item in ([sugar], [rice], [pampers], [castup])
--) as p

SELECT [Invoice], [sugar], [rice], [pampers], [castup]
FROM kampalalo
PIVOT
(
count([Item])
FOR [Item] IN ([sugar], [rice], [pampers], [castup])
) AS P

GilbertoH · December 14, 2018, 3:45pm

Thank you Mannesravya,

but I believe my question is not correct.

The requirement is to know how many invoices with sugar are sold with rice, how many with pampers, how many with catsup, etc.

The visualization will be a table where the same items are as columns and rows, so on the middle (results), the count of invoces that have the column and the row items.

so if the the colums is rice and the row is pampers, the results is the amount of invoices that have those two items together.

Yes, the table will have a duplicate result, because, if the row is pampers and the column is sugar, it must have the same result as row sugar and column pampers.

ScottPletcher · December 14, 2018, 4:33pm

I don't understand your given answer:

sugar 2 1 0 0
rice 0 1 0 0
pampers 0 0 1 1
catsup 0 0 1 0

Why isn't the first line "sugar 1 1" or the second line "rice 1 1"? You count the rice selling with the sugar, and the sugar alone. But then on the rice line, you don't count the rice that was sold with sugar. I'm confused.

The query below shows all combinations that were sold together. Yes, the totals overlap, but I don't see any other way to do this.

SELECT 
    item,
    SUM(sugar) AS sugar,
    SUM(rice) AS rice,
    SUM(pampers) AS pampers,
    SUM(catsup) AS catsup
FROM (
    SELECT Invoice,
        SUM(CASE WHEN Item = 'sugar' THEN 1 ELSE 0 END) AS sugar,
        SUM(CASE WHEN Item = 'rice' THEN 1 ELSE 0 END) AS rice,
        SUM(CASE WHEN Item = 'pampers' THEN 1 ELSE 0 END) AS pampers,
        SUM(CASE WHEN Item = 'catsup' THEN 1 ELSE 0 END) AS catsup
    FROM #kampalalo 
    GROUP BY Invoice
    --ORDER BY Invoice
) AS query1
CROSS JOIN ( VALUES('sugar'), ('rice'), ('pampers'), ('catsup') ) AS items(item)
WHERE 1 = CASE
    WHEN item = 'sugar' AND sugar > 0 THEN 1
    WHEN item = 'rice' AND rice > 0 THEN 1
    WHEN item = 'pampers' AND pampers > 0 THEN 1
    WHEN item = 'catsup' AND catsup > 0 THEN 1
    ELSE 0 END
GROUP BY item

GilbertoH · December 14, 2018, 5:04pm

Thank you.

Thats the way I am using right now, with CASE, but it is very slow in my case, so I was requesting help to find a different way.

on the real application, on the store I have over 40k brands or items and just to create the query with such kind of colums is a mess.

also the calculation for the case is slow. about 30 minutes for each month of 400 million rows.

ScottPletcher · December 14, 2018, 8:04pm

I have no idea what your existing code is or how the table is indexed, so I have no idea how to improve performance, other than writing (reasonably) efficient SQL now to create the desired results.

If you want to post DDL for the tables, including all indexes, and your code, we can take a look at that too. All statements with a CASE in them will not necessarily perform exactly the same, although the task you are doing will take some amount of data crunching which will take time.