Query with several LEFT JOINs / avoid cartesian product

Hi,

I would like a single query to return data from several tables, where there may be 0 or more results. LEFT JOIN, perfect! Except that if I use 2 LEFT JOINs in a single query, and there are several results in each of them, MSSQL returns the cartesian product of the subqueries.

I'll explain with an example: I have a TITLE table, which contains information about some titles. I have a HOOK table, which marks some positions of the titles with a special name. I have a STATS table, which records when titles have been played. I would like to retrieve the information from a title with its HOOK and STATS entries.

--
create database newtest
GO
use newtest
GO

create table T_TITLE(NUM_TITLE INTEGER PRIMARY KEY, NAME VARCHAR(255));
create table T_STATS(NUM_STATS INTEGER PRIMARY KEY, NUM_TITLE INTEGER, DATE_BROADCAST DATETIME);
create table T_HOOK(NUM_HOOK INTEGER PRIMARY KEY, NUM_TITLE INTEGER, NAME VARCHAR(255));

insert into T_TITLE(NUM_TITLE,NAME) values (1,'name');
insert into T_STATS(NUM_STATS,NUM_TITLE,DATE_BROADCAST) values (10,1,'2015-07-20 00:00:00'),(11,1,'2015-07-20 01:00:00'),(12,1,'2015-07-20 02:00:00'),(13,1,'2015-07-20 03:00:00');
insert into T_HOOK(NUM_HOOK,NUM_TITLE,NAME) values (20,1,'hook1'),(21,1,'hook2');

--
select *
from T_TITLE
left join T_STATS on T_TITLE.NUM_TITLE=T_STATS.NUM_TITLE
left join T_HOOK on T_TITLE.NUM_TITLE=T_HOOK.NUM_TITLE

This works but if I have 2 entries in HOOK and 4 entries in STATS, I get 8 lines total, with the cartesian product of HOOKxSTATS. I would like only 4 lines, with each information only once. Is this possible?

--
What I get:
1;name; 10;1;2015-07-20 00:00:00.000; 20;1;hook1
1;name; 10;1;2015-07-20 00:00:00.000; 21;1;hook2
1;name; 11;1;2015-07-20 01:00:00.000; 20;1;hook1
1;name; 11;1;2015-07-20 01:00:00.000; 21;1;hook2
1;name; 12;1;2015-07-20 02:00:00.000; 20;1;hook1
1;name; 12;1;2015-07-20 02:00:00.000; 21;1;hook2
1;name; 13;1;2015-07-20 03:00:00.000; 20;1;hook1
1;name; 13;1;2015-07-20 03:00:00.000; 21;1;hook2

What I would like:
1;name; 10;1;2015-07-20 00:00:00.000; 20;1;hook1
1;name; 11;1;2015-07-20 01:00:00.000; 21;1;hook2
1;name; 12;1;2015-07-20 02:00:00.000; NULL?;NULL?;NULL?
1;name; 13;1;2015-07-20 03:00:00.000; NULL?;NULL?;NULL?

What your logic for showing hook1 on same line as num_stats 10?
Is it that num_hooks always is 10 higher than num_stats?
If so, you can use this:

select *
  from T_TITLE as a
       left outer join T_STATS as b
                    on b.NUM_TITLE=a.NUM_TITLE
       left outer join T_HOOK as c
                    on c.NUM_TITLE=a.NUM_TITLE
                   and c.NUM_HOOK=b.NUM_STATS+10
1 Like

No. There is no link between the STATS data and the HOOK data. It may be created at any time with any primary key value. This was just an example.

How do you determine which hook goes where?
In your example

1;name; 10;1;2015-07-20 00:00:00.000; 20;1;hook1
1;name; 11;1;2015-07-20 01:00:00.000; 21;1;hook2
1;name; 12;1;2015-07-20 02:00:00.000; NULL?;NULL?;NULL?
1;name; 13;1;2015-07-20 03:00:00.000; NULL?;NULL?;NULL?

might aswell be

1;name; 10;1;2015-07-20 00:00:00.000; 21;1;hook2
1;name; 11;1;2015-07-20 01:00:00.000; NULL?;NULL?;NULL?
1;name; 12;1;2015-07-20 02:00:00.000; 20;1;hook1
1;name; 13;1;2015-07-20 03:00:00.000; NULL?;NULL?;NULL?

???

Is the place of hook random?

1 Like

I don't care, I just would like all the data once. Order is not important.

Try this:

select a.NUM_TITLE
      ,a.NAME
      ,a.NUM_STATS
      ,a.DATE_BROADCAST
      ,b.NUM_HOOK
      ,b.NAME
  from (select a.NUM_TITLE
              ,a.NAME
              ,b.NUM_STATS
              ,b.DATE_BROADCAST
              ,row_number() over(partition by a.num_title order by b.num_stats) as rn
          from T_TITLE as a
               left outer join T_STATS as b
                            on b.NUM_TITLE=a.NUM_TITLE
       ) as a
       left outer join (select *
                              ,row_number() over(partition by num_title order by num_hook) as rn
                          from T_HOOK
                       ) as b
                    on b.NUM_TITLE=a.NUM_TITLE
                   and b.rn=a.rn
;
1 Like

Thanks a bunch, that opened a few possibilities. It has a great flaw though, if there are more hooks than stats, it only returns that many rows as there are stats; I tried to look into this today, but I still don't understand your query that well. I will continue tomorrow, maybe I'll have a few questions. :smile:
Thanks again.