SQLTeam.com | Weblogs | Forums

Query with several LEFT JOINs / avoid cartesian product


#1

Hi,

I would like a single query to return data from several tables, where there may be 0 or more results. LEFT JOIN, perfect! Except that if I use 2 LEFT JOINs in a single query, and there are several results in each of them, MSSQL returns the cartesian product of the subqueries.

I'll explain with an example: I have a TITLE table, which contains information about some titles. I have a HOOK table, which marks some positions of the titles with a special name. I have a STATS table, which records when titles have been played. I would like to retrieve the information from a title with its HOOK and STATS entries.

--
create database newtest
GO
use newtest
GO

create table T_TITLE(NUM_TITLE INTEGER PRIMARY KEY, NAME VARCHAR(255));
create table T_STATS(NUM_STATS INTEGER PRIMARY KEY, NUM_TITLE INTEGER, DATE_BROADCAST DATETIME);
create table T_HOOK(NUM_HOOK INTEGER PRIMARY KEY, NUM_TITLE INTEGER, NAME VARCHAR(255));

insert into T_TITLE(NUM_TITLE,NAME) values (1,'name');
insert into T_STATS(NUM_STATS,NUM_TITLE,DATE_BROADCAST) values (10,1,'2015-07-20 00:00:00'),(11,1,'2015-07-20 01:00:00'),(12,1,'2015-07-20 02:00:00'),(13,1,'2015-07-20 03:00:00');
insert into T_HOOK(NUM_HOOK,NUM_TITLE,NAME) values (20,1,'hook1'),(21,1,'hook2');

--
select *
from T_TITLE
left join T_STATS on T_TITLE.NUM_TITLE=T_STATS.NUM_TITLE
left join T_HOOK on T_TITLE.NUM_TITLE=T_HOOK.NUM_TITLE

This works but if I have 2 entries in HOOK and 4 entries in STATS, I get 8 lines total, with the cartesian product of HOOKxSTATS. I would like only 4 lines, with each information only once. Is this possible?

--
What I get:
1;name; 10;1;2015-07-20 00:00:00.000; 20;1;hook1
1;name; 10;1;2015-07-20 00:00:00.000; 21;1;hook2
1;name; 11;1;2015-07-20 01:00:00.000; 20;1;hook1
1;name; 11;1;2015-07-20 01:00:00.000; 21;1;hook2
1;name; 12;1;2015-07-20 02:00:00.000; 20;1;hook1
1;name; 12;1;2015-07-20 02:00:00.000; 21;1;hook2
1;name; 13;1;2015-07-20 03:00:00.000; 20;1;hook1
1;name; 13;1;2015-07-20 03:00:00.000; 21;1;hook2

What I would like:
1;name; 10;1;2015-07-20 00:00:00.000; 20;1;hook1
1;name; 11;1;2015-07-20 01:00:00.000; 21;1;hook2
1;name; 12;1;2015-07-20 02:00:00.000; NULL?;NULL?;NULL?
1;name; 13;1;2015-07-20 03:00:00.000; NULL?;NULL?;NULL?


#2

What your logic for showing hook1 on same line as num_stats 10?
Is it that num_hooks always is 10 higher than num_stats?
If so, you can use this:

select *
  from T_TITLE as a
       left outer join T_STATS as b
                    on b.NUM_TITLE=a.NUM_TITLE
       left outer join T_HOOK as c
                    on c.NUM_TITLE=a.NUM_TITLE
                   and c.NUM_HOOK=b.NUM_STATS+10

#3

No. There is no link between the STATS data and the HOOK data. It may be created at any time with any primary key value. This was just an example.


#4

How do you determine which hook goes where?
In your example

1;name; 10;1;2015-07-20 00:00:00.000; 20;1;hook1
1;name; 11;1;2015-07-20 01:00:00.000; 21;1;hook2
1;name; 12;1;2015-07-20 02:00:00.000; NULL?;NULL?;NULL?
1;name; 13;1;2015-07-20 03:00:00.000; NULL?;NULL?;NULL?

might aswell be

1;name; 10;1;2015-07-20 00:00:00.000; 21;1;hook2
1;name; 11;1;2015-07-20 01:00:00.000; NULL?;NULL?;NULL?
1;name; 12;1;2015-07-20 02:00:00.000; 20;1;hook1
1;name; 13;1;2015-07-20 03:00:00.000; NULL?;NULL?;NULL?

???

Is the place of hook random?


#5

I don't care, I just would like all the data once. Order is not important.


#6

Try this:

select a.NUM_TITLE
      ,a.NAME
      ,a.NUM_STATS
      ,a.DATE_BROADCAST
      ,b.NUM_HOOK
      ,b.NAME
  from (select a.NUM_TITLE
              ,a.NAME
              ,b.NUM_STATS
              ,b.DATE_BROADCAST
              ,row_number() over(partition by a.num_title order by b.num_stats) as rn
          from T_TITLE as a
               left outer join T_STATS as b
                            on b.NUM_TITLE=a.NUM_TITLE
       ) as a
       left outer join (select *
                              ,row_number() over(partition by num_title order by num_hook) as rn
                          from T_HOOK
                       ) as b
                    on b.NUM_TITLE=a.NUM_TITLE
                   and b.rn=a.rn
;

#7

Thanks a bunch, that opened a few possibilities. It has a great flaw though, if there are more hooks than stats, it only returns that many rows as there are stats; I tried to look into this today, but I still don't understand your query that well. I will continue tomorrow, maybe I'll have a few questions. :smile:
Thanks again.