SQLTeam.com | Weblogs | Forums

Get records within a range of another set of records?


#1

This is something I was going to do in code, but wondering if it's possible in SQL? I need to go through a set of records which match a certain criteria, then find a super-set of those records which match a criteria comparing to each of the first set. I can illustrate using an example of books as below:

DECLARE @Books TABLE (
  BookId INT,
  Title VARCHAR(100),
  PublishedUTC DATE,
  TotalPages INT
);

INSERT INTO @Books (BookId, PublishedUTC, Title, TotalPages)
  VALUES (1, '2001-01-01', 'Stewards of Gondor', 245);
INSERT INTO @Books (BookId, PublishedUTC, Title, TotalPages)
  VALUES (2, '2010-02-02', 'Heralds of Highgarden', 71);
INSERT INTO @Books (BookId, PublishedUTC, Title, TotalPages)
  VALUES (3, '2010-03-03', 'Tales of Trombones', 1244);
INSERT INTO @Books (BookId, PublishedUTC, Title, TotalPages)
  VALUES (4, '2004-04-04', 'Fables of Foibles', 50);

In this example the first criteria is TotalPages > 100. The second criteria is -1 or +1 year of published date.

The first criteria will find book IDs 1 and 3. Now I need to find every book which has a published date either 1 year before, or 1 year after, the published date of each item in the first set. So that will (of course) find Book IDs 1 and 3 themselves, plus Book ID 2 (which was published within 1 year of Book ID 3).

Hope that explains well enough. If possible, can this be done in SQL with just SELECTs, no procedural code? I thought of starting using a WITH, like so:

WITH ManyPages AS (
  SELECT b.PublishedUTC FROM @Books b where b.TotalPages > 100
)
SELECT b.BookId
FROM @Books b
  WHERE b.PublishedUTC -- ... within 1yr of each PublishedUTC in ManyPages

Would that form work in some way?


#2

ooh I think I cracked it... this seems to work; is this indeed the correct way? Even if so, is there a better way, or is this method reasonably efficient?

WITH ManyPages AS (
  SELECT b.BookId, b.PublishedUTC FROM @Books b where b.TotalPages > 100
)
SELECT DISTINCT b.BookId
FROM @Books b
INNER JOIN ManyPages mp 
  ON ABS(DATEDIFF(YEAR, b.PublishedUTC, mp.PublishedUTC)) <= 1

#3

heh I'm enjoying this but it does my head in! I think this is another way, but not sure if this is more or less efficient than the previous one:

SELECT b.BookId
FROM @Books b INNER JOIN (
  SELECT b.PublishedUTC
  FROM @Books b
  WHERE b.TotalPages > 100) b2 ON ABS(DATEDIFF(YEAR, b.PublishedUTC, b2.PublishedUTC)) <= 1

Again, it seems to work, but I'd love someone to verify in case I'm missing something!


#4

I would do it this way:

select b.*
  from @Books as a
       inner join @Books as b
               on b.PublishedUTC>=dateadd(yy,-1,a.PublishedUTC)
              and b.PublishedUTC<=dateadd(yy,1,a.PublishedUTC)
 where a.TotalPages>100

#5

Thanks bitsmed, that works too, though I had to stare at it for a few minutes to get it. :smile:

Each of these 3 methods seem to run about the same speed... can you tell me why you would do it that way over the other 2?


#6

You'll need a bigger set of data to see the speed difference, and also if you put an index on PublishedUCT, speed will be even more increased.
My query doesn't perform functions/calculations on joined table fields, which will give you speed (if you have lots of data and index on PublishedUTC).
Also I avoid subselects which is to be avoid if possible.


#7

Thanks again bitsmed, very helpful!