Grouping time periods with gaps in them

Hello, this is a simplified sample to demonstrate the problem I am having:

CREATE TABLE #Sample (ID INT NOT NULL, Yr INT NOT NULL, Col_A VARCHAR(10), Col_B INT, Col_C INT)
INSERT INTO #Sample values
(1, 2004, 'N/S', 10, 255),
(1, 2005, 'N/S', 10, 255),
(1, 2006, 'N/S', NULL, 1),
(1, 2007, 'N/S', 10, 255),
(1, 2008, 'N/S', 10, 255),
(1, 2009, 'N/S', 10, 255),
(1, 2010, 'N/S', 0, 255),
(1, 2011, 'N/A', 10, 255),
(1, 2012, 'N/A', 10, 255),
(1, 2013, 'N/A', 0, 0),
(1, 2014, 'Y', 101, 256),
(1, 2015, 'N/S', 1, 32),
(1, 2016, 'N/A', NULL,NULL );

SELECT CASE WHEN MIN(YR )=MAX(YR) THEN CAST(MIN(YR) AS CHAR(4)) ELSE CAST(MIN(YR ) AS CHAR(4))+'-'+RIGHT(CAST(MAX(YR) AS CHAR(4)),2) END AS FROMYEAR_TOYEAR,
ID , Col_A , Col_B , Col_C
FROM #Sample GROUP BY
ID , Col_A , Col_B , Col_C ORDER BY 1

RESULTS:
FROMYEAR_TOYEAR ID Col_A Col_B Col_C
2004-09 1 N/S 10 255
2006 1 N/S NULL 1
2010 1 N/S 0 255
2011-12 1 N/A 10 255
2013 1 N/A 0 0
2014 1 Y 101 256
2015 1 N/S 1 32
2016 1 N/A NULL NULL

CORRECT RESULTS SHOULD BE:

FROMYEAR_TOYEAR ID Col_A Col_B Col_C
2004-05 1 N/S 10 255
2006 1 N/S NULL 1
2007-09 1 N/S 10 255
2010 1 N/S 0 255
2011-12 1 N/A 10 255
2013 1 N/A 0 0
2014 1 Y 101 256
2015 1 N/S 1 32
2016 1 N/A NULL NULL

Please notice that periods from 2004 to 2009 though having same data have to be broken down, as the year in between(2006) had different data. I was trying to process the records by using windowing functions to break the group with the gap in years but as of today I don't have a viable solution...It seems to me there should be a way to do it in a single select statement...
Thank you for your help in advance

This is the class of problems that is usually referred to as the "Islands and Gaps" problem. You will find a number of SQL solutions if you search for it.

One way to solve it (and perhaps the most efficient) is to use the row_number function as shown below:

SELECT MIN(Yr),MAX(Yr),MAX(col_A), MAX(Col_B), MAX(Col_C)
FROM
(
	SELECT *,
		ROW_NUMBER() OVER(PARTITION BY id ORDER BY Yr)
		-ROW_NUMBER() OVER(PARTITION BY id ORDER BY col_b,Yr) Grp
	FROM
		#Sample
)s
GROUP BY Grp
ORDER BY 1;

My partitioning and ordering columns may not be exactly what you need for your more general problem, but for the sample data order by col_B, Yr should work. Also, I have not combined the min and max years as you have done, but that is easy enough to do.

1 Like

Hello, James, thank you very much for your reply. I tried windowing functions and looking for gaps in years but couldn't think up a good way to put them together. This will definitely work. Appreciate your help and reference to "Islands and Gaps" problem

Tables have keys that this thing does not. Identifiers are not numeric because you do no math on them. I prefer the MySQL convention for years so they can sort with ISO-8601 dates and you can be ready if ISO adopts it (under discussion).

Why is your generic, magic “id” a constant? Drop it as redundant. . Here is my guess at corrections, with ISO-11179 rules, a key, etc. :

CREATE TABLE Samples
(sample_year CHAR(10) NOT NULL PRIMARY KEY,
col_a VARCHAR(10) NOT NULL,
col_b INTEGER,
col_c INTEGER);

INSERT INTO Sample
VALUES
('2004-00-00', 'N/S', 10, 255),
('2005-00-00', 'N/S', 10, 255),
('2006-00-00', 'N/S', NULL, 1),
('2007-00-00', 'N/S', 10, 255),
('2008-00-00', 'N/S', 10, 255),
('2009-00-00', 'N/S', 10, 255),
('2010-00-00', 'N/S', 0, 255),
('2011-00-00', 'N/A', 10, 255),
('2012-00-00', 'N/A', 10, 255),
('2013-00-00', 'N/A', 0, 0),
('2014-00-00', 'Y', 101, 256),
('2015-00-00', 'N/S', 1, 32),
('2016-00-00', 'N/A', NULL, NULL);

We do not format data in a query, so all that string manipulation is not done in the database. We pass data to a presentation layer to make it pretty. Your mindset is still locked in 1970's monolithic COBOL programming. Your query should be:

SELECT MIN(sample_year) AS start_sample_year,
MAX(sample_year) AS end_sample_year,
sample_id, col_a, col_b, col_c
FROM Samples
GROUP BY col_a, col_b, col_c;

Please notice that periods from 2004 to 2009 though having same data have to be broken down, as the year in between(2006) had different data. I was trying to process the records [sic: rows are not records; more COBOL!] by using windowing functions to break the group with the gap in years but as of today I don't have a viable solution...It seems to me there should be a way to do it in a single select statement… <<
Use the “island & gaps” that JamesK gave you. The only thing I might change is having a Calendar table.

WITH X
AS
(SELECT sample_year, col_a, col_b, col_c,
(ROW_NUMBER() OVER(ORDER BY sample_year)
-ROW_NUMBER() OVER(ORDER BY col_a, col_b, col_c)) AS sample_grp
FROM Samples
GROUP BY sample_year, col_a, col_b, col_c)

SELECT MIN(sample_year), MAX(sample_year), col_a, col_b, col_c --, sample_grp
FROM X
GROUP BY col_a, col_b, col_c, sample_grp
ORDER BY MIN(sample_year); –-- optional sort

Hello,Mr. Celko. Have been reading your articles since forever, and consider them a really good read, but this is about your reply. First of all, this is how I started my question:

"Hello, this is a simplified sample to demonstrate the problem I am having:"

Has nothing to do with the table structure or data i'm dealing with in real life. This was just a simple way to put some data together and show the desired output. There is no presentation layer, there is nothing, just a concept of dealing with same data that cannot be aggregated together because there are gaps. In real life I WAS using a calendar table and the row_number function, just tried to solve with LEAD/LAG, and got a little lost in the time allotted to this. That's why I resorted to asking the "Team" and got a very fast and spot-on reply from JamesK. This forum is called T-SQL , NOT database design 101 (though I always thought rows and records were interchangeable terms), I apologize for insulting your feelings but believe me if I ever thought you would be reading this with anything more than "Islands and Gaps" problem in mind - thank you JamesK again - I would have spent more time on thinking how to present it. Regards