Creating groups based on criteria and time based?

Hello community!
There is a T-SQL that I cannot figure it out how to do it, hope you can point me in the right direction or what would be the best approach (in terms of performance).
Using cursors is not a good idea since the amout of data will be heavy in production.

An industrial application will be populating a table every second during certain time. 2 variables (combinations of filter and tank) are key in order to "group" the data event. Also, an exception would be if time exceeds certain time (in this example "newgroup" knows when a new datagroup should be considered). Later on, there will be a process in charge to detect the "goup events" that were inserted in the database.

Values of "desired_datagroup_result_example" don't matter (but they need to be unique: NEWID() might be better than unique numeric values), the idea that I try to show is that the t-sql should throw 8 groups for this dataset example.

DECLARE @data TABLE(id int, newgroup bit, idfilter tinyint, idtank tinyint, created datetime, desired_datagroup_result_example int);
INSERT INTO @data VALUES
(3541,0,2,5,'2024-07-01 08:00:01.000',102),(1740,1,1,3,'2024-07-01 08:00:01.000',100),(1739,0,1,3,'2024-07-01 08:00:02.000',100),(3540,0,2,5,'2024-07-01 08:00:02.000',102),(3539,0,2,5,'2024-07-01 08:00:03.000',102),
(1738,0,1,3,'2024-07-01 08:00:03.000',100),(1737,0,1,3,'2024-07-01 08:00:04.000',100),(3538,0,2,5,'2024-07-01 08:00:04.000',102),(3537,0,2,5,'2024-07-01 08:00:05.000',102),(5342,0,2,1,'2024-07-01 09:00:01.000',103),
(5341,0,2,1,'2024-07-01 09:00:02.000',103),(5340,0,2,1,'2024-07-01 09:00:03.000',103),(5339,0,2,1,'2024-07-01 09:00:04.000',103),(5338,0,2,1,'2024-07-01 09:00:05.000',103),(5337,0,2,1,'2024-07-01 09:00:06.000',103),
(5336,0,2,1,'2024-07-01 09:00:07.000',103),(5335,0,2,1,'2024-07-01 09:00:08.000',103),(5334,1,2,1,'2024-07-01 09:16:09.000',104),(5333,0,2,1,'2024-07-01 09:16:10.000',104),(5332,0,2,1,'2024-07-01 09:16:11.000',104),
(5331,0,2,1,'2024-07-01 09:16:12.000',104),(5344,0,2,5,'2024-07-01 10:00:01.000',105),(5343,1,1,3,'2024-07-01 10:00:01.000',101),(5345,0,1,3,'2024-07-01 10:00:02.000',101),(5346,0,2,5,'2024-07-01 10:00:02.000',105),
(5347,0,2,5,'2024-07-01 10:00:03.000',105),(5348,0,1,3,'2024-07-01 10:00:03.000',101),(5349,0,1,3,'2024-07-01 10:00:04.000',101),(5350,0,2,5,'2024-07-01 10:00:04.000',105),(5351,0,2,5,'2024-07-01 10:00:05.000',105),
(5352,0,2,1,'2024-07-01 11:00:01.000',106),(5353,0,2,1,'2024-07-01 11:00:02.000',106),(5354,0,2,1,'2024-07-01 11:00:03.000',106),(5355,0,2,1,'2024-07-01 11:00:04.000',106),(5356,0,2,1,'2024-07-01 11:00:05.000',106),
(5357,0,2,1,'2024-07-01 11:00:06.000',106),(5358,0,2,1,'2024-07-01 11:00:07.000',106),(5359,0,2,1,'2024-07-01 11:00:08.000',106),(5360,1,2,1,'2024-07-01 11:16:09.000',107),(5361,0,2,1,'2024-07-01 11:16:10.000',107),
(5362,0,2,1,'2024-07-01 11:16:11.000',107),(5363,0,2,1,'2024-07-01 11:16:12.000',107);

SELECT * FROM @data
ORDER BY idfilter ASC, created ASC;

Edit: Forgot to mention the MSSQL version: SQL2019 standard edition x64

Any ideas are appreciated.

Cheers,

this is a typical Gaps and Island problem. You may refer to article below.

1 Like

Hi khtan,
Thank you for pointing me in the right direction. Very handy information you have provided.

Using a similar approach, I am facing an issue with the example data: getting merged islands.

Have you experienced something similar?

DECLARE @data TABLE(id int, newgroup bit, idfilter tinyint, idtank tinyint, created datetime, desired_datagroup_result_example int);
INSERT INTO @data VALUES
(3541,0,2,5,'2024-07-01 08:00:01.000',102),(1740,1,1,3,'2024-07-01 08:00:01.000',100),(1739,0,1,3,'2024-07-01 08:00:02.000',100),(3540,0,2,5,'2024-07-01 08:00:02.000',102),(3539,0,2,5,'2024-07-01 08:00:03.000',102),
(1738,0,1,3,'2024-07-01 08:00:03.000',100),(1737,0,1,3,'2024-07-01 08:00:04.000',100),(3538,0,2,5,'2024-07-01 08:00:04.000',102),(3537,0,2,5,'2024-07-01 08:00:05.000',102),(5342,0,2,1,'2024-07-01 09:00:01.000',103),
(5341,0,2,1,'2024-07-01 09:00:02.000',103),(5340,0,2,1,'2024-07-01 09:00:03.000',103),(5339,0,2,1,'2024-07-01 09:00:04.000',103),(5338,0,2,1,'2024-07-01 09:00:05.000',103),(5337,0,2,1,'2024-07-01 09:00:06.000',103),
(5336,0,2,1,'2024-07-01 09:00:07.000',103),(5335,0,2,1,'2024-07-01 09:00:08.000',103),(5334,1,2,1,'2024-07-01 09:16:09.000',104),(5333,0,2,1,'2024-07-01 09:16:10.000',104),(5332,0,2,1,'2024-07-01 09:16:11.000',104),
(5331,0,2,1,'2024-07-01 09:16:12.000',104),(5344,0,2,5,'2024-07-01 10:00:01.000',105),(5343,1,1,3,'2024-07-01 10:00:01.000',101),(5345,0,1,3,'2024-07-01 10:00:02.000',101),(5346,0,2,5,'2024-07-01 10:00:02.000',105),
(5347,0,2,5,'2024-07-01 10:00:03.000',105),(5348,0,1,3,'2024-07-01 10:00:03.000',101),(5349,0,1,3,'2024-07-01 10:00:04.000',101),(5350,0,2,5,'2024-07-01 10:00:04.000',105),(5351,0,2,5,'2024-07-01 10:00:05.000',105),
(5352,0,2,1,'2024-07-01 11:00:01.000',106),(5353,0,2,1,'2024-07-01 11:00:02.000',106),(5354,0,2,1,'2024-07-01 11:00:03.000',106),(5355,0,2,1,'2024-07-01 11:00:04.000',106),(5356,0,2,1,'2024-07-01 11:00:05.000',106),
(5357,0,2,1,'2024-07-01 11:00:06.000',106),(5358,0,2,1,'2024-07-01 11:00:07.000',106),(5359,0,2,1,'2024-07-01 11:00:08.000',106),(5360,1,2,1,'2024-07-01 11:16:09.000',107),(5361,0,2,1,'2024-07-01 11:16:10.000',107),
(5362,0,2,1,'2024-07-01 11:16:11.000',107),(5363,0,2,1,'2024-07-01 11:16:12.000',107);

DECLARE @data2 TABLE(id int, newgroup bit, idfilter tinyint, idtank tinyint, created datetime, desired_datagroup_result_example int, denserank_group varchar(16));
;WITH res AS(
	SELECT	id, newgroup, idfilter, idtank, created, desired_datagroup_result_example,
			--IIF(newgroup=1,dr_newgroup,dr_main-dr_newgroup) AS denserankgroup
			CONVERT(varchar(8),IIF(newgroup=1,criteriafilter_group,criteriafilter_main-criteriafilter_group))+
			CONVERT(varchar(8),IIF(newgroup=1,criteriatank_group,criteriatank_main-criteriatank_group)) AS denserankgroup
	FROM (
		SELECT	*,
--				DENSE_RANK() OVER (PARTITION BY idfilter, idtank ORDER BY idfilter, created ASC) AS dr_main,					-- will generate
--				DENSE_RANK() OVER (PARTITION BY idfilter, idtank, newgroup ORDER BY idfilter, created ASC) AS dr_newgroup,	-- merged islands groups

				DENSE_RANK() OVER (PARTITION BY idfilter ORDER BY idfilter, created ASC) AS criteriafilter_main,				-- will also
				DENSE_RANK() OVER (PARTITION BY idfilter, newgroup ORDER BY idfilter, created ASC) AS criteriafilter_group,		-- generate merged 
				DENSE_RANK() OVER (PARTITION BY idtank ORDER BY idfilter, created ASC) AS criteriatank_main,					-- islands when
				DENSE_RANK() OVER (PARTITION BY idtank, newgroup ORDER BY idfilter, created ASC) AS criteriatank_group			-- grouping
		FROM @data
	) AS tmp
)INSERT INTO @data2 SELECT * FROM res;

--SELECT * FROM @data2 ORDER BY idfilter ASC, created ASC;

SELECT MIN(idfilter) AS idfilter, MIN(idtank) AS idtank, MIN(created) AS [start], MAX(created) AS [end]
FROM @data2 GROUP BY desired_datagroup_result_example ORDER BY desired_datagroup_result_example -- for comparison purpose

SELECT idfilter, idtank, MIN(created) AS [start], MAX(created) AS [end]
FROM @data2 GROUP BY idfilter, idtank, denserank_group
ORDER BY idfilter, [start];

I find it easier to use lead() / lag() together with window function to find the gap and island

First CTE uses lag() to get the previous value of created

Second CTE uses case expression to determine if it is a new group (current created - previous created is more than 15 minutes) and return value 1 or 0. Next use perform a cumulative sum (sum() over (order by ...), that will give you the grouping.

And to get the desire result 100, 101, use dense_rank().

You can examine the interim value of the query in the dbfiddle link below.

with cte as
(
  select id, newgroup, idfilter, idtank, created, 
         desired = desired_datagroup_result_example,
         prev_created = lag(created) over (partition by idfilter order by created)
  from   @data
),
cte2 as
(
  select *,
         grp = sum(case when datediff(minute, prev_created, created) > 15
                        then 1
                        else 0
                        end) over (partition by idfilter order by created)
  from   cte
)
select *, datagroup = 99 + dense_rank () over (order by idfilter, grp)
from   cte2
order by idfilter asc, created asc;

db<>fiddle demo

2 Likes