Cluster behavior of Timeseries

d_schwarz · April 18, 2016, 6:00pm

Dear Community

I'm trying to identify parts of a timeserie where we have an up movment
(somthing like down, stable, up, stable, stable, up,... until first down) or a down-movment
( somthing like up,up, stable, up, down, down, stable,stable,up,...)

Of corse I could solve this problem with a loop but this is very slow of course...
Can somebody think of a better solution ?
Here some sample data to better illustrate my question: I'm trying to calculate the Variable Group...

Thanks a lot and cheers,
David

JamesK · April 18, 2016, 7:33pm

First, a screen-shot is of very little use to someone who wants to write a query to solve your problem. Instead, post DDL - i.e., code that someone can copy and paste to their SSMS window and run to create a table and populate it with your test data. For example, like this, where I have inserted only the first two rows:

CREATE TABLE #tmp
(
	ID INT,
	Date DATE,
	Direction VARCHAR(32),
	[Group] INT
);

INSERT INTO #tmp VALUES
	(1,'20051011','up',1),
	(1,'20051012','stable',1)
	--- and so on.

Second, it is not clear to me what your desired output is. Is the Direction column that you are trying to compute? If it is, what is the logic? If it is something else that you are trying to compute, what is it? Again, if you create a test table with the desired output, that would be the most useful thing.

d_schwarz · April 18, 2016, 9:40pm

Hi Jamess
Thanks for the comment.
I have a Measure from different Patients (ID) and Dates.
From the measures I constructed the Direction (if the Measure increased since the last Date -> up, ...)
Now I want to distinguish the individual up and down movement:
up movments: down, stable, up, stable, stable, up,, down
down movment: up,up, stable, up, down, down, stable,stable,up

Below you see my input. The result is the Variable Group like in the first poste

CREATE TABLE #tmp
(
ID INT,
Date DATE,
Measure INT,
Direction VARCHAR(32),
);

INSERT INTO #tmp VALUES
(1,'20051011',3,'up'),
(1,'20051012',3,'stable'),
(1,'20051013,3,'stable'),
(1,'20051014,4,'up'),
(1,'20051015,3,'down'),
(1,'20051016,2,'down'),
(1,'20051017,'2,'stable'),
(1,'20051018,'2,'stable'),
(1,'20051019,'3,'up'),
(1,'20051020,3,'stable'),
(1,'20051021,2,'down'),
(1,'20051022,2,'stable'),
(1,'20051023,2,'stable'),
(1,'20051024,2,'stable'),
(1,'20051025,3,'up'),
(1,'20051026,3,'stable'),
(1,'20051027,3,'stable'),
(1,'20051028,3,'stable'),
(1,'20051029,3,'stable'),
(1,'20051030,3,'stable'),
(1,'20051101,2,'down'),
(1,'20051102,2,'stable'),
(1,'20051103,2,'stable')

JamesK · April 19, 2016, 3:20pm

Your sample data is not usable - it has lot of syntax errors. You will see what I mean if you copy the code you posted to an SSMS window and try to run it.

The problem you are trying to solve is the class of problems known as "Islands and Gaps". In particular, Itzik Ben-Gan has written many articles on this subject. One of the more efficient ways of solving this is using the row_number function, something like this:

SELECT
	*,
	ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Date)-
	ROW_NUMBER() OVER (PARTITION BY Id ORDER BY CASE WHEN GrpDirection = 'up' THEN 1 ELSE 0 END ) AS Grp
FROM
	#tmp a
	OUTER APPLY
	(
		SELECT TOP (1) b.Direction AS GrpDirection
		FROM #tmp b
		WHERE b.Date <= a.Date
		AND b.Direction <> 'stable'
		AND b.Id = a.Id
		ORDER BY b.Date DESC
	)	b

That will not take you all the way, the group number is just some random number which is not sequentially ordered. You can use that information to generate an ordered group number.