Compare latest 30 rows with previous 30 rows - Urgent Help

anusha600 · April 30, 2020, 7:55pm

HI,

I am working on a query to compare the mean shift for values between the latest 30 records and the next set of 30 records.
I am using row_number() over to get all the 60 records for my comparison.
But I do not want to run this every time a new record is inserted into the database/table.
I only want to run this query when 30 new records have entered the database so I can compare all new latest 30 records with the old 30 records.

I am not sure how to achieve this. Please help.

Thanks in advance!

yosiasz · April 30, 2020, 10:39pm

Please provide sample data

harishgg1 · May 1, 2020, 12:22pm

hi
hope this helps

select
case when row number mod 30 = 0 then compare latest 30 to previous 30
else dont do anything !!

harishgg1 · May 1, 2020, 2:58pm

are all the 30 records going to be inserted at once !! ??

anusha600 · May 1, 2020, 9:30pm

thanks for the response.
No all 30 records will not be entered at once.
once there are a new set of 30 records i want to do my comparison.

Thanks

yosiasz · May 2, 2020, 1:56am

Provide sample data. Help us help you

harishgg1 · May 2, 2020, 8:46am

hi

then you will have to mark the records ..
as they come in

once you reach 30 .. do the calculation .. then start marking the next 30 records
reset the old records ..

how will the records be inserted ??
one at a time
OR
let say you have 25 records .. then 10 records are inserted
which takes it to 35 .. ..how would you handle this ??

anusha600 · May 2, 2020, 9:47pm

hi yosiasz

Samples	Sample datetime	Type	Value
S1	05/2/2020 12:50AM	F1	21
S1	05/2/2020 12:58AM	F2	20
S1	05/2/2020 1:15AM	F3	19
S2	05/2/2020 1:25AM	F1	18
S2	05/2/2020 1:35AM	F2	19
S2	05/2/2020 1:50AM	F3	20
S3	05/2/2020 2:00AM	F1	25
S3	05/2/2020 2:20AM	F2	16
S3	05/2/2020 2:35AM	F3	16

This is how my data looks.
Daily lets say we get 1000 distinct samples.
I want to calculate the delta mean for each Type (F1,F2,F3) between the latest 30 and next 30 .
Now lets say we have 60 samples and I calculated my delta mean and a new sample has entered the table. I don't want my query to calculate delta mean with this one new sample...my query should be executed only when 30 new samples have entered the database.
Hoping this explanation is clear.

Thank you.

anusha600 · May 2, 2020, 9:48pm

harish,
the records are inserted one at a time, usually with time gap of 10 to 20 minutes

harishgg1 · May 3, 2020, 9:45am

Hi

How about a trigger on the table .. for insert ...

Add a new column to table for calculation .!!!!
Add a new column ( ROW COUNT ) to table which keeps track of rowcount

when a new row inserted
check if it is 30th
if it is 30th row inserted ...

then do the calculation ( latest 30 to previous 30 )

update the column ( new column created for calculation )
reset the rowcount column to 0 for all records !!!

again start with new row inserted

harishgg1 · May 3, 2020, 10:49am

hi

i tried to come up with something ..
old record are marked as old .. new records are marked as new
this is just s starting point ... to the solution !!

i am trying to do for 6 records at a time ( F1, F2 , F3 included )

please click arrow to the left for drop create sample data

drop create sample data ..

drop table #Samples 
go 

create table #Samples 
(
Samples	varchar(100),
Sample datetime	,
Type  varchar(2) ,
Value int ,
What varchar(10)
)
go 

insert into #Samples select 'S1','05/2/2020 12:50','F1',21,'Old'
insert into #Samples select 'S1','05/2/2020 12:58','F2',20,'Old'
insert into #Samples select 'S1','05/2/2020 1:15' ,'F3',19,'Old'

insert into #Samples select 'S2','05/2/2020 1:25' ,'F1',18,'Old'
insert into #Samples select 'S2','05/2/2020 1:35' ,'F2',19,'Old'
insert into #Samples select 'S2','05/2/2020 1:50' ,'F3',20,'Old'

insert into #Samples select 'S3','05/2/2020 2:00' ,'F1',25,'New'
insert into #Samples select 'S3','05/2/2020 2:20' ,'F2',16,'New'
insert into #Samples select 'S3','05/2/2020 2:35' ,'F3',16,'New'

insert into #Samples select 'S4','05/2/2020 2:40','F1',24,'New'
insert into #Samples select 'S4','05/2/2020 2:50','F2',21,'New'
insert into #Samples select 'S4','05/2/2020 2:55','F3',13,'New

; with cte as 
(
	select type,sum(value) as sumval,what 
	from #Samples
	group by type,what
)
select a.Type,a.sumval,b.sumval,b.sumval-a.sumval as Delta 
	from 
	(select * from cte where What ='New' ) a
		join 
	(select * from cte where What ='Old' ) b 
	   on a.Type =b.Type

harishgg1 · May 3, 2020, 11:27am

hi

New records being inserted will have status = 'In Progress'

Once 30 records are inserted ..

Update the table           ... rows with status = 'Old'  to status = 'Done'
                           ... rows with status = 'New' to status = 'Old'
                           ... rows with status = 'In Progress' to status = 'New'

Now run the SQL Query I gave above ..

; with cte as 
(
	select type,sum(value) as sumval,what 
	from #Samples
	group by type,what
)
select a.Type,a.sumval,b.sumval,b.sumval-a.sumval as Delta 
	from 
	(select * from cte where What ='New' ) a
		join 
	(select * from cte where What ='Old' ) b 
	   on a.Type =b.Type

anusha600 · May 4, 2020, 3:37pm

Hi Harish,

Thanks a ton for your effort. I will try it out and let you know how it goes.

Thanks again!

ScottPletcher · May 4, 2020, 4:13pm

sample_data

drop table #samples
go

create table #samples
(
sample varchar(100) NOT NULL,
sample_datetime datetime NOT NULL,
type char(2) NOT NULL,
value decimal(9, 2) NOT NULL,
delta_mean_status tinyint NOT NULL DEFAULT 0,
delta_mean decimal(9, 2) NULL
)
go

insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S1','05/2/2020 12:50','F1',21,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S1','05/2/2020 12:58','F2',20,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S1','05/2/2020 1:15' ,'F3',19,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S2','05/2/2020 1:25' ,'F1',18,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S2','05/2/2020 1:35' ,'F2',19,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S2','05/2/2020 1:50' ,'F3',20,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S3','05/2/2020 2:00' ,'F1',25,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S3','05/2/2020 2:20' ,'F2',16,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S3','05/2/2020 2:35' ,'F3',16,1

insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S4','05/2/2020 2:40','F1',24,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S4','05/2/2020 2:50','F2',21,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S4','05/2/2020 2:55','F3',13,1

/* create this index to vastly improve efficiency of the 30-row lookups */
CREATE UNIQUE NONCLUSTERED INDEX samples__IX_delta_mean_status ON #samples ( delta_mean_status, type, sample_datetime ) WHERE delta_mean_status = 1 WITH ( DATA_COMPRESSION = ROW, FILLFACTOR = 99, SORT_IN_TEMPDB = ON ) ON [PRIMARY];

SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
CREATE TRIGGER dbo.samples__TR_INSERT
ON dbo.samples
AFTER INSERT
AS
SET NOCOUNT ON;
WHILE (SELECT COUNT(*) FROM dbo.samples WHERE type = 'F1' AND delta_mean_status = 1) >= 30
BEGIN
    UPDATE s
    SET delta_mean_status = 0,
        delta_mean = SUM(total_value) / value_count
    FROM (
        SELECT SUM(value) AS total_value, COUNT(value) AS value_count
        FROM (
            SELECT TOP (30) *
            FROM dbo.samples s
            WHERE delta_mean_status = 1
            ORDER BY sample_datetime
        ) AS s_first_30
    ) AS s_F1
    INNER JOIN dbo.samples s ON s.type = s_F1.type AND s.sample_datetime = s_F1.sample_datetime
    IF @@ROWCOUNT = 0
        BREAK;
END /*IF*/
/* !! Repeat the code above for F2 and F3 !! */
/*end of trigger*/
GO

anusha600 · May 4, 2020, 7:21pm

Hi Scott,
I have around 1000 features, in the example above I have only provided 3 (F1,F2,F3).

yosiasz · May 4, 2020, 7:31pm

do these have a uniqueID or would the combination of the other columns constitute the unique row?

anusha600 · May 4, 2020, 9:41pm

sample name is unique(S1,S2,S3)

anusha600 · May 4, 2020, 10:42pm

Maybe my explanation was not clear before. I am so sorry for that.
Hopefully this will be clear.

Lets say i want to compare latest 3 samples with previous 3 samples. Sample_date is the time the sample was created , sample is the unique identifier
so here i got the latest 6 samples by partitioning on sample and ordering by date is descending order.
with this i can calculate the mean difference between the first 3 and last 3 samples for each feature

Sample	Sample Date	Feature	Value	Row_num
S1	05/4/2020 12:50PM	F1	10	1
S1	05/4/2020 12:50PM	F2	8	1
S1	05/4/2020 12:50PM	F3	5	1
S2	05/4/2020 12:42 PM	F1	9	2
S2	05/4/2020 12:42 PM	F2	5	2
S2	05/4/2020 12:42 PM	F3	3	2
S3	05/4/2020 12:38 PM	F1	11	3
S3	05/4/2020 12:38 PM	F2	7	3
S3	05/4/2020 12:38 PM	F3	2	3
S4	05/4/2020 12:32 PM	F1	10	4
S4	05/4/2020 12:32 PM	F2	7.5	4
S4	05/4/2020 12:32 PM	F3	3	4
S5	05/4/2020 12:12 PM	F1	9	5
S5	05/4/2020 12:12 PM	F2	7	5
S5	05/4/2020 12:12 PM	F3	4	5
S6	05/4/2020 12:02 PM	F1	10	6
S6	05/4/2020 12:02 PM	F2	8	6
S6	05/4/2020 12:02 PM	F3	7	6

A new sample S8 has entered the database.

Sample	Sample Date	Feature	Value	Row_num
S8	05/4/2020 1:20PM	F1	10	1
S8	05/4/2020 1:20PM	F2	8.4	1
S8	05/4/2020 1:20PM	F3	8	1
S1	05/4/2020 12:50PM	F1	10	2
S1	05/4/2020 12:50PM	F2	8	2
S1	05/4/2020 12:50PM	F3	5	2
S2	05/4/2020 12:42 PM	F1	9	3
S2	05/4/2020 12:42 PM	F2	5	3
S2	05/4/2020 12:42 PM	F3	3	3
S3	05/4/2020 12:38 PM	F1	11	4
S3	05/4/2020 12:38 PM	F2	7	4
S3	05/4/2020 12:38 PM	F3	2	4
S4	05/4/2020 12:32 PM	F1	10	5
S4	05/4/2020 12:32 PM	F2	7.5	5
S4	05/4/2020 12:32 PM	F3	3	5
S5	05/4/2020 12:12 PM	F1	9	6
S5	05/4/2020 12:12 PM	F2	7	6
S5	05/4/2020 12:12 PM	F3	4	6

So now S8 , s1, s2 are my latest 3 samples and S3,S4,S5 are my old samples and S6 wont be a part of my query as I am filtering only 6 samples.
I dont want my query/SP to execute now as there is only one new sample S8, I to want execute my query / stored procedure when there are 3 new samples.
So my s1,s2,s3 become my old samples and S8,S9,S10 are my new samples and then calculate the mean shift.

The whole purpose of my task is to flag the feature when there is a high mean shift. I dont want to calculate the mean shift every time there is a new sample....i want to calculate the mean shift when there are 3 new samples and flag the feature.

should i try something like:
declare

@last_sample_date datetime
@sample_count int

set @sample_count =
set @last_sample_date = (SELECT MAX(Sample_Date) FROM table)

IF condition
BEGIN

Select.....
......
I am not sure.

ScottPletcher · May 6, 2020, 3:59pm

You should have mentioned that to start with, that's a very significant difference. Good luck with this.

yosiasz · May 6, 2020, 4:21pm

you want to compare the latest 1 sample (3 rows) with the previous 3 samples (9 rows) Is that correct?

Compare latest 30 rows with previous 30 rows - *Urgent Help*

Compare latest 30 rows with previous 30 rows - Urgent Help