Compare latest 30 rows with previous 30 rows - *Urgent Help*

HI,

I am working on a query to compare the mean shift for values between the latest 30 records and the next set of 30 records.
I am using row_number() over to get all the 60 records for my comparison.
But I do not want to run this every time a new record is inserted into the database/table.
I only want to run this query when 30 new records have entered the database so I can compare all new latest 30 records with the old 30 records.

I am not sure how to achieve this. Please help.

Thanks in advance!

Please provide sample data

hi
hope this helps :slight_smile:

select
case when row number mod 30 = 0 then compare latest 30 to previous 30
else dont do anything !!

are all the 30 records going to be inserted at once !! ??

1 Like

thanks for the response.
No all 30 records will not be entered at once.
once there are a new set of 30 records i want to do my comparison.

Thanks

Provide sample data. Help us help you

hi

then you will have to mark the records ..
as they come in

once you reach 30 .. do the calculation .. then start marking the next 30 records
reset the old records ..

how will the records be inserted ??
one at a time
OR
let say you have 25 records .. then 10 records are inserted
which takes it to 35 .. ..how would you handle this ??

hi yosiasz

Samples Sample datetime Type Value
S1 05/2/2020 12:50AM F1 21
S1 05/2/2020 12:58AM F2 20
S1 05/2/2020 1:15AM F3 19
S2 05/2/2020 1:25AM F1 18
S2 05/2/2020 1:35AM F2 19
S2 05/2/2020 1:50AM F3 20
S3 05/2/2020 2:00AM F1 25
S3 05/2/2020 2:20AM F2 16
S3 05/2/2020 2:35AM F3 16

This is how my data looks.
Daily lets say we get 1000 distinct samples.
I want to calculate the delta mean for each Type (F1,F2,F3) between the latest 30 and next 30 .
Now lets say we have 60 samples and I calculated my delta mean and a new sample has entered the table. I don't want my query to calculate delta mean with this one new sample...my query should be executed only when 30 new samples have entered the database.
Hoping this explanation is clear.

Thank you.

harish,
the records are inserted one at a time, usually with time gap of 10 to 20 minutes

Hi

How about a trigger on the table .. for insert ...

Add a new column to table for calculation .!!!!
Add a new column ( ROW COUNT ) to table which keeps track of rowcount

when a new row inserted
check if it is 30th
if it is 30th row inserted ...

then do the calculation ( latest 30 to previous 30 )

update the column ( new column created for calculation )
reset the rowcount column to 0 for all records !!!

again start with new row inserted

hi

i tried to come up with something ..
old record are marked as old .. new records are marked as new
this is just s starting point ... to the solution !!

i am trying to do for 6 records at a time ( F1, F2 , F3 included )

please click arrow to the left for drop create sample data

drop create sample data ..
drop table #Samples 
go 

create table #Samples 
(
Samples	varchar(100),
Sample datetime	,
Type  varchar(2) ,
Value int ,
What varchar(10)
)
go 

insert into #Samples select 'S1','05/2/2020 12:50','F1',21,'Old'
insert into #Samples select 'S1','05/2/2020 12:58','F2',20,'Old'
insert into #Samples select 'S1','05/2/2020 1:15' ,'F3',19,'Old'

insert into #Samples select 'S2','05/2/2020 1:25' ,'F1',18,'Old'
insert into #Samples select 'S2','05/2/2020 1:35' ,'F2',19,'Old'
insert into #Samples select 'S2','05/2/2020 1:50' ,'F3',20,'Old'

insert into #Samples select 'S3','05/2/2020 2:00' ,'F1',25,'New'
insert into #Samples select 'S3','05/2/2020 2:20' ,'F2',16,'New'
insert into #Samples select 'S3','05/2/2020 2:35' ,'F3',16,'New'

insert into #Samples select 'S4','05/2/2020 2:40','F1',24,'New'
insert into #Samples select 'S4','05/2/2020 2:50','F2',21,'New'
insert into #Samples select 'S4','05/2/2020 2:55','F3',13,'New

image

; with cte as 
(
	select type,sum(value) as sumval,what 
	from #Samples
	group by type,what
)
select a.Type,a.sumval,b.sumval,b.sumval-a.sumval as Delta 
	from 
	(select * from cte where What ='New' ) a
		join 
	(select * from cte where What ='Old' ) b 
	   on a.Type =b.Type

image

hi

New records being inserted will have status = 'In Progress'

Once 30 records are inserted ..

Update the table           ... rows with status = 'Old'  to status = 'Done'
                           ... rows with status = 'New' to status = 'Old'
                           ... rows with status = 'In Progress' to status = 'New'

Now run the SQL Query I gave above ..

; with cte as 
(
	select type,sum(value) as sumval,what 
	from #Samples
	group by type,what
)
select a.Type,a.sumval,b.sumval,b.sumval-a.sumval as Delta 
	from 
	(select * from cte where What ='New' ) a
		join 
	(select * from cte where What ='Old' ) b 
	   on a.Type =b.Type

Hi Harish,

Thanks a ton for your effort. I will try it out and let you know how it goes.

Thanks again!

sample_data

drop table #samples
go

create table #samples
(
sample varchar(100) NOT NULL,
sample_datetime datetime NOT NULL,
type char(2) NOT NULL,
value decimal(9, 2) NOT NULL,
delta_mean_status tinyint NOT NULL DEFAULT 0,
delta_mean decimal(9, 2) NULL
)
go

insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S1','05/2/2020 12:50','F1',21,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S1','05/2/2020 12:58','F2',20,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S1','05/2/2020 1:15' ,'F3',19,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S2','05/2/2020 1:25' ,'F1',18,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S2','05/2/2020 1:35' ,'F2',19,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S2','05/2/2020 1:50' ,'F3',20,0
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S3','05/2/2020 2:00' ,'F1',25,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S3','05/2/2020 2:20' ,'F2',16,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S3','05/2/2020 2:35' ,'F3',16,1

insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S4','05/2/2020 2:40','F1',24,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S4','05/2/2020 2:50','F2',21,1
insert into #samples (sample, sample_datetime, type, value, delta_mean_status) select 'S4','05/2/2020 2:55','F3',13,1

/* create this index to vastly improve efficiency of the 30-row lookups */
CREATE UNIQUE NONCLUSTERED INDEX samples__IX_delta_mean_status ON #samples ( delta_mean_status, type, sample_datetime ) WHERE delta_mean_status = 1 WITH ( DATA_COMPRESSION = ROW, FILLFACTOR = 99, SORT_IN_TEMPDB = ON ) ON [PRIMARY];

SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
CREATE TRIGGER dbo.samples__TR_INSERT
ON dbo.samples
AFTER INSERT
AS
SET NOCOUNT ON;
WHILE (SELECT COUNT(*) FROM dbo.samples WHERE type = 'F1' AND delta_mean_status = 1) >= 30
BEGIN
    UPDATE s
    SET delta_mean_status = 0,
        delta_mean = SUM(total_value) / value_count
    FROM (
        SELECT SUM(value) AS total_value, COUNT(value) AS value_count
        FROM (
            SELECT TOP (30) *
            FROM dbo.samples s
            WHERE delta_mean_status = 1
            ORDER BY sample_datetime
        ) AS s_first_30
    ) AS s_F1
    INNER JOIN dbo.samples s ON s.type = s_F1.type AND s.sample_datetime = s_F1.sample_datetime
    IF @@ROWCOUNT = 0
        BREAK;
END /*IF*/
/* !! Repeat the code above for F2 and F3 !! */
/*end of trigger*/
GO
1 Like

Hi Scott,
I have around 1000 features, in the example above I have only provided 3 (F1,F2,F3).

do these have a uniqueID or would the combination of the other columns constitute the unique row?

sample name is unique(S1,S2,S3)

Maybe my explanation was not clear before. I am so sorry for that.
Hopefully this will be clear.

Lets say i want to compare latest 3 samples with previous 3 samples. Sample_date is the time the sample was created , sample is the unique identifier
so here i got the latest 6 samples by partitioning on sample and ordering by date is descending order.
with this i can calculate the mean difference between the first 3 and last 3 samples for each feature

Sample Sample Date Feature Value Row_num
S1 05/4/2020 12:50PM F1 10 1
S1 05/4/2020 12:50PM F2 8 1
S1 05/4/2020 12:50PM F3 5 1
S2 05/4/2020 12:42 PM F1 9 2
S2 05/4/2020 12:42 PM F2 5 2
S2 05/4/2020 12:42 PM F3 3 2
S3 05/4/2020 12:38 PM F1 11 3
S3 05/4/2020 12:38 PM F2 7 3
S3 05/4/2020 12:38 PM F3 2 3
S4 05/4/2020 12:32 PM F1 10 4
S4 05/4/2020 12:32 PM F2 7.5 4
S4 05/4/2020 12:32 PM F3 3 4
S5 05/4/2020 12:12 PM F1 9 5
S5 05/4/2020 12:12 PM F2 7 5
S5 05/4/2020 12:12 PM F3 4 5
S6 05/4/2020 12:02 PM F1 10 6
S6 05/4/2020 12:02 PM F2 8 6
S6 05/4/2020 12:02 PM F3 7 6

A new sample S8 has entered the database.

Sample Sample Date Feature Value Row_num
S8 05/4/2020 1:20PM F1 10 1
S8 05/4/2020 1:20PM F2 8.4 1
S8 05/4/2020 1:20PM F3 8 1
S1 05/4/2020 12:50PM F1 10 2
S1 05/4/2020 12:50PM F2 8 2
S1 05/4/2020 12:50PM F3 5 2
S2 05/4/2020 12:42 PM F1 9 3
S2 05/4/2020 12:42 PM F2 5 3
S2 05/4/2020 12:42 PM F3 3 3
S3 05/4/2020 12:38 PM F1 11 4
S3 05/4/2020 12:38 PM F2 7 4
S3 05/4/2020 12:38 PM F3 2 4
S4 05/4/2020 12:32 PM F1 10 5
S4 05/4/2020 12:32 PM F2 7.5 5
S4 05/4/2020 12:32 PM F3 3 5
S5 05/4/2020 12:12 PM F1 9 6
S5 05/4/2020 12:12 PM F2 7 6
S5 05/4/2020 12:12 PM F3 4 6

So now S8 , s1, s2 are my latest 3 samples and S3,S4,S5 are my old samples and S6 wont be a part of my query as I am filtering only 6 samples.
I dont want my query/SP to execute now as there is only one new sample S8, I to want execute my query / stored procedure when there are 3 new samples.
So my s1,s2,s3 become my old samples and S8,S9,S10 are my new samples and then calculate the mean shift.

The whole purpose of my task is to flag the feature when there is a high mean shift. I dont want to calculate the mean shift every time there is a new sample....i want to calculate the mean shift when there are 3 new samples and flag the feature.

should i try something like:
declare

@last_sample_date datetime
@sample_count int

set @sample_count =
set @last_sample_date = (SELECT MAX(Sample_Date) FROM table)

IF condition
BEGIN

Select.....
......
I am not sure.

You should have mentioned that to start with, that's a very significant difference. Good luck with this.

you want to compare the latest 1 sample (3 rows) with the previous 3 samples (9 rows) Is that correct?