Why Doesn't this code do what I think it should

DaftVader · February 20, 2021, 11:11pm

I am experimenting with this code that I got from the net,
I am trying to get better at SQL so would like to understand why this does not do what it seems it should.
I am running it against a over 10million row table which I noticed had several duplicate rows .
The Code:
select * from boilerdata where ReadingDate in (
select ReadingDate from boilerdata
group by ReadingDate having count(*)=6
)

I believe This should return any duplicate Datetimes, where specifically there are 6 duplicates for a particular DateTime Grouped together.
BTW the =6 is just an example I have tried >n or =n with several numbers but all give me similar unexpected results.
The Table consists of 2 rows added every minute 24/7 so there are always 2 of any datetime anyway,
(I don't think this is good so I am thinking of just changing that to one row per
minute ). But I digress.
So I could go on about all the results I have had but mainly it works better if I use an = rather than a >
So with this example I got several groups of 6 + several groups of 4 + about 30 groups of 2 at the end.
I can't explain this at all ! I only asked for 6 anyway but got the 4's as well? that's hard enough to explain but why the 30 or so groups of 2 ? there are millions of groups of 2 in the table!
I would appreciate any help , hopefully I might learn something.

Thanks

ScottPletcher · February 21, 2021, 4:44am

Usable sample data -- create table and insert statement(s) -- would be much easier to understand so that we can help you by providing actual code.

yosiasz · February 21, 2021, 5:29am

What column(s) make a row duplicate for this as @ScottPletcher recommended we need to see yoir table structure and some sample data that shows us what constitutes a dup. Or give us full access to your sql server

DaftVader · February 21, 2021, 11:47am

Hi and thanks.
This is the create Table code:
USE [Sensors]
GO

/****** Object: Table [dbo].[BoilerData] Script Date: 21/02/2021 11:44:10 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[BoilerData](
[ID] [int] NOT NULL,
[Temperature] [decimal](5, 2) NOT NULL,
[BoilerState] [int] NOT NULL,
[ReadingDate] [datetime] NOT NULL,
[LatchValue] [int] NULL,
[Room] nvarchar NULL,

CONSTRAINT [PK_BoilerData_1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

DaftVader · February 21, 2021, 11:52am

This is the first 4 rows of the Table hope this is uploaded OK!
DB sample

DaftVader · February 21, 2021, 11:53am

As you can see the ReadingDate is the column that is duplicated

yosiasz · February 21, 2021, 3:19pm

That is not a duplicate.
The other column Room, shows different data as well as other columns. So you need to provide us more clearer requirements Vader, use the force

DaftVader · February 21, 2021, 10:59pm

Ah! Perhaps I got confused, Yes the row is not a duplicate but the Datetime is.
The code I quoted is looking for duplicate Datetimes
Or that's what it looks like to me?

yosiasz · February 22, 2021, 12:16am

duplication is based not on just one column. So which row do you want to see from the dups?