SQLTeam.com | Weblogs | Forums

Why Doesn't this code do what I think it should

I am experimenting with this code that I got from the net,
I am trying to get better at SQL so would like to understand why this does not do what it seems it should.
I am running it against a over 10million row table which I noticed had several duplicate rows .
The Code:
select * from boilerdata where ReadingDate in (
select ReadingDate from boilerdata
group by ReadingDate having count(*)=6
)

I believe This should return any duplicate Datetimes, where specifically there are 6 duplicates for a particular DateTime Grouped together.
BTW the =6 is just an example I have tried >n or =n with several numbers but all give me similar unexpected results.
The Table consists of 2 rows added every minute 24/7 so there are always 2 of any datetime anyway,
(I don't think this is good so I am thinking of just changing that to one row per
minute ). But I digress.
So I could go on about all the results I have had but mainly it works better if I use an = rather than a >
So with this example I got several groups of 6 + several groups of 4 + about 30 groups of 2 at the end.
I can't explain this at all ! I only asked for 6 anyway but got the 4's as well? that's hard enough to explain but why the 30 or so groups of 2 ? there are millions of groups of 2 in the table!
I would appreciate any help , hopefully I might learn something.

Thanks

Usable sample data -- create table and insert statement(s) -- would be much easier to understand so that we can help you by providing actual code.

What column(s) make a row duplicate for this as @ScottPletcher recommended we need to see yoir table structure and some sample data that shows us what constitutes a dup. Or give us full access to your sql server

Hi and thanks.
This is the create Table code:
USE [Sensors]
GO

/****** Object: Table [dbo].[BoilerData] Script Date: 21/02/2021 11:44:10 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[BoilerData](
[ID] [int] NOT NULL,
[Temperature] [decimal](5, 2) NOT NULL,
[BoilerState] [int] NOT NULL,
[ReadingDate] [datetime] NOT NULL,
[LatchValue] [int] NULL,
[Room] nvarchar NULL,

CONSTRAINT [PK_BoilerData_1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

This is the first 4 rows of the Table hope this is uploaded OK! :slight_smile:
DB sample

As you can see the ReadingDate is the column that is duplicated

That is not a duplicate.
The other column Room, shows different data as well as other columns. So you need to provide us more clearer requirements Vader, use the force

Ah! Perhaps I got confused, Yes the row is not a duplicate but the Datetime is.
The code I quoted is looking for duplicate Datetimes
Or that's what it looks like to me?

duplication is based not on just one column. So which row do you want to see from the dups?