I am experimenting with this code that I got from the net,
I am trying to get better at SQL so would like to understand why this does not do what it seems it should.
I am running it against a over 10million row table which I noticed had several duplicate rows .
The Code:
select * from boilerdata where ReadingDate in (
select ReadingDate from boilerdata
group by ReadingDate having count(*)=6
)
I believe This should return any duplicate Datetimes, where specifically there are 6 duplicates for a particular DateTime Grouped together.
BTW the =6 is just an example I have tried >n or =n with several numbers but all give me similar unexpected results.
The Table consists of 2 rows added every minute 24/7 so there are always 2 of any datetime anyway,
(I don't think this is good so I am thinking of just changing that to one row per
minute ). But I digress.
So I could go on about all the results I have had but mainly it works better if I use an = rather than a >
So with this example I got several groups of 6 + several groups of 4 + about 30 groups of 2 at the end.
I can't explain this at all ! I only asked for 6 anyway but got the 4's as well? that's hard enough to explain but why the 30 or so groups of 2 ? there are millions of groups of 2 in the table!
I would appreciate any help , hopefully I might learn something.
What column(s) make a row duplicate for this as @ScottPletcher recommended we need to see yoir table structure and some sample data that shows us what constitutes a dup. Or give us full access to your sql server
That is not a duplicate.
The other column Room, shows different data as well as other columns. So you need to provide us more clearer requirements Vader, use the force
Ah! Perhaps I got confused, Yes the row is not a duplicate but the Datetime is.
The code I quoted is looking for duplicate Datetimes
Or that's what it looks like to me?