Non Distinct Blues

incognito · September 11, 2018, 5:32pm

Hello you wonderful people,

I'm having a hard time identifying some Non Distinct records. I have a distinct query that returns 19,992 records from a table of 20,000. I have to identify those 8 records so that they can be inspected.

My query is "SELECT DISTINCT Model, Ref, Amount FROM tblBig"

the data is like the following:

Model Ref Amount
s y689 88.50
q abcd 77.40
a x555 100.00
a x555 100.00
t 9999 44.50
x a877 55.00
z b999 44.12

With this data example the distinct query will return 6 records out of the 7 in the table.

I need to fish out the "a, x555, 100.00" record. Would someone please assist me?

ScottPletcher · September 11, 2018, 6:35pm

SELECT Model, Ref, Amount, COUNT(*) AS Total_Rows
FROM tblBig
GROUP BY Model, Ref, Amount
HAVING COUNT(*) > 1

incognito · September 12, 2018, 10:49am

Thank you, thank you and more thank yous!!! The query works, but man, I've stared at it, I've analyzed it and i just don't get. I don't understand it, or understand why it works. Looks like there's some detail in the code that needs to be broken out and that this shouldn't work. Would it be too wordy or long to explain to me why this works?

ScottPletcher · September 12, 2018, 3:04pm

Nah, shouldn't take too long (I hope).

The key is the "GROUP BY" clause. That "tells" SQL that this query will only be concerned with aggregate rows ("multiple rows combined into a single row in some way").

It's exactly like a DISTINCT except that it also allows you to get aggregate values from other columns.

So, for a GROUP BY, there will be only 1 row per unique combination of all the GROUP BY column(s).

The COUNT(*) tells SQL to count how many rows matched that group. So if I GROUPed your original data on just Model, "a" would have a count of 2, the others a count of 1, since Model "a" has two rows, the others just one.

You can also find the lowest and/or highest value of columns in the group, using MIN / MAX.