Query help - updating records in a table that has not been normalised

I have a single table (derived from importing the content of 1000's of music files) in a non normalised database. The table contains the attributes of multiple albums, incl:

  • __dirpath (path from which data was pulled ... unique to each album)
  • foldername (folder in which tracks representing album resides)
  • discnumber (can be blank or numeric, representing discnumber from which track is taken)
  • track (numeric representing track number in album, sequential from 1..n where n is number of tracks)
  • album (name of album)

For every track making up an album there is an entry in the table. The metadata from which the table was created is inconsistent in that:

  • some albums that are single disc albums have a discnumber of "1" only, others that are single disc albums may be blank

I'd like to run an update query on the table to delete all instances of discnumber where for a given dirpath all tracks have a discnumber set to "1"

This code shows all instances where __dirpath has tracks with a discnumber entry. If I can then narrow that result down to those where __dirpath only has a discnumber of '1' I can run an update statement to remove the discnumber from all records with a matching __dirpath. But I'm stuck as I don't regularly work in SQL.

SELECT DISTINCT __dirpath, discnumber FROM audio WHERE discnumber IS NOT NULL ORDER BY __dirpath, discnumber;

Any ideas how to go about this?

If I'm understanding you correctly, you want to delete (stop saying "update") entries that only have one track per dirpath. This is the same as saying you want to delete all entries that DON'T have a second track:delete a -- select * -- Highlight and Run this SELECT statement to preview what data will be affected from Audio a where not exists ( select * from aTable a2 where a.[__dirpath] = a2.[__dirpath] and a2.discnumber <> 1 )If you run the SELECT statement first, you'll see which records will be involved in the DELETE. From the output you can verify that these are indeed the data you want operated on.

@stephen_hendricks

Thanks, but I think I was not clear in my first post. I'll use actual data as an example:

The first album has more than one discnumber associated with the album, the second album only has discnumber set to 1.

What I need to do is to replace all instances of discnumber with NULL where ALL records associated with an album only have a value of discnumber = 1 and no other value. As an album name is not necessarily unique we can substitute __dirname which is unique to each album. I do not want to delete any records.

In the sample provided all records associated with __dirpath = "/mnt/alibbtrfs/vol1/B/Buena Vista Social Club - Realidad en Vivo" would have discnumber set to NULL. Similarly any other albums meeting the same crtieria would also have disnumber set to NULL.

Only a slight modification to the same basic logicupdate a set disknumber = null -- select * -- Highlight and Run this SELECT statement to preview what data will be affected from Audio a where not exists ( select * from aTable a2 where a.[__dirpath] = a2.[__dirpath] and a2.discnumber <> 1 )You should still run the SELECT to validate the recordset that will be operated on.

Thanks @stephen_hendricks. I translated your code as follows and ran it in sqlite:

SELECT __dirpath, discnumber, track, title FROM audio a WHERE NOT EXISTS ( SELECT * FROM audio a2 WHERE a.__dirpath = a2.__dirpath AND a2.discnumber <> 1 ) ORDER BY __dirpath, discnumber, track;

It's either in a loop or it's taking a hell of a lot of time to return a result. I presume my references to audio as alias a and a2 is what you intended?

Could you recommend a great resource for learning more re going beyond the basic select statement?

Here are a couple of authors who write on SQL coding:

  • Itzik Ben Gan
  • Ken Henderson (rip)

I can't recommend a specific book but find an intro book from either and you'll be headed in the right direction.

It sounds like this should perform better for your particular table and data:

UPDATE a
SET discnumber = NULL
FROM audio a
INNER JOIN (
    SELECT __dirpath
    FROM audio
    WHERE discnumber IS NOT NULL
    GROUP BY discnumber
    HAVING COUNT(CASE WHEN discnumber = 1 THEN 1 ELSE 0 END) = COUNT(*)
) AS a_discnumber_all_1 ON a_discnumber_all_1.__dirpath = a.__dirpath