Backup File Retention - a cautionary tale

We have a number of backup plans, but the most common, for a database in Full Recovery, is:

  • Full backup on a Sunday, retained for 4 weeks
  • Diff backup each night, retained for 14 days
  • TLog backup every few minutes, retained for 7 days

The backup files get immediately copied to a mirror-server somewhere else in the building, and then "a bit later on" sent to our Cloud Backup facility.

I have an SProc that makes a backup - that makes sure that the physical file is in the correct folder, with all the other backups, gets copied to the Mirror Server etc., and is added to our Admin Database so that the Housekeeping job purges it at the appropriate time - so absolutely zero chance that, come The Disaster!!, there is one missing backup that some TWIT took and stored in gawd-knows-what-folder ...

That Sproc has parameters for DB Name and Backup Type (Full, Diff, TLog) and also optional parameters for a Description and Retention days.

So ... yesterday (i.e. a Tuesday) I wanted to restore Production to QA so the Users could try out some stuff. The Users said "Last Thursday around 2PM" would be a good starting point. No problem, we have TLog backups for 7 days ...

So I figured out the appropriate Full (two Sundays ago), and the Diff (last Thursday 1AM) and the all the TLog files since that Diff up to Thursday 2PM. (I have an SProc that does all that and generates a script for me ...)

When it came to restore the DIFF I got the error message that the LSNs were not contiguous. Blast!

Checked the MSDB Backup History and Yup! another Full backup was made last Wednesday. Hmmm ... why was that not selected in my script? Check the disk - file not there. Check our Admin Database history and the Wednesday backup was made with a SEVEN day retention. Then the penny dropped - the Backup SProc has a default Retention Days = 7 - which seemed like it would be SPANNED by any routine backups.

Double-Blast!

Of course I could get the file back from Cloud, but that's sufficient hassle that I didn't bother ... we just used a later restore version for our QA test ...

... but I have changed the Backup Sproc so that the default retention time is 14 days for Full, and 7 days for Diff and TLog.

I've been promising myself that the Housekeeping would only remove files once both the Retention Date had passed AND any Child Backups had been deleted - so a DIFF would not be deleted until were no TLog backups that depended on it, and a FULL would not be deleted if any DIFFs depended on that ... that would prevent this issue, but the chances of me getting around to doing that are "slim".

I could add a COPY_ONLY option to the Backup Sproc, but I've never felt the need for that in my line of work - I certainly use it when doing an adhoc backup on someone else's server. When I take a FULL backup, before trying something "potentially dangerous", it seems to me that its fine that that backup file just forms part of the normal backup set ... but the files do need to hang around long enough to be useful!

Is there a chance that a Dev tried to beat you to the punch and created a backup themselves, not knowing that it would make the previous logs useless?

It was me that took that backup :frowning: I just left the (optional) RetentionDays at default value ...

... but its a good-enough suggestion for a gag around April Fools day ...!

:joy: