To make a really long story short, a bit of code to update ~1000 customers, which had worked quite nicely in dev and test, did something odd when it was deployed to production. It turned out to be a fault in SQL Server, one that Microsoft came out with an emergency hotfix for.
The issue was that if parallelism occurred because of a large amount of data with "just the right indexes in place", SQL Server would ignore a WHERE clause that did an ISNULL comparison.
The code had a WHERE ISNULL() condition on the 1.2 million row customer table that should have updated only about 1000 customers. Instead, the code updated ALL the customers. Because the DBA didn't know how many rows were supposed to be updated, when it came back with a rowcount of 1.2 million, he committed the transaction.
It wasn't until the next morning, after all the nightly snapshots had been made, that anyone noticed the problem. They shut off public access to the database and tried to do a restore, only to find there were no backups available for the last month.
As a bit of a sidebar, that incident proved that I was correct about deployment code. It MUST have hardcoded expected row counts, displayed at run time, for every action taken in the code so that the DBA can make the final decision to either commit or roll back.
And, I have to tell you, there isn't an automatic deployment system in the world that would have caught this issue. We checked, and a SELECT to determine the row count made the same error.
That was way back on SQL Server 2000. I've seen lots more go wrong in SQL Server since, so this type of problem is never going to go away. You MUST write your deployment code to be bomb proof and verifiable by some human means. If you don't, you end up with the mouse guarding the cheese and, MAN, can that mouse eat some cheese!
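To show what I mean by hardcoded, displayed row counts, here's a minimal sketch. It uses Python with an in-memory SQLite database as a stand-in, not SQL Server, and the table, the column, the `run_guarded_update` helper, and the `confirm` callback (which stands in for the DBA's decision) are all made up for illustration. The expected count is typed in by whoever wrote the script, not worked out by a query at run time.

```python
import sqlite3

# Hardcoded by the script author from the change request. It is deliberately
# NOT computed with a SELECT at run time.
EXPECTED_ROWS = 2


def run_guarded_update(conn, sql, params, expected_rows, confirm):
    """Run one UPDATE in a transaction and display expected vs. actual rows.

    `confirm` is a hypothetical stand-in for the DBA: it receives
    (expected, actual) and returns True to commit. A count mismatch
    always rolls back without asking.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute(sql, params)
    actual = cur.rowcount
    print(f"expected rows: {expected_rows}, actual rows: {actual}")
    if actual == expected_rows and confirm(expected_rows, actual):
        cur.execute("COMMIT")
        return True
    cur.execute("ROLLBACK")
    return False


# isolation_level=None leaves transaction control to the explicit
# BEGIN / COMMIT / ROLLBACK statements above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customer (region) VALUES (?)",
    [("east",), (None,), (None,), ("west",)],
)

approve = lambda expected, actual: True  # illustration only; a human decides

# Intended statement: touches only the two rows with a NULL region.
committed = run_guarded_update(
    conn,
    "UPDATE customer SET region = 'unknown' WHERE IFNULL(region, '') = ''",
    (),
    EXPECTED_ROWS,
    approve,
)

# Simulated fault: the filter is missing, so every row is touched and the
# count no longer matches the hardcoded expectation.
rolled_back = not run_guarded_update(
    conn,
    "UPDATE customer SET region = 'unknown'",
    (),
    EXPECTED_ROWS,
    approve,
)
```

The second call is there to imitate the kind of failure in the story: the statement hits every row, the displayed count disagrees with the hardcoded one, and nothing gets committed.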
As for why the backups didn't happen, I wasn't made privy to that detail. It's almost not an important detail. What's important is that no one knew that the backups weren't actually happening for a whole bloody month. There wouldn't have been an issue if someone had checked once per day or read a system report on the status of the backups. It was an absolutely stupid blunder on the part of the DBA and the Infrastructure Team.
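The once-per-day check doesn't have to be fancy. Here's a rough sketch in Python, assuming you already have the finish time of the newest backup for each database from wherever your server records its backup history. The `stale_backups` function, the 24-hour threshold, and the database names are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_finish, now, max_age=timedelta(hours=24)):
    """Return the databases whose newest backup is missing or too old.

    `last_backup_finish` maps a database name to the finish time of its
    most recent backup, or to None if no backup is on record.
    """
    stale = []
    for db_name, finished in last_backup_finish.items():
        if finished is None or now - finished > max_age:
            stale.append(db_name)
    return sorted(stale)


# Made-up sample data for illustration.
now = datetime(2024, 1, 31, 8, 0)
history = {
    "Sales": datetime(2024, 1, 31, 1, 0),       # backed up last night
    "Customers": datetime(2023, 12, 30, 1, 0),  # nothing for a month
    "Archive": None,                            # never backed up
}

stale = stale_backups(history, now)
print("Stale or missing backups:", stale)
```

If that list is ever non-empty, somebody needs to hear about it the same day, not a month later.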