Why does CHECKDB succeed but the backups terminate suggesting corruption

Unsy · July 30, 2016, 4:53pm

Following a SAN issue a week ago a single DB was marked Suspect. CHECKDB showed extensive inconsistencies resulting in performing a recovery from valid .bak & .trn files. Following the restore, CHECKDB across all system and user dbs was good and the application was re-enabled.

However since then the daily backup of this one database has continually failed, and yet CHECKDB is always happy across all the databases. The backup produces the same error each time, always getting to 70% before terminating. I’ve also tried adding continue_after_error but still without success

Msg 3203, Level 16,State 1, Line 3 Read on "mydb.mdf" failed: 
23(failed to retrieve text for this error.Reason 15105)
Msg 3203, Level 16, State 1, Line 3
BACKUP DATABASE is terminating abnormally.

I also see in the System Event log

The device,\Device\Harddisk2\DR2, has a bad block

The environment is VMWare - Windows Server 2008 R2 SP1 running SQL 2008R2 SP3, Standard Edition and an 11GB database which is separated across 3 files – Primary, Data, Index.

The server was restarted a couple of days ago in an attempt to fix the backup issue. SQL Server restarted OK and reported no errors in either the application or system event logs - that is until a backup of this database is attempted and the errors above occur. Log Backups are succeeding but there is no valid backup-chain to recover from now. This position is serious without a valid backup now for over a week.

Other forums suggest this is caused by file permissions or disk space for the backup to finish but this is not the case. I’ve considered detaching/stopping SQL in order to copy the mdf, ndf, ldf files to another server, but I don't want to stop the system again until I feel more confident with the data being protected. I’m pulling together a process to export all the DB objects and bulk copy all the data out into another database.

If the problem is with the Primary file (mdf) why is the CHECKDB not picking it up? Would welcome any thoughts, further checks I could run to help identify what is wrong with this file and any workarounds to get the backups working.

Thanks Unsy.

jason_clark · August 1, 2016, 5:07am

take a look at this link hope this will help you out: http://www.sqlrecoverysoftware.net/blog/sql-error-15105.html

Unsy · August 9, 2016, 9:38am

Thanks Jason

Just to say that we have concluded the mdf file is beyond repair. To share the scenario again

With the bad sector in the mdf file

the T-Log backups succeeded
the Database was still accessible/functioning
and CHECKDB were appearing good

However

Full & Diff backups failed
the MDF file could not be copied when the DB was detached
the DB could still be be reattached in situ

Due to some poor file management & the delayed identification of this whole issue

the log chain become broken (due to limited log backups retention)
the only solution was to restoring an old backup and painful copy out of the data

Kristen · August 9, 2016, 3:36pm

No copies of old files on tape etc?

You might be able to salvage all the data from "actual" tables:

INSERT INTO NewTempDB.dbo.MyTable
SELECT * FROM LiveDB.dbo.MyTable

this should use clustered index, rather than secondary index, and hopefully!! the bad block is occupied by an Index page. If it is occupied by a Data page it should be possible to export "up to" the bad data and resume again after that point (leaving a suitable gap, but at least you will know WHAT that gap is)

Alternative (if space is tight) would be toe BCP the data out (using NATIVE format) so that it can be re-imported later / onto a different / remote machine. Same principle of exporting using Clustered Index, and skipping Bad Block areas, would apply.

Personally I would never continue on suspect hardware (e.g; after a clean restore following some sort of corruption) without finding the cause