Why does CHECKDB succeed but the backups terminate suggesting corruption

Following a SAN issue a week ago a single DB was marked Suspect. CHECKDB showed extensive inconsistencies resulting in performing a recovery from valid .bak & .trn files. Following the restore, CHECKDB across all system and user dbs was good and the application was re-enabled.

However since then the daily backup of this one database has continually failed, and yet CHECKDB is always happy across all the databases. The backup produces the same error each time, always getting to 70% before terminating. I’ve also tried adding continue_after_error but still without success

Msg 3203, Level 16,State 1, Line 3 Read on "mydb.mdf" failed: 
23(failed to retrieve text for this error.Reason 15105)
Msg 3203, Level 16, State 1, Line 3
BACKUP DATABASE is terminating abnormally.

I also see in the System Event log

The device,\Device\Harddisk2\DR2, has a bad block

The environment is VMWare - Windows Server 2008 R2 SP1 running SQL 2008R2 SP3, Standard Edition and an 11GB database which is separated across 3 files – Primary, Data, Index.

The server was restarted a couple of days ago in an attempt to fix the backup issue. SQL Server restarted OK and reported no errors in either the application or system event logs - that is until a backup of this database is attempted and the errors above occur. Log Backups are succeeding but there is no valid backup-chain to recover from now. This position is serious without a valid backup now for over a week.

Other forums suggest this is caused by file permissions or disk space for the backup to finish but this is not the case. I’ve considered detaching/stopping SQL in order to copy the mdf, ndf, ldf files to another server, but I don't want to stop the system again until I feel more confident with the data being protected. I’m pulling together a process to export all the DB objects and bulk copy all the data out into another database.

If the problem is with the Primary file (mdf) why is the CHECKDB not picking it up? Would welcome any thoughts, further checks I could run to help identify what is wrong with this file and any workarounds to get the backups working.

Thanks Unsy.

take a look at this link hope this will help you out: http://www.sqlrecoverysoftware.net/blog/sql-error-15105.html

Thanks Jason

Just to say that we have concluded the mdf file is beyond repair. To share the scenario again

With the bad sector in the mdf file

  • the T-Log backups succeeded
  • the Database was still accessible/functioning
  • and CHECKDB were appearing good

However

  • Full & Diff backups failed
  • the MDF file could not be copied when the DB was detached
  • the DB could still be be reattached in situ

Due to some poor file management & the delayed identification of this whole issue

the log chain become broken (due to limited log backups retention)
the only solution was to restoring an old backup and painful copy out of the data

No copies of old files on tape etc?

You might be able to salvage all the data from "actual" tables:

INSERT INTO NewTempDB.dbo.MyTable
SELECT * FROM LiveDB.dbo.MyTable

this should use clustered index, rather than secondary index, and hopefully!! the bad block is occupied by an Index page. If it is occupied by a Data page it should be possible to export "up to" the bad data and resume again after that point (leaving a suitable gap, but at least you will know WHAT that gap is)

Alternative (if space is tight) would be toe BCP the data out (using NATIVE format) so that it can be re-imported later / onto a different / remote machine. Same principle of exporting using Clustered Index, and skipping Bad Block areas, would apply.

Personally I would never continue on suspect hardware (e.g; after a clean restore following some sort of corruption) without finding the cause