Database LOG Backup Corrupt but not found by CHECKDB

Kristen · October 11, 2016, 6:06pm

I've got this error in SQL Error Log

Backup detected log corruption in database MyDatabase. 
Context is FirstSector. 
LogFile: 2 'H:\MSSQL\Log\MyDatabase.ldf' 
VLF SeqNo: x381 
VLFBase: x100000 
LogBlockOffset: x99ce00 
SectorStatus: 2 
LogBlock.StartLsn.
SeqNo: x63278c 
LogBlock.StartLsn.Blk: xa650 
Size: x2 
PrevSize: x0

This command:

USE [MyDatabase]
GO
DBCC CHECKDB WITH NO_INFOMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY	-- , ALL_ERRORMSGS

shows no error Does CHECKDB not perform any checks on Log file? Is there some other way to check the Log files?

I found the first occurrence in the SQL Error Logs. This was the last successful backup of that database:

Date		20/09/2016 21:01:00
Log		SQL Server (Current - 30/08/2016 08:09:00)
Source		Backup
Message
Log was backed up. Database: MyDatabase, 
creation date(time): 2013/04/04(17:18:53), 
first LSN: 897:17466:1, 
last LSN: 897:17593:1, 
number of dump devices: 1, 
device information: (FILE=1, TYPE=DISK, 
    MEDIANAME='MyDatabase_Dump20160920': 
    {'F:\MSSQL\BACKUP\MyDatabase_20160920_210100_Trans.BAK'}). 
This is an informational message only. No user action is required.

and then on the next hour a Log backup on another database works fine, but that database gives this error:

Date		20/09/2016 22:03:39
Log		SQL Server (Current - 30/08/2016 08:09:00)
Source		spid54
Message
Backup detected log corruption in database MyDatabase. 
Context is FirstSector. 
LogFile: 2 'H:\MSSQL\Log\MyDatabase.ldf' 
VLF SeqNo: x381 
VLFBase: x100000 
LogBlockOffset: x99ce00 
SectorStatus: 2 
LogBlock.StartLsn.SeqNo: x63278c 
LogBlock.StartLsn.Blk: xa650 
Size: x2 
PrevSize: x0

I presume I can throw away the log, in some way, and just carry on (as the MDF file does not appear to be damaged). I'm not sure ho to do that though ...

jeffw8713 · October 11, 2016, 6:16pm

Shrink the log file to the smallest size possible then regrow it to its normal operating size. You really don't want to detach the database and then attach it again without a log file - but you can do that if resizing the log doesn't work.

Kristen · October 11, 2016, 7:00pm

Tried a Log Backup - unsurprisingly that failed.

however, a Full backup was successful. I'm a bit surprised as I assumed it would append the LOG file to the FULL backup, but perhaps there wasn't anything "active" in the damaged area of the file.

Perhaps I could just restore that FULL Backup back over the corrupted database (log file)?

I'll try restoring it to a TEMP database and seeing if it looks like it is OK.

Presumably if I rename the damaged database ("belt and braces" in case I need to access it) that would also carry with it all the Jobs etc. associated with that DB, so not really that good an idea to rename it ...

... perhaps I should stop the SQL service and do a raw copy of the MDF / LDF files as a "belt and braces" (after checking that my backup, restored to a TEMP database, seems to be OK)

yosiasz · October 11, 2016, 7:02pm

what environment are you restoring this to?
Do you really need to restore the log if restoring to dev/qa environs
What are you attempting to do here, critical or not?

Kristen · October 11, 2016, 7:08pm

Restored TEMP looks OK. CHECKDB is fine, and I managed to make a LOG backup from it (although presumably nothing to actually back up, as I assume having restored it the LOG would be checkpointed etc. at that point and "empty"?)

So I'm inclined to just restore over the top of the original - although I would like to stop SQL Service and take a low-level file copy first, just in case I need to examine it further - e.g. in case it helps to figure out what caused the problem

Kristen · October 11, 2016, 7:19pm

Production.

The damaged log file is on an "ADMIN" database which, ironically, is the thing that controls all our backups!! I have manually run LOG and FULL backups on all databases. Apart from the LOG on this one everything else was fine. CHECKDB was clean on all DBs (NICLUDING the one with the damaged LOG file )

No, I can live without the LOG on this DB (broken log chain is OK)

I would prefer not to lose data, but if I had to restore this DB back a month (say) that would be OK. It will just resume recording data about backups and other Admin stuff, and I'll just have a gap in my data. There is no user-entered data in the database (to lose), and I am certain that no other DBA will have had reason to change any parameters / whatever in that database over the last several months, let alone the last few weeks, hence "restore from last known good backup" would be fine. (Famous Last Words on that one .. of course!! ... but, either way, a gap in the data would not be critical)

However, if this had happened to the (Production) Client Databases on this server that would be a very different story, so it would be good to find the cause.

There is nothing in the Event Log such as "Disk / I/O / Controller failure" etc. all network attached storage is completely outside my control; we just supply an AP to the client, I don't even have a point of contact with their outsources IT people (and, in fact, I shouldn't be having to bother with database backups etc. at all, their IT people should be doing that, but noone ever did so I installed my own Admin Stuff)

Presumably if Network Storage failed I wouldn't see anything in the Event log on this server? and it would be down to the client's IT people to look at whatever logs are available for that Storage Device?

Looking at the Event Log:

System:
20/09/2016 21:09:27 Time Service EventID=50

"The time service detected a time difference of greater than 5000 milliseconds for 900 seconds. The time difference might be caused by synchronization with low-accuracy time sources or by suboptimal network conditions. The time service is no longer synchronized and cannot provide the time to other clients or update the system clock. When a valid time stamp is received from a time service provider, the time service will correct itself."

20/09/2016 22:00:50 Kernel-General EventID=1

"The system time has changed to ‎2016‎-‎09‎-‎20T21:00:50.727000000Z from ‎2016‎-‎09‎-‎20T20:59:07.520669300Z."

These messages are not uncommon. It annoys me that the client has not solved this (I've raised it often) as I suspect that a time change of nearly 2 minutes screws up stuff in our APP ... but I can't see it corrupting a Log File. (Thus I just mention it because it is the only tangible thing in the log files at that time)

The Event Application Log shows the successful backups (all databases) at 21:00 then the next thing is the successful backup at 22:00 (of a different database) followed by the backup failure (of my Admin database)

Kristen · October 11, 2016, 7:39pm

Paul Randall has some insight and suggestions

sqlmag.com/blog/transaction-log-corruption-and-backups

"DBCC CHECKDB doesn’t analyze the log at all"

"Unless the corruption occurs in the active portion of the log (the portion that is required for some reason by SQL Server) then its likely that no-one will ever know it occurred!"

"The active portion of the log will also be used if a transaction rolls back, if crash recovery has to run, or when a transaction log backup occurs (plus a bunch of other uses like replication and mirroring – the aforementioned article has more information). If a damaged log record is encountered during one of these uses then an error will be thrown and the operation will fail."

"The only time a database will be marked SUSPECT because of a corruption in the transaction log is if the corruption is encountered during crash recovery or during a transaction rollback"

"If a corrupt log record is encountered during a transaction log backup, the backup will fail – but that’s all."

Work arounds:

"Switching the database to the Simple recovery model
Performing a checkpoint (which should clear the active log as long as nothing else requires the log to be kept active)
Switching back to the Full recovery model
Reestablishing the log backup chain by performing a full or differential backup"

"Of course, as in any corruption situation, make sure to do some root-cause analysis to figure out why the log became corrupt in the first place."

yosiasz · October 11, 2016, 9:53pm

you still want the link to chimplants?
I feel your pain. Are there any other "enhanced torture methodologies" you can apply to the files themselves to see if you can extract any valuable stuff from out of it. Xfiles episode 29

I would start planning for plan B until you find what the root cause was. Plan B being extract data to xml, json, csv low level file system. Time differences are a common thing I would say. Unless it is on an Domain Controller, I would not think it could cause issues

Are they using virtual machines?
What kind of virtualization
Have you tested all the other backups, yikes!
Are they having storage issues? corruption etc

Kristen · October 12, 2016, 7:21am

Yes
No idea !
No
Seems like it to me ...

I'll pass all this over to their IT people, its a good opportunity.

Turns out they are already making a (direct from SQL) backup of some sort at night. So that is one full backup CHECK and one log backup ... per day ... breaking the log backup chain that we are making. Client is welcome to them!

Kristen · October 12, 2016, 8:27am

Done this. It worked. Here's the steps I used in case it helps anyone else:


-- Globally Find&Replace MyDatabaseName with the name of the datbase with corrupted LOG file
--
-- Fix corrupted LOG file
-- As per: Paul Randall		sqlmag.com/blog/transaction-log-corruption-and-backups
-- Use this ONLY IF:
--	Database is in FULL Recovery Model
--		SELECT D.recovery_model_desc, D.name FROM master.sys.databases AS D WHERE D.name = DB_Name()
--
--	Log Backup fails?
--		Make a LOG Backup to check that it fails with error
--
-- Optionally take a copy of the corrupted database:
--
--	Takea FULL backup of the database (in case SQL Service will not restart)
--	STOP the SQL Service
--	Copy the MDF/NDF and LDF file(s) to a safe place
--	ReSTART the SQL Service (and SQL Agent etc.)
--
-- Change MyDatabaseName Recovery Model to SIMPLE
USE master
GO
ALTER DATABASE MyDatabaseName
SET RECOVERY SIMPLE
GO
--

-- Perform a checkpoint (which should clear the active log as long as nothing else requires the log to be kept active)
USE MyDatabaseName
GO
CHECKPOINT
GO
--
-- Switch back to the Full recovery model
USE master
GO
ALTER DATABASE MyDatabaseName
SET RECOVERY FULL
GO
--
-- Reestablish the log backup chain by performing a full or differential backup
--	Make a FULL or DIFFERENTIAL backup on MyDatabaseName
GO
--
-- Test a LOG Backup
--	Make a LOG backup on MyDatabaseName to test that there is no error
GO
--
-- Finally, go find the problem that caused the corruption!