Restore from Backup is slow

Kristen · February 10, 2016, 3:41pm

Would appreciate any ideas on this one please.

This is a DEV database. I'm the only one working on it, but I had several connections open (from QA and probably some, not-yet-died, from Web APP connections).

I screwed up, so I took a full backup (just in case I needed it) and then set about restoring to point-in-time. I started with FULL Backup (from Sunday) and then last night's DIFF then TLogs since.

First off I did:

ALTER DATABASE MyDatabase SET OFFLINE WITH ROLLBACK IMMEDIATE

it hung. I don't think that has ever happened to me before [when using Rollback Immediate]. I left it running and tried to change each QA windows to connect to MASTER instead. No difference so I did a ROLLBACK in each window/session "just in case". None of them found anything to roll back.

I cancelled the ALTER DATABASE (which took a minute or two). I tried it again, same thing. Not entirely sure what I did next, but I think I may have done a SET SINGLE_USER WITH ROLLBACK IMMEDIATE instead or the OFFLINE command. Anyway, the next attempt with SET OFFLINE WITH ROLLBACK IMMEDIATE worked (and responded immediately).

I ran the restore from Full Backup. That went through 10%, 20% ... to 100% in reasonable time for the DB size (a couple of minutes) and then sat there at 100%

after a bit I ran an sp_WhoIsActive a few times, I got this:

(1x: 559184ms)ASYNC_IO_COMPLETION,
(1x: 588934ms)BACKUPTHREAD,
(1x: 588958ms)PREEMPTIVE_OS_WRITEFILEGATHER

Content_switches = 46,254 (not changing)
physical_io = 21,685 (not changing)
pecent_complete = 100.0000 (not changing)

Login time = 2016-02-10 14:31:24.057 (not changing)
Collection_time = 2016-02-10 14:56:38.680

I ran it again after a bit:

(1x: 740915ms)ASYNC_IO_COMPLETION,
(1x: 770666ms)BACKUPTHREAD,
(1x: 770690ms)PREEMPTIVE_OS_WRITEFILEGATHER
(the other figures had not changed)

then the Restore progressed to the "Processed 358352 pages ..." display ... and, again, just sat there, sp_WhoIsActive displayed:

(1x: 4ms)IO_COMPLETION
Content_switches = 48,331
physical_io = 23,979
Collection_time = 2016-02-10 15:02:52.970

I forgot to record the elapsed time of the restore. The FULL backup file was 492MB

I then started the DIFF Restore. That is, now, sitting there stuck on 100% - its been running 12 minutes. XXX dsplays:
(1x: 757447ms)ASYNC_IO_COMPLETION,
(1x: 757934ms)BACKUPTHREAD,
(1x: 757938ms)PREEMPTIVE_OS_WRITEFILEGATHER
Content_switches = 1,322
physical_io = 1,072
pecent_complete = 100.0000
Login time = 2016-02-10 14:31:24.057
Collection_time = 2016-02-10 15:17:11.297

The DIFF restore finished with duration 16m55s. Restore of the TLog backups was not particularly speedy, although not disastrously slow like the FULL and DIFF. Most of the TLog backups would be "empty" as no DEV activity during the night, for example! (Pretty much all the TLog backup files were 391KB, they are made every 15 minutes, I restored 78 of them and it took 9 minutes ...)

RESTORE DATABASE MyDatabase WITH RECOVERY

only took a couple of seconds

I didn't check, but I think it is unlikely that there has been any database file extension since the Sunday backup (its a DEV database, rarely any massive query processing etc.)
I don't think that the Backup files have any/many transactions included (the filesizes are too small for there to be anything significant)

When I checked Server CPU during FULL backup restore it was showing 1%-8% over the duration of the displayed history. No spikes above 8%
Memory was flat at 27.8GB (it said 3.1GB available, Task Manager suggested that SQL was using 25GB)

Microsoft SQL Server 2012 - 11.0.5058.0 (X64)
May 14 2014 18:34:29
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.3 (Build 9600: )

Kristen · February 10, 2016, 4:04pm

OK ... should have checked the ERRORLOG

2016-02-10 14:36:09.30 spid152     Setting database option OFFLINE to ON for database 'MyDatabase'.
2016-02-10 14:40:20.22 Backup      Error: 3041, Severity: 16, State: 1.
2016-02-10 14:40:20.22 Backup      BACKUP failed to complete the command BACKUP LOG MyDatabase. Check the backup application log for detailed messages.
2016-02-10 14:44:52.02 spid152     Starting up database 'MyDatabase'.
2016-02-10 14:45:32.72 spid152     Database MyDatabase has more than 1000 virtual log files which is excessive. Too many virtual log files can cause long startup and backup times. Consider shrinking the log and using a different growth increment to reduce the number of virtual log files.
2016-02-10 14:45:33.04 spid152     CHECKDB for database 'MyDatabase' finished without errors on 2015-08-22 17:54:22.570 (local time). This is an informational message only; no user action is required.
2016-02-10 14:45:33.07 spid152     Setting database option OFFLINE to ON for database 'MyDatabase'.

Looks to me like the SET OFFLINE WITH ROLLBACK IMMEDIATE was immedaitely followed by a (scheduled) BACKUP LOG for that DB and the two processes had an argument - but for 10 minutes?? and the upshot was (or because I killed it and it then rolled back) that the DB was bought back online again. However, that prompted the informational message "Database MyDatabase has more than 1000 virtual log files which is excessive. Too many virtual log files can cause long startup and backup times. Consider shrinking the log and using a different growth increment to reduce the number of virtual log files" to be displayed, which is something I was blissfully aware of ... Ho!Hum!

Looks like after the restore of both the FULL and the DIFF SQL tried to "start" the DB, but then found it couldn't as I used NO RECOVERY?? I see this message immediately prior to the "RESTORE DATABASE successfully processed ..." in the Error Log

2016-02-10 15:20:20.77 spid152     Starting up database 'MyDatabase'.
2016-02-10 15:20:20.78 spid152     The database 'MyDatabase' is marked RESTORING and is in a state that does not allow recovery to be run.
2016-02-10 15:21:27.33 spid152     Database MyDatabase has more than 1000 virtual log files which is excessive. Too many virtual log files can cause long startup and backup times. Consider shrinking the log and using a different growth increment to reduce the number of virtual log files.
2016-02-10 15:21:28.32 Backup      Database changes were restored. Database: MyDatabase, creation date(time): 2015/06/26(16:24:41), first LSN: 168536:5780:244, last LSN: 168536:5883:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK, MEDIANAME='MyDatabase_Dump20160210': {'H:\MSSQL\Backup\MyDatabase_20160210_010102_Diff.BAK'}). This is an informational message. No user action is required.
2016-02-10 15:21:28.36 Backup      RESTORE DATABASE successfully processed ...

So perhaps I just need to sort out the "1000 virtual log files" issue

Kristen · February 10, 2016, 4:13pm

DBCC LOGINFO() displayed 16,608 rows

sys.database_files displays:

Data - size=6,131,200 (i.e. 48GB), growth=3,200 (i.e. 25MB)
Log - size=26,566,464 (i.e. 207GB), growth=6,400 (i.e. 50MB)

Clearly on a DEV DB I ought to have a much larger Growth than that, so that any run-away T/A at least runs away without completely killing the VLFs

Shrink Files say that Data is 48GB or which 45GB is available, and Log is 207GB of which all of it is available.

Ah well, at least I can chop that back and improve performance of any future back & restore.

I expect there is some sort of "check" I should have been doing that would have alerted me to this? I wonder if it is in Brent's GrabBag?

Kristen · February 10, 2016, 4:41pm

Yup ... "High VLF Count" message in the Performance section. Pity I haven't run sp_Blitz for a while ...

I've got that in another database, on a production server!!, too ... I'm sure it wasn't there on the production DB last time I ran it ... so I can chalk up "Something good came out of today's mess" !!