Four days of data disappeared after restart

crabstacker · August 27, 2015, 1:59pm

I'm in the middle of a strange issue. A client moved their SQL database server from Hyper-V to VMware on Saturday. Yesterday they called me complaining about performance. In the brief time and with the limited access I had, I saw a lot of indicators of poor I/O (and many other problems), and I made a few recommendations.

They restarted their SQL server to install VMware Tools and take care of a few other tasks. After the restart, their data reverted (from their perspective) to the moment they brought it up on the new VM. They found that the database was pointed at the old iSCSI drives instead of the new vmdk files and wasn't writing anything at all to disk. They tried restoring their backup, but the restore fails every time before completion.

Did the database just fill up the dirty cache, never writing to disk? How in the world did the database continue to function for five days without presenting anything more than slow I/O to the end users? A complicating factor is they didn't have any alerts set up on the database that might have warned them of these kinds of problems. The event logs on the server itself show that there were numerous write errors.

ScottPletcher · August 27, 2015, 3:13pm

Yes, most disk subsystems can and will use cache to avoid going to the physical drives as much as possible. That's a normal thing, and a good thing.

If SQL is telling you it's getting write errors, you must deal with those immediately or risk major data loss. In particular, if the log files are not being written correctly, you might as well shut down SQL, as the logs are absolutely critical to the proper functioning of SQL Server (or any RDBMS).
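
As a rough sketch of how you might check where the files actually live and whether writes are stalling, something like the following could be run against the instance. It only uses the standard catalog views and DMVs (sys.master_files, sys.dm_io_virtual_file_stats, msdb.dbo.suspect_pages); how to interpret the numbers depends on your workload, so treat it as a starting point rather than a diagnosis.

```sql
-- Where does each database file physically live, and how long have writes stalled?
-- Stall figures are cumulative since the instance last started.
SELECT DB_NAME(mf.database_id) AS database_name,
       mf.type_desc,                 -- ROWS (data) or LOG
       mf.physical_name,             -- the path SQL Server is really using
       vfs.num_of_writes,
       vfs.io_stall_write_ms
FROM sys.master_files AS mf
JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
  ON vfs.database_id = mf.database_id
 AND vfs.file_id = mf.file_id
ORDER BY vfs.io_stall_write_ms DESC;

-- Pages SQL Server has already flagged as damaged (checksum failures, torn pages, 823/824 errors).
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;
```

If physical_name points at a drive you thought was retired, that is worth chasing before anything else.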

In such cases, first take a full backup, just in case. If you can afford to, also shut down SQL and copy the physical files themselves to a different location. Then run a CHECKDB and correct any issues noted.
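
A minimal sketch of those steps in T-SQL, assuming a database called ClientDb and a backup folder D:\Backups (both names are made up for illustration; substitute your own):

```sql
-- Hypothetical database name and backup path; adjust to your environment.
-- COPY_ONLY keeps this safety backup from disturbing the regular backup chain.
BACKUP DATABASE [ClientDb]
TO DISK = N'D:\Backups\ClientDb_safety_full.bak'
WITH COPY_ONLY, CHECKSUM, STATS = 10;

-- Confirm the backup file is readable and its page checksums are valid.
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\ClientDb_safety_full.bak'
WITH CHECKSUM;

-- Check logical and physical integrity; list all errors, suppress informational messages.
DBCC CHECKDB (N'ClientDb') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

RESTORE VERIFYONLY does not guarantee the backup will restore cleanly, so a test restore onto another server is still the safer check if you have one available.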