Identical tables with massive size differences?

Hope someone might be able to help here, our company DBA has drawn a blank.

Server A is our live server. Server B is our reporting server.

Every morning at 3 a.m., a table on server A gets updated with the previous day's transactions. The table has approximately 22 million rows.

This table is then copied across to server B so that people can query it.

It's all part of an SSIS package; I believe the table on server B is deleted and then re-created as part of the package.

On Tuesday, we noticed that all of the queries running against the report server were timing out (5-minute maximum allowed).

Upon further examination, the table on server A was 1.9GB in size, while the table on server B was 13.4GB!

No wonder everything was timing out!

Does anyone have any idea as to why two identical tables can be so disproportionate in terms of size?

Thanks in advance.

(Event logs on server B show that, at some point between Monday and Tuesday, the SQL Server service "stopped unexpectedly", could that be relevant?)

That would be one of the very first things I'd check. If they're doing DELETEs, or even TRUNCATEs, that could be part of the problem. From what I understand, SSIS doesn't typically handle millions of rows in one big chunk, so with all the indexes you have in place, you could end up with a hell of a lot of page splits during the transfer, leaving a whole lot of empty space in the destination table, even if it is dropped and rebuilt.

Still, even that shouldn't be a real problem in the presence of good reporting code. If SSIS is actually "chunking" the data, you may have reached a sensitive tipping point for stats. The first thing I'd try for getting out of the woods quickly is to rebuild the stats on the destination table. Use FULLSCAN when you do. In terms of duration, there's almost no difference between a FULLSCAN and a 30% sample, and any larger sample will take longer than a FULLSCAN anyway. The default sample rates are terrible, especially when ever-increasing clustered indexes are present.
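It's a one-liner; dbo.Transactions here is just a placeholder for your actual destination table:

```sql
-- Rebuild all statistics on the destination table with a full scan.
-- dbo.Transactions is a placeholder name for the real table.
UPDATE STATISTICS dbo.Transactions WITH FULLSCAN;
```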

Do a sys.dm_db_index_physical_stats using the DETAILED option on the table and see what the page-space-used percentage is for each of the indexes. That will give you some clue as to whether or not I'm correct about the page splits happening. It could also be an indication that they're not actually clearing the table before repopulating it.
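Something like this (again with dbo.Transactions as a placeholder) will show the page density per index and level:

```sql
-- DETAILED mode is what populates avg_page_space_used_in_percent.
SELECT  i.name AS index_name,
        ips.index_type_desc,
        ips.index_level,
        ips.page_count,
        ips.avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID(N'dbo.Transactions'), NULL, NULL, 'DETAILED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
ORDER BY ips.index_id, ips.index_level;
```

A freshly rebuilt index should show page space used in the high 90s at the leaf level; numbers well below that are your empty space.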

As a bit of a sidebar, 1.9GB is a relatively trivial table size nowadays. My recommendation would be to have two tables on Server B (the destination server), named differently than your current destination table, plus a synonym bearing the name of your current table (which itself should go away). That way, you can keep the current table (say, TableA1) in service right up until the other table (say, TableA2) is fully rebuilt. If the rebuild of TableA2 is successful, simply repoint the synonym from TableA1 to TableA2. If the rebuild is not successful, don't repoint the synonym... at least you'll have yesterday's data to work with until you can make a fix. The next day, you'd swap everything: rebuild TableA1 while the synonym is pointing to TableA2 and, when it's successfully rebuilt, repoint the synonym to TableA1.
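A sketch of the synonym flip, using those hypothetical TableA1/TableA2 names:

```sql
-- dbo.TableA is the synonym the reports query; TableA1/TableA2 are the two copies.
-- After TableA2 has been successfully rebuilt, repoint the synonym:
IF OBJECT_ID(N'dbo.TableA', N'SN') IS NOT NULL
    DROP SYNONYM dbo.TableA;    -- a synonym can't be altered, only dropped and recreated
CREATE SYNONYM dbo.TableA FOR dbo.TableA2;
```

The drop/create is a metadata-only operation, so the reports see the new data effectively instantly.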

Also consider the power, performance, and low resource usage of "Minimal Logging" during the transfer. Leave the clustered index on the target table and disable or drop all the non-clustered indexes. Since the clustered indexes of the source and destination tables are identical, adding an ORDER BY in the same order as the clustered index key will easily allow minimal logging with the clustered index in place on the destination, saving considerable MDF and LDF bloat. Then re-enable or rebuild the NCIs. All of these actions will inherently rebuild the stats at the destination as they occur.

Of course, you will need to be in a RECOVERY MODEL other than FULL for Minimal Logging to take effect, and you will need to follow some additional rules, like using the WITH (TABLOCK) hint on the table being inserted into.
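Put together, the load might look something like this minimal sketch (table, column, and index names are all hypothetical, and it assumes the destination database is in SIMPLE or BULK_LOGGED recovery):

```sql
-- 1. Disable the non-clustered indexes (repeat per NCI, or drop them instead).
ALTER INDEX IX_Transactions_Date ON dbo.Transactions DISABLE;

-- 2. Load with TABLOCK, in clustered index key order (TransactionID here).
INSERT INTO dbo.Transactions WITH (TABLOCK)
        (TransactionID, TransactionDate, Amount)
SELECT  TransactionID, TransactionDate, Amount
FROM    Staging.dbo.Transactions
ORDER BY TransactionID;

-- 3. Rebuild the NCIs, which also refreshes their stats.
ALTER INDEX IX_Transactions_Date ON dbo.Transactions REBUILD;
```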

Just a personal opinion... there's no way in hell I'd be using SSIS to do any of this. :wink: It's an unnecessary complication for something so simple.

Last but not least, check with your SAN Admin. They may be able to solve all of your problems in this area with a simple, nearly instantaneous "SAN Snapshot".


Not sure that is relevant ... but it might suggest that the Garbage Collection is not happening, perhaps because there is no Clustered Index. Presumably the RowCount on both tables is identical? If not, that suggests the Copy is not "like-for-like".

I would rebuild all indexes after the "copy".
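For example, with dbo.Transactions as a stand-in for the real destination table:

```sql
-- Sanity check that the copy is like-for-like, then rebuild everything.
SELECT COUNT_BIG(*) AS row_count FROM dbo.Transactions;  -- compare with Server A's count

ALTER INDEX ALL ON dbo.Transactions REBUILD;
```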

Make sure you have a clustered index on the table on Server-B - otherwise the table on Server-B will be a HEAP, and that can lead to problems with garbage collection (i.e. reusing space released from a deleted row).
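For instance, assuming the table has a suitable key column (TransactionID here is hypothetical):

```sql
-- Turn the heap into a clustered table so space reuse and rebuilds behave properly.
CREATE UNIQUE CLUSTERED INDEX CIX_Transactions
    ON dbo.Transactions (TransactionID);
```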


I just ran into this on one of my systems - the table was a HEAP and they were deleting the data and repopulating the table every day. Over time, the HEAP pages remain allocated but cannot be reused unless you are deleting with the TABLOCK hint (note: table locking can be enabled on the OLEDB destination, along with setting appropriate batch and commit sizes).

Issue an ALTER TABLE {table} REBUILD and see if that recovers the space.
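For example, with dbo.Transactions standing in for the real table:

```sql
-- Rebuild the heap to deallocate the empty pages...
ALTER TABLE dbo.Transactions REBUILD;

-- ...and delete with TABLOCK going forward so emptied pages are deallocated immediately.
DELETE FROM dbo.Transactions WITH (TABLOCK);
```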
