Page Compression and Index Fragmentation

Does Page Compression make Index Fragmentation practically irrelevant?

@JeffModen, I'm picking on you specifically, but insights or links to blog posts or videos on this topic from anyone else would also be appreciated.

I'm thinking it doesn't have any impact. It just puts more rows into a page.

You could argue fewer pages, therefore less fragmentation. But there is still fragmentation.

Curious to see what others think...

Page Compression also includes Row Compression.

Row Compression is NASTY. It turns datatypes like INT, DATETIME, CHAR(), and a bunch of others from "fixed width" to "variable width".

So, if you have this nice, tight, clustered index that you insert into, and you then update, say, a usually-NULL "ModifiedDate" column for just 1 out of every 30 rows, guess what happens... MASSIVE PAGE SPLITs, INSTANT LOGICAL FRAGMENTATION of more than 99%, and page densities around 50% or so.

Logical Fragmentation doesn't matter so much, especially on SSDs or on the "first run" on spinning rust, but the number of page splits can be detrimental to update performance, the size of the log file, and reads while the splits are occurring. The low Page Density is a waste of memory, disk space, and backup/restore durations.
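If you want to watch the split activity for yourself while a test runs, here's a rough sketch (not part of the original demo) that checks cumulative leaf-level page allocations for an index via `sys.dm_db_index_operational_stats`. The table name is just the demo table from the code further down; the counters are cumulative and count all leaf allocations, so treat them as an indicator rather than an exact split count.

```sql
--===================================================================
--      Sketch: watch allocation/split activity on a given index.
--      leaf_allocation_count is cumulative and counts ALL leaf page
--      allocations (splits included), so treat it as an indicator,
--      not an exact split count.
--===================================================================
 SELECT  TableName = OBJECT_NAME(ios.object_id)
        ,ios.index_id
        ,ios.leaf_allocation_count      --Leaf pages allocated (includes page splits)
        ,ios.nonleaf_allocation_count   --Intermediate-level pages allocated
   FROM sys.dm_db_index_operational_stats
            (DB_ID(), OBJECT_ID('dbo.JBM_FragTest_PageComp'), NULL, NULL) ios
;
```

Run it before and after the UPDATE in the demo below and compare the counts.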

And, I assure you, those numbers aren't "guesses". I have code. Run it and see. :yum:

--===================================================================
--      Create two identical tables except one had a non-compressed
--      Clustered index and the other has a Page Compressed Clustered
--      Index. Neither table has "variable width" datatypes.
--===================================================================
   DROP TABLE IF EXISTS JBM_FragTest_NoComp;
 CREATE TABLE dbo.JBM_FragTest_NoComp
        (
         RowNum         INT         IDENTITY(1,1)
        ,SomeColumns    CHAR(100)   NOT NULL DEFAULT 'X'
        ,CreatedDate    DATETIME    NOT NULL
        ,ModifiedDate   DATETIME    NULL
        ,CONSTRAINT PK_JBM_FragTest_NoComp 
         PRIMARY KEY CLUSTERED (RowNum)
         WITH (DATA_COMPRESSION = NONE)
        )
;
   DROP TABLE IF EXISTS JBM_FragTest_PageComp;
 CREATE TABLE dbo.JBM_FragTest_PageComp
        (
         RowNum         INT         IDENTITY(1,1)
        ,SomeColumns    CHAR(100)   NOT NULL DEFAULT 'X'
        ,CreatedDate    DATETIME    NOT NULL
        ,ModifiedDate   DATETIME    NULL
        ,CONSTRAINT PK_JBM_FragTest_PageComp 
         PRIMARY KEY CLUSTERED (RowNum)
         WITH (DATA_COMPRESSION = PAGE)
        )
;
--===================================================================
--      Populate both tables in an identical manner.
--===================================================================
 INSERT INTO JBM_FragTest_NoComp WITH (TABLOCK)
        (CreatedDate)
 SELECT TOP 1000000
        CreatedDate = GETDATE()
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
 INSERT INTO JBM_FragTest_PageComp  WITH (TABLOCK)
        (CreatedDate)
 SELECT TOP 1000000
        CreatedDate = GETDATE()
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
--===================================================================
--      Check the fragmentation of both tables after the initial 
--      population
--===================================================================
 SELECT  InSitu = 'After Insert'
        ,TableName =OBJECT_NAME(object_id)
        ,avg_fragmentation_in_percent
        ,fragment_count
        ,avg_fragment_size_in_pages
        ,page_count
        ,avg_page_space_used_in_percent
   FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('JBM_FragTest_NoComp'),NULL,NULL,'DETAILED')
  WHERE index_level = 0
  UNION ALL
 SELECT  InSitu = 'After Insert'
        ,TableName =OBJECT_NAME(object_id)
        ,avg_fragmentation_in_percent
        ,fragment_count
        ,avg_fragment_size_in_pages
        ,page_count
        ,avg_page_space_used_in_percent
   FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('JBM_FragTest_PageComp'),NULL,NULL,'DETAILED')
  WHERE index_level = 0
;
--===================================================================
--      Update the fixed width ModifiedDate column from NULL to a
--      datetime for just 1 out of every 30 rows.
--===================================================================
 UPDATE JBM_FragTest_NoComp
    SET ModifiedDate = GETDATE()
  WHERE RowNum % 30 = 0 --Update just one of every 30 rows
;
 UPDATE JBM_FragTest_PageComp
    SET ModifiedDate = GETDATE()
  WHERE RowNum % 30 = 0 --Update just one of every 30 rows
;
--===================================================================
--      Check the fragmentation of both tables after the updates to
--      the supposedly fixed-width columns and OH MY!!!! SURPRISE!!!!
--      Compression allows fragmentation to happen where there was
--      no fragmentation before.
--      The Logical fragmentation doesn't matter so much but the
--      average fragment size and page density do!
--===================================================================
 SELECT  InSitu = 'After Update'
        ,TableName =OBJECT_NAME(object_id)
        ,avg_fragmentation_in_percent
        ,fragment_count
        ,avg_fragment_size_in_pages
        ,page_count
        ,avg_page_space_used_in_percent
   FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('JBM_FragTest_NoComp'),NULL,NULL,'DETAILED')
  WHERE index_level = 0
  UNION ALL
 SELECT  InSitu = 'SURPRISE!!!!'
        ,TableName =OBJECT_NAME(object_id)
        ,avg_fragmentation_in_percent
        ,fragment_count
        ,avg_fragment_size_in_pages
        ,page_count
        ,avg_page_space_used_in_percent
   FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('JBM_FragTest_PageComp'),NULL,NULL,'DETAILED')
  WHERE index_level = 0
;

So Page Compression will take an otherwise fragment-free table, even in the face of updates, and turn it into a perpetually fragmenting monster.

{EDIT} I say "will", hopefully obviously, but "IT DEPENDS" on whether you have "ExpAnsive" Updates on what used to be fixed-width columns that have become "variable width" because of the Row Compression. The situation portrayed in the code above is VERY common, whether it's for just a date column or not.

For a complete list of previously "fixed-width" datatypes that become "variable width" under row compression, please see the following article. Also, understand that NULL values are stored as a 1 byte value to start with, which is why small numbers of updated rows affect things like DATETIME columns so very much.

Even with that, the fragmentation may still be worth it in space savings. 40-60% of a Terabyte is a substantial savings, but you MUST remember what rapid-fire page splits can do to your system at all levels, not to mention having to REBUILD the index to recover the space and maybe rebuild with a less-than-100% fill factor, etc, etc.
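You can get a ballpark for those savings without actually compressing anything, by the way. A minimal sketch using the built-in estimator against the demo table from above (the estimate is built from a sample of the table, so treat it as approximate):

```sql
--===================================================================
--      Estimate what PAGE compression would save on a table BEFORE
--      actually compressing it. The result is based on a sample of
--      the table, so treat it as approximate.
--===================================================================
 EXEC sys.sp_estimate_data_compression_savings
         @schema_name      = 'dbo'
        ,@object_name      = 'JBM_FragTest_NoComp'
        ,@index_id         = NULL   --NULL = all indexes on the table
        ,@partition_number = NULL   --NULL = all partitions
        ,@data_compression = 'PAGE'
;
```

Compare the current and estimated sizes it returns before deciding whether the splits are a price worth paying.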

Another "fine effect" of Page Compression is that it almost doubles the amount of time necessary to do a REBUILD, so plan accordingly.

Rather strong overstatements there. Page compression is great for many situations. It's not guaranteed to cause any frag issues, and often it can help reduce them. It depends on the specific situation.

Yep. Works great in certain situations. I just proved one instance, which happens to be incredibly common, where it's a horror. That means I have not made "rather strong overstatements" because I've just proven when Page compression will kill you. I did that in response to people asking if Page Compression makes Index Fragmentation practically irrelevant. The answer is "It Depends" and now everyone knows a very common problem where it can make a huge difference in a very negative way.

Just to make sure, I did add a note to my previous post, which also includes which datatypes are affected by Row Compression (and, therefore, Page Compression).

Not all expansive updates will automatically cause a page split. There may be space left on the page, and/or the compression may improve page usage because the text gets compressed instead of taking the full space it would normally take, space that otherwise could cause a page split.

The most common example is status columns. Developers just love a text description of status. They should encode it as a number, but they often just won't. Say a status changes from "In Progress" to "Complete". Since there are almost certainly other "Complete" statuses on that page, a compressed row will see very little additional space used. An uncompressed table will require the entire description to be stored again.

Or say you add an integer value. If it's a (very) small value, it will take only 1 or 2 bytes anyway, so an update or two of that will likely not cause a page split.
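To see that effect in numbers, here's a quick sketch along the same lines as the earlier demo. All table and column names here are made up for illustration: fill a page-compressed table with repeated status strings, do an "expansive" update from a short status to a longer one that already exists on the page, and check the density.

```sql
--===================================================================
--      Sketch: repeated status strings compress very well at the
--      page level, so flipping a short "New" to a longer "Complete"
--      adds little on a compressed page because "Complete" is
--      already in the page dictionary. Names are hypothetical.
--===================================================================
   DROP TABLE IF EXISTS dbo.StatusDemo_PageComp;
 CREATE TABLE dbo.StatusDemo_PageComp
        (
         ID          INT         IDENTITY(1,1)
        ,StatusDesc  VARCHAR(20) NOT NULL
        ,CONSTRAINT PK_StatusDemo_PageComp
         PRIMARY KEY CLUSTERED (ID)
         WITH (DATA_COMPRESSION = PAGE)
        )
;
 INSERT INTO dbo.StatusDemo_PageComp WITH (TABLOCK)
        (StatusDesc)
 SELECT TOP 100000
        CASE ac1.column_id % 2 WHEN 0 THEN 'New' ELSE 'Complete' END
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
 UPDATE dbo.StatusDemo_PageComp     --Expansive update: 3 chars -> 8 chars
    SET StatusDesc = 'Complete'
  WHERE StatusDesc = 'New'
;
 SELECT  page_count
        ,avg_page_space_used_in_percent
   FROM sys.dm_db_index_physical_stats
            (DB_ID(),OBJECT_ID('dbo.StatusDemo_PageComp'),NULL,NULL,'DETAILED')
  WHERE index_level = 0
;
```

Run the same test against an uncompressed copy of the table and the difference in post-update density tells the story.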

I do think compared to the massive I/O you save with compression that you are overstating the damage, particularly if you design the table and app with data compression in mind.

Correct. I absolutely agree that "It Depends". And thanks to the two of us talking about it, people may finally realize that it's not a panacea.

No. But it's not automatically a disaster that's gonna force page splits all over the place. In fact, it generally does not, and has vastly more upside than downside. Particularly, again, if you design the table and app originally with the idea of page compression in advance.

Thank you.

I'm investigating the pros and cons of page compression mostly because we have what seems like a slow connection to our SAN from our VM and our max memory is limited.

My servers are used in ETL processes rather than transactional workloads, and most of our writes to our larger tables are in the form of deleting n days of data and re-inserting it. Several of our tables are grotesquely wide, with extremely few rows per page.

What I'm groping around for is something that might cause the data load/reload processes to complete faster. Page compression sounds like it might help somewhat, but have some repercussions I'll need to take into consideration.

Thanks, Jeff. I'll spend some time playing with this. I know every implementation is going to be different, and my current usage patterns will come into play, but I always like knowing what needs to be kept in mind. What keeps me diving down rabbit holes is that I always want to know exactly what IT DEPENDS™ on.


@SqlHippo

Hmmmm.... "Wide Table". Are there any FK's pointing at "the" table?

The scenario I portrayed with the example code I posted is quite common. If folks don't want something like that to automatically be a disaster, they have to plan for it. If such ExpAnsive Updates occur in the hot-spot, such issues will not be avoidable because the new rows in the hot-spot go in as close to 100% full as they can.

Again, though, our interaction here is helping people understand some of the lesser known issues that can occur with Row and Page compression.

And, remember also that if you fit more rows per page, there will be more log file activity (longer locks) when pages do split.

"Look eye. Always look eye". :wink:

The table definition on your example is wrong. Change it to this:

 CREATE TABLE dbo.JBM_FragTest_PageComp
        (
         RowNum         INT         IDENTITY(1,1)
        ,SomeColumns    CHAR(100)   NOT NULL DEFAULT 'X'
        ,CreatedDate    DATETIME    NOT NULL
        ,ModifiedDate   DATETIME    NULL
        ,CONSTRAINT PK_JBM_FragTest_PageComp
         PRIMARY KEY CLUSTERED (RowNum)
         WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 95)
        )
;

SURPRISE -- NO SPLITS AT ALL!!!!
(My test was on SQL 2016 Enterprise)


|InSitu|TableName|avg_fragmentation_in_percent|fragment_count|avg_fragment_size_in_pages|page_count|avg_page_space_used_in_percent|
|---|---|---|---|---|---|---|
|After Insert|JBM_FragTest_NoComp|0.01|8|2016.25|16130|98.7833086236719|
|After Insert|JBM_FragTest_PageComp|0|4|358.25|1433|95.0592414133926|

|InSitu|TableName|avg_fragmentation_in_percent|fragment_count|avg_fragment_size_in_pages|page_count|avg_page_space_used_in_percent|
|---|---|---|---|---|---|---|
|After Update|JBM_FragTest_NoComp|0.01|8|2016.25|16130|98.7833086236719|
|SURPRISE!!!!|JBM_FragTest_PageComp|0|4|358.25|1433|96.3845441067457|

Again, unfortunately it's your responsibility to prepare for page compression if you intend to use it in that type of situation, SQL won't do it for you automatically, that's true.

Scott... your code takes advantage of the fact that the first insert can be both minimally logged and will follow a Fill Factor. The inserts after that will not follow the fill factor. They will go in at 100%. That also means that you will eventually need to rebuild the index where you didn't need to before the compression and it also means that you had page splits where you had none before.

Here's an article where I documented and demonstrated that phenomenon...

"It Depends" and nothing is a panacea.

It was YOUR code actually.

But you also need to strongly consider what the compression gained you, just as you have to adapt your approach to aid compression: page compression saves you huge amounts of storage, and the corresponding disk write time. Even with SSDs, writing 70% fewer bytes is a big savings.

Our most important table, the one used by almost all of our reports and processes, has no explicit key.

Our data source is a view that is a join of about 20 tables that are then aggregated. Our table's "key" then ends up being the entire GROUP BY list, which is something like 80 columns. Realistically, that can be pared down some, but not much. :frowning_face:

You should consider aggregating each table individually before joining them, if possible, since there's less overhead that way. (1) Often you can advantageously use a clustered index on the individual tables to improve the aggregation performance. (2) SQL can typically do a merge join afterward, which is very efficient when applicable.
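As a sketch of that pattern (all table and column names here are hypothetical, not from the actual system): aggregate each detail table down to the target grain first, then join the much smaller results, instead of joining everything wide and aggregating the huge intermediate result.

```sql
--===================================================================
--      Sketch: pre-aggregate each detail table to the target grain,
--      THEN join, instead of joining wide tables and aggregating
--      the huge intermediate result. Names are hypothetical.
--===================================================================
   WITH OrdersAgg AS
 (
 SELECT  CustomerID
        ,TotalOrdered = SUM(OrderAmount)
   FROM dbo.OrderDetail
  GROUP BY CustomerID
 )
        ,ReturnsAgg AS
 (
 SELECT  CustomerID
        ,TotalReturned = SUM(ReturnAmount)
   FROM dbo.ReturnDetail
  GROUP BY CustomerID
 )
 SELECT  o.CustomerID
        ,o.TotalOrdered
        ,TotalReturned = ISNULL(r.TotalReturned,0)
   FROM OrdersAgg o
   LEFT JOIN ReturnsAgg r
     ON r.CustomerID = o.CustomerID
;
```

If both detail tables have a clustered index leading on CustomerID, each GROUP BY can be a stream aggregate and the final join can be a merge join, which is the efficiency being described above.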


All I can add to that is a loud "BOY HOWDY"! I totally agree.

I'm hoping I'll eventually be able to make some significant changes to that idiotic process.
