AlwaysOn AG + Replication maintenance - two scenarios to get the job done

CDN12345678 · April 19, 2024, 6:06pm

Happy Friday! I'm hoping to get some input on two scenarios I've tested and I wanted to hear different opinions. I have two datacenters close together running MS SQL Server AlwaysOn AG in synchronous mode.

Site A has two separate AG instances (one for critical transactions and one for reports) and we rely on SQL Server replication to populate data on the report AG. We are currently working to move away from replication however, that is a story for a different day. Site A also has secondary replicas for both AGs.

Site B holds secondary instances for both AGs.

Replication is currently working fine except some subscriptions that are pointed to the INSTANCE name rather than the AGL. I need to rebuild these publications and subscriptions and have it pointing to the AGL so replication continues to work upon potential failovers in the future. I've tried other scenarios that resulted in data loss and it seems the best practice is to snapshot.

When I perform this operation, of course, a snapshot is taken and then the appropriate report AG tables are truncated and a bulk insert begins on the primary report instance. The bulk insert is constantly suspended waiting on other operations due to traffic limitations between Site A and B. Of course after the bulk inserts are completed, I will need to insert index as well.

To mitigate this wait time in production, I've tested the following two scenarios, and they both worked fine. It got the job done.

I am leaning on going with option B, but I would like to hear your thoughts out there. The goal is to keep critical transactions flowing with zero interruptions in Site A and to get the report tables up to date as quickly as possible.

A. Drop all secondaries in Site B from AG and then add them back later when I am done. The reseeding will not affect read/write on the primary. This scenario requires extra work to get the distribution database back in AG, because it was not designed for AG. It also requires more wait time for synchronization to complete. Nevertheless, this scenario worked and did not affect critical transactions in Site A. All systems are back online and up to date while the secondaries reseed/sync.

B. Suspend data movement to the specific databases in Site B on the report secondaries. The main concern here is space for tlogs and any potential interruptions while all of these operations replay the transaction log and catch up in Site B later. When I resume data movement, it replays the transaction log, does all of the bulk inserts, creates index, and catches up on all replicated data in the same order. This also did not appear to have any negative affect on critical transactions or replication in Site A. All systems are back online and up to date while the secondaries in Site B catch up. This scenario is preferred.

Are there any caveats I am missing? Of course, I only had a limited amount of transactions and processes running in my test environment, compared to production.

Thoughts? Constructive criticism? Thank you!

robert_volk · April 21, 2024, 5:15pm

I agree that Option B that you described is preferable, it has fewer moving parts and points of failure.

I'm not clear if Site B's AG replicas are sync or async, but if they are in synchronous mode, can you change them to async while the re-snapshotting takes place?

I'm not really knowledgeable on replication and its corner cases, I typically read Ned Otter's list of gotchas for those:

He has a few tips on parallel sync and snapshot compression, they don't work out of the box.

Ned's links to the late, great Robert Davis are currently broken, here's the Wayback for them:

https://web.archive.org/web/20220326030432/https://social.msdn.microsoft.com/Forums/sqlserver/en-US/32955985-4a76-47da-89ac-1e3c50413783/filtering-transactional-replication-with-filtering-column-in-different-table?forum=sqlreplication