Running an AG across a WAN

Is anyone running an Availability Group across a WAN? We're getting ready to turn this on and I'm curious what people are doing.

  • We are planning to run two nodes on one side and possibly two nodes on the other. Are you running just a single node in your remote data center?

  • With mirroring we regularly get disconnects over the WAN but they reconnect 30-60 seconds later. Do you experience anything like that in your AG?

  • How are you doing quorum? The remote nodes don't vote?

We do AGs over a WAN at my current job, and we did at my last job. My current job has just one replica at the DR site. My old job had the same number of replicas at both sites.

The key is the MultiSubnetFailover attribute, which you'll need in the application's connection strings. Make sure the database drivers support it.
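Roughly what that looks like with the Microsoft ODBC driver and pyodbc, just as a sketch (the listener name, database, and driver version here are placeholders for illustration):

```python
import pyodbc

# Sketch of an AG connection string with MultiSubnetFailover enabled.
# "ag-listener.example.com" and "MyAppDb" are placeholder names.
# MultiSubnetFailover tells the driver to try all of the listener's IP
# addresses in parallel, so connections recover quickly after a
# cross-subnet failover instead of waiting on the stale IP to time out.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:ag-listener.example.com,1433;"
    "Database=MyAppDb;"
    "Trusted_Connection=Yes;"
    "MultiSubnetFailover=Yes;"
)
```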

The disconnects shouldn't matter, though we don't experience them.

Regarding quorum, we have a mixture of node majority and file share witnesses. At my last job, we also used a quorum drive on the cluster.

Correct, the remote nodes don't vote. It's very important that their votes be turned off.

And I'm assuming that you only had automatic failover enabled on the local nodes?

Yes, auto-failover on the local nodes only. They were/are set as synchronous replicas, with async to remote/DR and manual failover. At my last job, we had 3 sync replicas at the local site. The second replica was for reporting or read-only purposes (we used ApplicationIntent=ReadOnly in the application connection strings). The third replica was just in case we lost either of the other two nodes. Then we had another 3 nodes at the DR site. As this is/was SQL Server 2012, only 2 of them were in the AG due to the 2012 replica count restriction; those 2 nodes were async. We considered log shipping to the 3rd node so that it was ready to go in case of a manual failover, but I never got around to implementing it.
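For the reporting replica, the read-intent piece of the connection string looked something like this, at least as a rough sketch (listener and database names are placeholders, and this assumes the AG has read-only routing configured):

```python
import pyodbc

# Sketch of a read-intent connection for the reporting replica.
# ApplicationIntent=ReadOnly lets the listener's read-only routing send
# this connection to a readable secondary instead of the primary.
report_conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:ag-listener.example.com,1433;"
    "Database=MyAppDb;"
    "ApplicationIntent=ReadOnly;"
    "MultiSubnetFailover=Yes;"
    "Trusted_Connection=Yes;"
)
```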

Interesting. That's a lot like what we're getting ready to put up. We've got two 3-node clusters, one on each side, using database mirroring right now.

We're debating between

  • 3 nodes and 2 nodes
  • 2 nodes and 2 nodes
  • 2 nodes and 1 node
  • And I'm sure someone will suggest 3 nodes and 1 node at some point. Just because.

I think it mostly depends on how much they want to spend on hardware.

If you go with 2 nodes at the remote site, you'll want another resource available to provide quorum when a manual failover is done; I'd use a file share witness or a quorum drive. I don't like the idea of 1 node at the remote site, since you'd be running without any redundancy after a manual failover. I'm assuming that a test failover is done 1-2 times per year and that you'd run prod out of the remote site for at least a few hours to a few days. If it's never tested and it's really only for emergencies, then I suppose the 1-node option is okay, but with a warning to get another server online ASAP.

I like 3 nodes at each site because you have node majority at whichever site production is running from (don't forget to flip the votes during manual failover!).
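To make the vote math concrete, here's a rough sketch of the arithmetic under that 3-and-3 layout, assuming only the active site's nodes carry votes:

```python
# Node-majority quorum needs more than half of the voting nodes online.
def majority(voting_nodes: int) -> int:
    return voting_nodes // 2 + 1

active_site_votes = 3    # production site nodes, NodeWeight = 1
standby_site_votes = 0   # DR site nodes with votes turned off

print(majority(active_site_votes))  # 2 -> the cluster survives losing one local node

# During a manual failover you flip the votes: zero them on the old
# production site and enable them on the new one, so the same 2-of-3
# majority holds at whichever site is running production.
```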