Data replication between two databases

AlexShev · January 23, 2025, 3:40am

Hi,
I am looking for a solution to replicate data from our production MS SQL server to a secondary MS SQL, so a huge number of reports will not affect the primary production one.
The important part is the schedule.

Currently we use a nightly backup, which is copied to the secondary and restored there. It happens nightly. The business wants to have it at least hourly.
Here are the challenges:

We cannot restore backups hourly as the reports will fail during the restoration.
We cannot use log shipping for the same reason.
We cannot use the Snapshot replication as it will heavily load the production server.
We cannot use Transactional replication as it requires primary keys for the tables (articles).
I am not sure about the Merge transaction, but I have serious doubts as the data flow is supposed to be one way only.
I am not sure about SSIS as it seems require changing the source tables. Also I am not very familiar with SSIS. I can study if it is the way.
Potentially we can use Always On Availability groups with readable replicas and we have those in company already. But it will require additional licensing and re-configuration of the primary production database.
If you have some ideas, can you please share your knowledge?
The links are welcome too.
Thank you.
Alex

robert_volk · January 23, 2025, 1:10pm

I have an insane idea that might make log shipping viable, but you'd need space on the replica for 2 copies of the database to be restored.

The idea requires a third database that is empty of data, it's only purpose is to contain synonyms that point to the other log-shipped databases. This is the database that users would query.

The synonyms are created to have the same name as the tables/views they would normally query, but they point to another DB that is log-shipped:

USE MyDB;
CREATE SYNONYM table1 FOR LogShip1.dbo.table1;
-- repeat for all relevant tables/views

The LogShip1 database would not be refreshed via log shipping for the next hour. Instead the LogShip2 database would be log shipped as often as you like, until you are ready to freshen the data for MyDB. At that point you would need to DROP the synonyms in MyDB and re-create them to point to LogShip2:

USE MyDB;
DROP SYNONYM table1;
CREATE SYNONYM table1 FOR LogShip2.dbo.table1;
-- repeat for all relevant tables/views, point to LogShip2 database

Unfortunately there is no ALTER SYNONYM statement, however dropping and creating synonyms are metadata-only operations, they take milliseconds or less and do not block anything. I believe you can wrap the entire process into a stored procedure that you can execute to switch the synonyms between the two databases. See link below for another method.

An additional benefit is that having 2 replica DBs handy means you always have data if something fails, you simply switch back to the available DB, and troubleshoot/resync later.

You would have to create (more likely modify) the log shipping restore jobs for LogShip1 and LogShip2 databases. You would only need 1 copy job. I'd recommend backing up the log more often than once per hour, the LS restore job will restore multiple log backups if necessary.

Each restore job would have an alternating schedule, one would run on the odd hours, the other on the even hours, for example, both would RESTORE WITH STANDBY. Once you confirm the restore succeeded and you want to refresh, you'd run the synonym swap.

One huge caveat: I have not done this particular thing, but have done other methods to refresh data with minimal/metadata-only operations. Particularly using ALTER SCHEMA...TRANSFER, you can read about that here:

Those techniques and implementations have different goals than yours, but the mechanism is very similar: two copies of everything, and statements to re-point the query to fresh data. There is code that can dynamically generate the SQL for ALTER SCHEMA...TRANSFER here:

That can be easily adapted to do synonym swaps.

If that won't work for you, then my suggestion is the AG read replica. SSIS won't be a viable option, since you mentioned that Snapshot replication would overload your primary server, SSIS would have at least the same impact. Without primary keys, (and you really, really, REALLY need to fix that) you have no means of determining changed rows, so it's a full load every refresh.

JeffModen · January 24, 2025, 1:31pm

I wasn't one of the people that actually set it up but I worked for a company that used what they called a "SAN snapshot". It was quite a while ago but even on just 32 bit machines, it took less than 6 seconds for the "Report Server" to be back online with fresh data.

Talk with your SAN vendor... a lot of SANs can do this.