Sync Databases (One Direction)

I have read access to a vendor database. I would like to set up something that does an incremental sync of changes to my internal server every 30 minutes. After each sync, both databases would be exactly the same.

Can someone point me in the right direction for this? I have read the thread "How to synchronize large database automatically?" and Log Shipping was recommended. Will the server or database become inaccessible during the sync?

Source:
MS SQL Server 2014 (Enterprise)
3,000 tables but only half are populated.

Destination:
MS SQL Server 2016 (Standard)

Purpose: reporting, and as a fallback in case the vendor's database goes down.

Thank you

SSIS is your friend. Standard ETL. The initial load might be huge, but subsequent pulls should be light since you will be doing only deltas.
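
To give a feel for it, here is a minimal sketch of the kind of delta query an SSIS source could run, assuming a hypothetical ModifiedDate audit column on the vendor table and a local watermark table you maintain (all names are placeholders):

```sql
-- Read the high-water mark from the last successful pull.
DECLARE @LastPull datetime2 =
    (SELECT LastPullTime FROM dbo.SyncWatermark WHERE TableName = 'Orders');

-- Pull only rows changed since the last run.
SELECT o.OrderID, o.CustomerID, o.OrderTotal, o.ModifiedDate
FROM dbo.Orders AS o
WHERE o.ModifiedDate > @LastPull;

-- After a successful load, advance the watermark.
UPDATE dbo.SyncWatermark
SET LastPullTime = SYSUTCDATETIME()
WHERE TableName = 'Orders';
```

One caveat: a watermark like this only sees inserts and updates; hard deletes on the vendor side leave no trace, so they need some other mechanism.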

Another option is to restore a full backup once, then keep applying log file restores?
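
Roughly like this, if the vendor could hand you the backup files (database name and paths are placeholders):

```sql
-- One-time seed: restore the full backup, leaving the copy able
-- to accept further log restores.
RESTORE DATABASE VendorDB_Copy
FROM DISK = N'\\share\VendorDB_full.bak'
WITH NORECOVERY, REPLACE;

-- Repeated on a schedule: apply the next log backup. STANDBY keeps
-- the copy read-only between restores so reports can run against it.
RESTORE LOG VendorDB_Copy
FROM DISK = N'\\share\VendorDB_log_001.trn'
WITH STANDBY = N'D:\undo\VendorDB_undo.dat';
```

Note the copy is briefly inaccessible while each log restore is applied, which speaks to your earlier question.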


I can see where SSIS can work, but would you recommend that over Log Shipping or Replication, as in a DR setup?

I was reading a thread by Paul Bison (Log Shipping vs. Replication) but I don't know if that is still relevant since that thread was written 10 years ago.

Yosiasz, if SSIS is your recommended approach, would you have a link showing how to set it up? Would that take care of DELETEd, INSERTed and UPDATEd records? Also, how would it handle cases where tables have added/deleted fields? The vendor's development of the source database is invisible to me.

Log shipping > SSIS. I would definitely go with that option, seeing that you have a lot of tables!
But as always, it really depends.

Log shipping will definitely not help you in the case of added/deleted fields. Neither would SSIS.
You would really need to architect for different possibilities, but you cannot account for everything that can change upstream.
You can, however, make your process sturdy enough to at least inform you of any changes and not fall flat on its face.
Maybe a hybrid of both log shipping and SSIS might do the trick.
Also, if this is only for reporting, maybe you can pull just what you need via SSIS?

I spoke to the vendor today and they told me that they use some replication method built only for MS SQL Server? It also syncs DDL, i.e. added/deleted fields. What method would that be?

Besides Log Shipping, triggers, and SSIS, what other methods are there for incrementally syncing a database, not including third-party software?

I am still trying to determine the best method before I implement.

Thanks

What is your ultimate aim?

For the current project, it is for a reporting database server, but for future projects it may cross into DR.

For this one, we only have db_datareader (database role membership) on the vendor's database. So I know the Backup/Restore method is ruled out. Will this also rule out Log Shipping or Replication (Publisher and Subscriber)?

I think that is up to your vendor. Before you look into anything, it is best to have a chat with them. They might not even want you to tap into their db.

Is there a requirement for near real-time data? Or is a one day lag good enough for reporting?

If you need near real-time - and a 30-minute refresh would count as that - then you are looking at a more expensive option. That would be setting up an AlwaysOn Availability Group with a read-only secondary. The read-only secondary would be used for reporting - but it will require Enterprise Edition licensing and the same version (e.g. SQL Server 2014).

If you can support a 1 day lag - then you could build an automated process to restore the daily backup from production. This will depend on how fast the network is, the size of the database(s) to be restored and other factors.

If you cannot do either of the above - then your options are Replication or SSIS (Log Shipping won't work because you have to switch to standby and back every 30 minutes).

Since the source system is vendor supplied and managed - unless they can support Replication and allow you to set that up, you are probably left with only SSIS. Since you want this to update every 30 minutes - I would be hesitant to go this route...I doubt you could build a process that extracts incremental data from production for 1,500+ tables and still completes every 30 minutes.

So - this goes back to the reporting requirements...do you really need near real-time data for reporting? Generally, a 1 day lag should be sufficient for most reporting requirements - and those that do require real-time data could still be run directly from the production environment.

One final note - building a backup/restore option can fulfill multiple requirements. First - you are testing your backups and ensuring they can be used, second - you can offload integrity checks from production, third - you have a copy of the database available if the vendor's database goes down, and finally - you have access to all data for reporting - it just isn't real time.
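
For the second point, the integrity check against the restored copy is a one-liner (database name is a placeholder):

```sql
-- Run CHECKDB on the restored copy so production is spared the load.
DBCC CHECKDB (N'VendorDB_Copy') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```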

BTW - if you upgrade the database to SQL Server 2016 on the reporting host - it isn't going to be useful for recovering production if the vendor's database goes down. You cannot restore from 2016 back to 2014...


Yosiasz, we already had a discussion with them, but it is a bit political. Regardless of whether we implement a solution, for educational purposes I am researching the limitations and the best method to approach this for the company's future needs. Thanks

Jeff, that is a really impressive reply. Before I reply, I really have to digest the info that you have shared.

But for the time being, the fact is, yes, we need it close to real time. I believe the vendor is using something that you mentioned - AlwaysOn Availability Groups - to sync the production database to the reporting database. I was trying to understand their sync process, and I think you have just confirmed it because they mentioned "AlwaysOn" several times. They said it is new to SQL Server 2014 and is a Microsoft technology. Do you think this is what they are using, Jeff?

I need some time to think about what you have just shared, and thanks for putting it in plain English.

It does sound like they are using/recommending Always On...

This became available in SQL Server 2012 and is a combination of clustering and mirroring in SQL Server. Although you can set up what is called a Basic Availability Group (BAG) in Standard Edition - starting with SQL Server 2016 - there are significant limitations. The goal of AlwaysOn is HA/DR - with the advantage of also being able to set up read-only secondaries. The main difference between AlwaysOn and traditional clustering is shared storage - AlwaysOn does not utilize shared storage - each system has its own dedicated storage (which means the DR node can be in another location - as long as the cluster can be created at the Windows level).

For your purposes - you will need Enterprise Edition licenses for the read-only secondary. But - you can also have multiple non-readable secondaries for HA/DR without additional licenses. Those secondaries would be set up/configured so you could fail over with minimal data loss.

You also cannot utilize SQL Server 2016 for the reporting system. It must be the same Edition and version of SQL Server - so your reporting system will be SQL Server 2014 (at the same SP and CU level).

To reduce costs - you do not have to have the same server configuration for the read-only secondary. It can have a reduced number of CPUs and less memory - but you do need enough to support the processing of transactions coming across the mirroring session. A readable secondary will also require more space in tempdb - and you want those disks to be as fast as possible - since tempdb is heavily used for the version store.
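
For reference, a hedged sketch of what the availability group definition might look like with a read-only reporting secondary (server names, endpoints, and the database name are all placeholders):

```sql
-- Primary replica carries the workload; the reporting replica is an
-- asynchronous secondary that accepts read-only connections.
CREATE AVAILABILITY GROUP [VendorAG]
FOR DATABASE [VendorDB]
REPLICA ON
    N'PRODNODE' WITH (
        ENDPOINT_URL = N'TCP://prodnode.domain.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = NO)),
    N'REPORTNODE' WITH (
        ENDPOINT_URL = N'TCP://reportnode.domain.local:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
```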


Jeff, thank you for another detailed reply. You have provided enough detail to know that Always On is not achievable with our current setup. I will bring this information back to the company and team to share so that we can make decisions based on it.

Thank you again for keeping us from going down a rabbit hole.

I have a few questions for you, Jeff. Since you mentioned in your post that "Always On" was introduced in SQL Server 2012 (Enterprise Edition), how was DR taken care of in prior versions? Is this where BAGs came in?

Also, I need to write stored procedures for our reports and was hoping to get the data in real time. Since that is no longer the case and I will have to continue to depend on the vendor's report database (with our read-only permission), the alternative is to set up a linked server in order to create stored procedures for reports. I have not set up a linked server in a long time. Do I request that the vendor set up a linked server for us to connect to their database, or do I set up the linked server on our server? We already have the SQL Authentication account and password to log into their database. Thanks

I was able to set up a linked server (to connect to the vendor) on our server.
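
For anyone who finds this thread later, the setup was roughly along these lines (server name, host, login, and password are placeholders):

```sql
-- Define the vendor's SQL Server as a linked server on our box.
EXEC sp_addlinkedserver
    @server = N'VENDORSQL',
    @srvproduct = N'',
    @provider = N'SQLNCLI',
    @datasrc = N'vendor-host.domain.local';

-- Map all local logins to the SQL Authentication account the vendor issued.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'VENDORSQL',
    @useself = N'FALSE',
    @locallogin = NULL,
    @rmtuser = N'reporting_user',
    @rmtpassword = N'********';
```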

DR solutions prior to 2012 - and still today - can be set up in multiple ways. Database mirroring was available prior to 2012 but did not have the ability to fail over a group of databases.

As for the linked server - it looks like you figured that out. Be aware that running queries across a linked server can and will cause performance issues on your production environment. Whatever you do - do not create queries that join remote tables and local tables - those will not only cause performance issues, but in some cases take hours to run instead of minutes.
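
To illustrate with placeholder table names - the pattern to avoid, and a safer alternative:

```sql
-- Risky: joining a remote table to a local one can drag the entire
-- remote table across the wire before any filtering happens.
SELECT r.OrderID, l.Region
FROM VENDORSQL.VendorDB.dbo.Orders AS r   -- remote
JOIN dbo.RegionLookup AS l                -- local
    ON l.CustomerID = r.CustomerID;

-- Safer: push the filtering to the remote server with OPENQUERY and
-- land the (small) result locally before doing any joins.
SELECT *
INTO #RecentOrders
FROM OPENQUERY(VENDORSQL,
    'SELECT OrderID, CustomerID
     FROM VendorDB.dbo.Orders
     WHERE OrderDate >= DATEADD(DAY, -1, GETDATE())');
```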


To add to what Jeff says: you could do a lot of this in batches.

So if you have, say, a products list and customers:

Grab a subset of each (via some filtering, or not, since you do not know how many rows there are) into a temp/staging table, then use those temp/staging tables to build your reports.
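
A minimal sketch of what I mean, assuming the linked server from earlier in the thread (all object names are placeholders):

```sql
-- Refresh the staging slice for this batch.
TRUNCATE TABLE stage.Customers;

-- Filter on the remote side so only the needed subset crosses the wire.
INSERT INTO stage.Customers (CustomerID, CustomerName)
SELECT CustomerID, CustomerName
FROM OPENQUERY(VENDORSQL,
    'SELECT CustomerID, CustomerName
     FROM VendorDB.dbo.Customers
     WHERE IsActive = 1');

-- Reports then join stage.Customers to other staged subsets locally.
```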


I did a lot of research on linked servers (OPENQUERY, OPENROWSET and OPENDATASOURCE). In conclusion, I decided to avoid that whole approach, out of concern for future bottlenecks. What is the difference between OPENROWSET and OPENDATASOURCE?
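
From my notes, the two look roughly like this side by side (provider and connection details are placeholders, and both need the 'Ad Hoc Distributed Queries' option enabled), but I am still unsure when one is preferred over the other:

```sql
-- OPENROWSET: connection info plus the remote query in a single call.
SELECT *
FROM OPENROWSET('SQLNCLI',
    'Server=vendor-host;Uid=reporting_user;Pwd=********;',
    'SELECT TOP (10) * FROM VendorDB.dbo.Orders');

-- OPENDATASOURCE: supplies only the connection; you then address
-- objects with four-part names, much like an ad-hoc linked server.
SELECT TOP (10) *
FROM OPENDATASOURCE('SQLNCLI',
    'Data Source=vendor-host;User ID=reporting_user;Password=********'
    ).VendorDB.dbo.Orders;
```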

I took yosiasz's advice and decided to discuss it with the vendor. They have created a new database (on their server) for us with write permission so that we can create stored procedures, functions, and so on. This eliminates the need for a linked server and for real-time data (on a local server) altogether. It also follows Jeff's recommendation of avoiding queries that join remote and local tables.

However, I am going to research all of the information I have acquired in this thread in further detail for future needs.

Thank You to both of you!

Hi @BabyAqua,

There are many similarities between your post and my solution. I developed a synchronization solution for a vendor-based client that keeps a reporting server in sync with a transaction server.

The solution synchronizes the two databases so that every change in the source is reflected in the target, and it can even synchronize selective tables. It is currently in use by one of my clients, synchronizing 2,000+ tables with a one-minute delay.

I used the SQL Server Change Tracking feature to get it done; a sketch follows the prerequisites below.

Prerequisites:
Source database tables must have Change Tracking enabled.
Linked server access from the target to the source.
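
A minimal sketch of the moving parts (object names are placeholders; the tracked table needs a primary key, and in practice the change query runs against the source, e.g. via the linked server):

```sql
-- Enable change tracking at the database level on the source.
ALTER DATABASE VendorDB
SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Enable it per table (the table must have a primary key).
ALTER TABLE dbo.Orders
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);

-- On each sync, ask for everything changed since the version you
-- processed last; CHANGETABLE reports inserts, updates, and deletes.
DECLARE @last_sync bigint = 0;  -- persist this value between runs

SELECT ct.SYS_CHANGE_OPERATION, ct.OrderID, o.CustomerID, o.OrderTotal
FROM CHANGETABLE(CHANGES dbo.Orders, @last_sync) AS ct
LEFT JOIN dbo.Orders AS o       -- deleted rows will not join
    ON o.OrderID = ct.OrderID;

-- Store this as @last_sync for the next run.
SELECT CHANGE_TRACKING_CURRENT_VERSION();
```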

The best part is that you don't need Enterprise Edition to do any of this - even Express Edition suffices.

Thanks,
Srikanth Chindam
Email : SrikanthChindam26@gmail.com


Hi Srikanth,

Thank you for sharing your architecture. Is it able to handle changes to table structure (added/deleted fields)? I think that is called DDL, right?