Seperating data into graded ranks on Replication?

fcaglayan · August 10, 2015, 7:20am

Hello everyone.
I will build up a new database and confused on 2 topics.
A quick info for what I want to do is I will have metadata and related to that data BLOB data.
Because of this BLOB data will hold to much space and replication of these datas may slow the replication I want to set ranks or level it. What is important for me is metadata part but if I use a simple peer to peer replication I think replicating the big BLOB data will slow the replicating of metadata. Is there a way of leveling the replication? I mean metadata is prior for me and while BLOB datas are replicating, replication of metadata must not wait the whole BLOB datas to replicate.
Also there will be same problem for backups.

dennisc · August 10, 2015, 9:41am

Hi fcaglayan,
Let me see if I can answer it. If I understand correctly, you would like to replicate data with BLOB data and set ranks for which data should replicate. Let me break it into two part: replication and backup.

The easier part is backup, you can simply set another filegroup to store the blob data and perform filegroup backup on those, that way, you can have your normal data and blob data in two different backup file (and different order).

As for replication, I would assume you have full control of the app (meaning you can change the design of tables). You can add in a flag to state the importance, and perform replication with that filter, so it will only replicate with "high" importance and not the others. However, if you want to replicate the blob data as well and wants to set priority (data before blob), I don't think that is possible with simple solution.

Hope this helps.

fcaglayan · August 10, 2015, 11:07am

Thanks for the reply.
Unfortunately that is exactly what I want : "data before blob"
Because data has to be synchronous at minimum time but if the blob data will slow down the replication, synchronization of the second data have to wait blob first.
Is there no hope for this? Like setting different replication models or ..?

Kristen · August 10, 2015, 3:34pm

I'm not sure that filegroups will help you. You still have to back everything up, you have to use Full Recovery Model, and restores are (potentially) more complicated because you have to get all the restored to the same, consistent, point in time.

That said, we don't use Filegroups so someone more knowledgeable can no doubt correct me

The only thing that I know of that allows some-data to be inconsistent with other-data is to use two databases. You can write the main data to Database-A and have that replicate to RemoteDatabase-A and at the same time write the BLOB to Database-B - with a different replication (AND Backup) strategy to Database-A.

However, what happens if Database-B crashes and you cannot recover it to the latest point-in-time? You then have some records in Database-A that have NO associated Blob data in Database-B.

Are you sure your Blobs are SO big that they will cause a problem with Replication and Backup?

If you can have some Filegroups which are READ ONLY then you can exclude them from your regular backups. You might be able to Partition your data to achieve that - e.g. start a new partition table each month, and move last month's table to the Read Only filegroup, so then it doesn't need to be included in the regular backups.

dennisc · August 11, 2015, 1:38am

Hi,
Kristen, you are right. I was thinking about filegroup based on read-only data, what was I thinking

As you will need all data anyway, it will takes time for it to replicate everything.

Kristen · August 11, 2015, 1:43am

Separate Filegroups for Read/Write data are OK provide you have Full Recover Model, but I'm not sure what they gain ... splitting data over multiple drives, ability to back up in parallel perhaps?

I think one key benefit? might be during Restore in that once one Filegroup is restored it can be made available whilst the other Filegroup(s) are still being restored.

I think I would prefer to use separate databases - then I can deliberately run them out-of-sync if I had a reason to do so - e.g. Store details about documents in Database1 and stored scanned images of documents in Database2 - quite possibly hours / days behind the data being entered into Database1

fcaglayan · August 11, 2015, 7:23am

Ok let me explain my topology briefly.
I have 2 DBs at 2 different cities. The system is already working like that with Peer to peer replication but only with data.
Now BLOBs will included to system. I will have different filegroups and partitions on BLOB tables.
According to backing up I am thinkin as what you said. At the end I have to backup all data and BLOB so backing up has not much solutions, agreed.
Each BLOB is 300 KB but I may have more than 500000 data daily. (so BLOB => 500000x 300KB)
I will split the system and half of that data will be written to A, the other half will be written to B but at the same time it will try to replicate to the other site.
As I said before the data part is more important and need to be replicated instantly.
So the only solution that comes to my mind: If I have 2 databases on one site ( 1 for data, the other for BLOB data) and replicate them seperately, that may be a solution for me because in my opinion In this scenario replication won't work sequently and won't wait the first in BLOB data to write second data ( I hope or am I wrong? ) But if there is a solution to set importance of data to be replicated first, of course one database will be better for me. ( or is it possible to set 2 different peer to peer replication on one DB? )
I hope I am clear now

Kristen · August 11, 2015, 8:20am

How long a delay could you tollerate for the replication of the Blobs?

I wonder (if you had them in a second database) whether you could use Log Shipping to "replicate" them to the other site. I think that would be a lot more efficient as it would "ship" the log file for the database changes as a large lump, rather than each individual Blob which, with handshaking etc., I assume would be more "traffic".

But its only a guess ... only a real-world test would establish if Log Shipping is more efficient (less bandwidth used, in total, although I'm pretty sure that the latency effects would be much much less)

You could then set Log Shipping to run once every 10 minutes ... or once every minute.

Assuming that Log Shipping uses less bandwidth than replication (of the same data) then the impact of the log shipping (of the Blobs) on the replication of the main data should be "less overall" - although you might notice some slowdown of replication of main data when log shipping triggers.

Might help to understand "why" you need the main data to replicate to City-2 "immediately" - better understanding the problem, and what "immediately" means in practice, might give rise to a different solution.

However, all this begs the question: can you fix / improve the Bandwidth / Latency of your COMMs between the two cities?

fcaglayan · August 11, 2015, 1:44pm

Hi Kristen,
Yes if Log shipping works proper and can work for two ways10 mins for BLOB can be affordable for me.
But while I was investigating I have seen that I can set 2 distribution tables and 2 publication on the same DB.
At distribution properties I saw I can change or add publication table to use the 2nd distribution table. That may solve my problem but...
I am not sure if this is working as I understand because I can not add distribution properties for a second publication on the same server.
What I mean is At server A, I can define 2 distribution table and 2 publication and same for server B.
But at distribution properties's publication tab when I say add the same server (server A) nothing happens. Just I can add to publication server b to use second distribution table.

Kristen · August 11, 2015, 2:11pm

You want two-way replication? i.e. people can change records at either end?

Can't do that with Log Shipping - the far end has to be read-only (or you will have to have some fancy setup with Data Entry databases at each end that get consolidated at a Master location. My best guess would be that you'd be better off to fix the bandwidth issue.

I can;t see how having two separate distribution groups will help you. I am not aware that you can adjust priority of one over the other, and everything will happen in Synchronisation Order - so if you add a Main Record and a Blob and then your colleagues does the same 1 second later, your Main Record AND Blob has to get to the Remote database before ANY of your colleagues data will be processed.

This is not a cakewalk to set up! so as I said earlier it would without doubt help to understand what problem you are trying to solve, rather that just trying to advise on your chosen solution absent details of the problem.

fcaglayan · August 12, 2015, 8:25am

Yes I am working active -active. Now that tells me why I don't know about log shipping

I am not sure about If my idea will work but If I have two seperated distribution table works on two different publication, at least the Synchronisation will work seperately in its own distribution's order. (or am I wrong?)
If it works on its own distribution order, the second incoming data won't wait first came BLOB data to finish and will start Synchronisation when it is come.

Kristen · August 12, 2015, 9:06am

You may be right, but I can't see how database integrity can be maintained if Replication applies Main Data from Record 2 before Blob from Record 1 has been applied.

IF that is the case then you are no better off. Yes they will be distributed separately, but Main Data from Record 2 will not "appear" to the remote user any sooner.