SQLTeam.com | Weblogs | Forums

Need to Increase Data Load Performance


#1

Hi Expert,

We have several different data source locations and currently fetch data from each of them one by one over linked servers.

Is there a way to fetch data from all of them in one go, i.e., start the load from every source at the same time?

Regards,


#2

Union all
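
A minimal sketch of that suggestion, assuming two linked servers (the server, database, and table names here are placeholders):

```sql
-- Pull from every source in a single statement; the remote branches
-- are independent, so the optimizer may fetch them concurrently.
SELECT Col1, Col2
FROM LinkedServerA.RemoteDbA.dbo.SourceTable
UNION ALL
SELECT Col1, Col2
FROM LinkedServerB.RemoteDbB.dbo.SourceTable;
```

This only combines the reads into one statement; as post #3 below notes, it is not guaranteed to be faster than loading the sources serially.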


#3

@yaman,

Parallel loads aren't a panacea for performance and may, in fact, slow things down quite a bit unless you can guarantee that each source will be loaded to its own set of disk spindles. Otherwise, the read/write heads will jump all over hell's little half acre, and that action will require a whole lot of seek time. Parallel loads to the same spindles will usually take as long as, or longer than, a serial stream of loads.

Now, if you really want to learn how to load fast from multiple sources, how about telling us a little (a lot, actually) more: whether or not the sources are other SQL Servers, how many of them there are, how many databases you want to load from, how many tables in each database, how many GB all that adds up to, and how often you need to do it. We also need to know whether the target DBs live on a SAN, a NAS, or something else.


#4

We do that a lot, using all sorts of strategies.

The thing that helps the most is if the REMOTE has a ChangeDateTime column on every table, so that we only have to "pull" new rows. You also need to handle deleted rows; we do that by "pulling" all the PKey values and deleting anything (locally) that is not in that list, but if you can "log" the deletions on the remote, that is better (particularly for tables with many millions of rows).
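
A sketch of that incremental pull, assuming a ChangeDateTime column, a local watermark table, and placeholder object names throughout (MyLinkedServer, MyRemoteDatabase, dbo.LocalTable, etc. are illustrative):

```sql
-- Read the high-water mark from the previous load.
DECLARE @LastLoad datetime2 =
    (SELECT LastLoadTime FROM dbo.LoadWatermark WHERE TableName = 'MyRemoteTable');

-- 1) Pull only the rows that changed since the last load into a staging table.
SELECT PKey, Col1, ChangeDateTime
INTO #Changed
FROM MyLinkedServer.MyRemoteDatabase.dbo.MyRemoteTable
WHERE ChangeDateTime > @LastLoad;

-- 2) Upsert: replace any local copies of the changed rows, then insert them.
DELETE L
FROM dbo.LocalTable AS L
WHERE EXISTS (SELECT 1 FROM #Changed AS C WHERE C.PKey = L.PKey);

INSERT INTO dbo.LocalTable (PKey, Col1, ChangeDateTime)
SELECT PKey, Col1, ChangeDateTime
FROM #Changed;

-- 3) Handle deletions: pull the full remote key list and remove local rows
--    whose key no longer exists remotely. (A deletion log on the remote is
--    cheaper for very large tables, as noted above.)
DELETE L
FROM dbo.LocalTable AS L
WHERE NOT EXISTS (
    SELECT 1
    FROM MyLinkedServer.MyRemoteDatabase.dbo.MyRemoteTable AS R
    WHERE R.PKey = L.PKey
);

-- 4) Advance the watermark.
UPDATE dbo.LoadWatermark
SET LastLoadTime = SYSDATETIME()
WHERE TableName = 'MyRemoteTable';
```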

We pre-sort the import (on clustered index keys) and use the bulk import hint that the imported data is in clustered index order.
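
For a file-based load, that hint is the ORDER clause of BULK INSERT; the file path, table, and key column below are placeholders:

```sql
-- The data file has been pre-sorted on the clustered index key (KeyCol),
-- so the ORDER hint lets SQL Server skip its own sort step on import.
BULK INSERT dbo.LocalTable
FROM 'C:\loads\LocalTable.dat'
WITH (
    ORDER (KeyCol ASC),  -- must match the target's clustered index exactly
    TABLOCK              -- enables minimal logging under the right recovery model
);
```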

That sounds like you mean 4-part naming e.g.

SELECT Col1, Col2, ...
FROM MyLinkedServer.MyRemoteDatabase.dbo.MyRemoteTable

We very rarely do that; much more often we use OPENQUERY, because we find it performs better. But if you are pulling "lots of millions of rows", then some sort of batch-size control is needed (which, for us, is solved by the ChangeDateTime column, because we never have "lots of millions of rows" that have CHANGED since the last import :slight_smile:).
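
The OPENQUERY equivalent of the 4-part-naming example above, with the same placeholder names, would look something like this:

```sql
-- OPENQUERY sends the query text to the linked server verbatim, so the
-- WHERE clause runs remotely and only the changed rows cross the wire.
-- Note: OPENQUERY requires a literal query string (no variables), so the
-- cutoff date here is hard-coded; dynamic SQL is one way around that.
SELECT PKey, Col1, ChangeDateTime
FROM OPENQUERY(MyLinkedServer,
    'SELECT PKey, Col1, ChangeDateTime
     FROM MyRemoteDatabase.dbo.MyRemoteTable
     WHERE ChangeDateTime > ''2024-01-01''');
```

With 4-part naming, the local server may pull far more rows than needed and filter locally; pushing the filter to the remote side is usually why OPENQUERY performs better here.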