INSERT across LINKED SERVER = slow

Viewed simply, my procedure has three basic parts:

  1. Run some SELECT/UNION statements to build up data FROM the remote server into a local temp table.
  2. Delete all records in a table on the remote server.
  3. Insert into specified columns of a table on the remote server, from a SELECT that joins the local temp table to a bunch of other local tables (roughly sketched below).
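In rough T-SQL, the pattern is something like this (a simplified sketch on my part; Server1 stands in for the remote linked server, and every database/table/column name is a placeholder):

```sql
-- 1. Build up data FROM the remote server into a local temp table
SELECT Col1, Col2
INTO #Staging
FROM Server1.RemoteDb.dbo.SourceA
UNION
SELECT Col1, Col2
FROM Server1.RemoteDb.dbo.SourceB;

-- 2. Delete all records in the target table on the remote server
DELETE FROM Server1.RemoteDb.dbo.TargetTable;

-- 3. Insert into the remote table from a join of the temp table and local tables
INSERT INTO Server1.RemoteDb.dbo.TargetTable (Col1, Col2, Col3)
SELECT s.Col1, s.Col2, l.Col3
FROM #Staging AS s
JOIN dbo.SomeLocalTable AS l ON l.Col1 = s.Col1;
```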

Every single component runs within a few seconds on its own (including the SELECT statement that feeds the insert, and including the SELECTs from the remote server that populate the local temp table). It's only when I run the whole thing, INSERT included, that it dies: the entire procedure takes 20 minutes. And it's only inserting 16,000 records.

Is this normal when inserting to a remote table?? Is there any way around it other than re-structuring my SSIS package to totally separate these steps and Data Flow them together? I really did not want to do that.
Lastly - I've looked at the design of the table I'm inserting into. It has NO indexes, NO triggers, and all fields are varchar(300).

Obvious question: Why not run the package on the remote server?

Also, check out this: Slow inserts on remote server

Yes, very fair question. (And I can understand it as a reactionary question in particular. But if you stop and think about it: if I were running this on the 'remote' server, let's call it Server1, and complaining about the slowness of the 90% of the query that runs on the other server, Server2, isn't there a great chance you'd ask why I'm not running it on Server2? I mean, why either one vs. the other, right? There are data pulls from both.)

Well, when a query involves code on at least two servers, there's a link between them, and I have access to both (so there is a choice), a number of factors go into my decision, or at least my impulse, about where to write it:

  • Where the 'majority' of the code/processing is occurring. Most of the heavy lifting, or so I thought, occurs on Server2, hence I figured, OK, let's do this from a connection to Server2, and we'll push the insert to Server1.
  • Where I am less familiar with the tables and can really use the help of IntelliSense, so I was developing on Server2 to help me out. Of course I can change it now, but I was so shocked at this slowness that I thought I'd ask for input.
  • In this particular case, I had experienced annoying slowness when pulling data (connected to Server1, pulling from Server2), so I thought I'd avoid that by treating Server1 as the remote side.

Guess I'm just surprised that an INSERT, with the data already provisioned and ready to be plopped somewhere, would be so slow. It seems very counterintuitive ...

Oh, just saw your link and read it over. Yeah, I guess the blanket answer probably goes back to linked servers and how I shouldn't expect much of anything from them. (Although I'm really skeptical of someone who gives that blanket answer all the time, as opposed to accepting that there may actually be some differences between scenarios worth studying, for those of us who can't just "... replace them wherever I see them". With what? Magic?)

I mean, in this case I can use SSIS as a fallback if absolutely necessary, but if this were a user-requested report I couldn't.
Uhhh. Too many servers.

If you are using SSIS to begin with - then why are you using a linked server to perform any inserts? Or even to get data? That doesn't make sense.

It sounds like you are running into an issue that comes up a lot with linked servers - how SQL Server performs the work across that link. Basically, it creates a cursor and inserts the rows one at a time inside a single transaction, which takes a long time.
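If you want to see that for yourself (my suggestion, with placeholder names - not anything specific to your system):

```sql
-- Run locally with "Include Actual Execution Plan" enabled. A four-part-name
-- insert typically shows a Remote Insert / Remote Query operator; the
-- row-by-row work happens on the far side of that operator.
INSERT INTO Server1.RemoteDb.dbo.TargetTable (Col1, Col2)
SELECT t.Col1, l.Col2
FROM #Staging AS t
JOIN dbo.SomeLocalTable AS l ON l.Col1 = t.Col1;
```

A Profiler or Extended Events trace on the remote server during the insert will generally show one round trip per row (sp_cursor calls or individual INSERTs) rather than a single set-based INSERT.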

I would use SSIS to extract the data from the source system, placing it into a staging table (not a temp table) on the destination. Then truncate the destination table using a SQL command in SSIS, and finally use another data flow to pull the data from the destination system and put it back into the source system.

Eliminate the linked server altogether - and manage the batch/commit sizes on the OLE DB destination object so the data is inserted in reasonable batches, to ensure the transaction log doesn't grow.
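On the OLE DB Destination that means the Rows per batch and Maximum insert commit size settings. If any part of this stays in T-SQL instead, the same keep-each-transaction-small idea, shown here as a batched DELETE rather than an insert, looks roughly like this (a sketch with an arbitrary chunk size and placeholder names, run on the destination server):

```sql
-- Replace one huge DELETE with small committed chunks so the
-- transaction log stays bounded; 5000 is just an example size.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM dbo.TargetTable;
    SET @rows = @@ROWCOUNT;  -- loop ends when a chunk deletes nothing
END;
```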

Yes, that's becoming more obvious as I go along. Hindsight is 20/20. :smile:

It may be difficult to understand since you can't see the 'entire' process, which I understand. I'm already using SSIS, and this is a very large package with about 30 steps, some of which already have data flow layers. I'm not keen to create a bunch MORE data flows if it's for small stuff that I can do in procedures. (If a person parses things out into too many data flow tasks, someone is going to ask why you aren't just combining it in a procedure - it's the psychology of the question more than anything sometimes, I think.)

I think what I'll most likely end up doing is switching which server is local vs. which is remote - doing a 'pull' from the remote one rather than pushing to it, assuming that's not too slow. Worst case, I'll start splitting things into another bunch of data flow tasks in the package if need be.
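Roughly what I mean by switching direction (my own sketch, all names placeholders): run the procedure ON Server1, so the big DELETE/INSERT are local and only the reads cross the link.

```sql
-- Running ON Server1 now. Pull the Server2 tables I need into a temp table:
SELECT Col1, Col2
INTO #FromServer2
FROM Server2.OtherDb.dbo.SomeTable;        -- remote read (pull)

-- The target delete and insert are now entirely local:
DELETE FROM dbo.TargetTable;

INSERT INTO dbo.TargetTable (Col1, Col2, Col3)
SELECT s.Col1, s.Col2, f.Col2
FROM dbo.SourceData AS s                   -- local now (was the remote pull before)
JOIN #FromServer2 AS f ON f.Col1 = s.Col1;
```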

Another solution I have found (elsewhere), which seems like very good advice: when working with linked servers, using OPENQUERY to pass the work to the remote server is often better than what I am currently doing. I may try that as well.
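For example (again a sketch with placeholder names; the EXEC ... AT form assumes RPC Out is enabled on the linked server):

```sql
-- Read via OPENQUERY so the remote server runs the whole SELECT and
-- only the result set comes across the link:
SELECT q.Col1, q.Col2
INTO #Staging
FROM OPENQUERY(Server1, 'SELECT Col1, Col2 FROM RemoteDb.dbo.SourceA') AS q;

-- Or push an entire statement to the remote server with EXEC ... AT,
-- so e.g. the DELETE runs set-based over there instead of row by row:
EXEC ('DELETE FROM RemoteDb.dbo.TargetTable') AT Server1;
```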

Good to learn of all the options. Thanks for everyone's input.