Linked Server Update Statement performance

dmaxj · January 1, 2017, 8:29pm

Hello,

I have an SQL query that updates 1 column in ~3000 rows on a linked server. The query takes about 2.5 hours to run, which seems like a really long time. I have drilled into waits on the source server and I see an OLEDB wait with the server name and the spid on the target. I drilled into the waits on the target server and I find FETCH API_CURSOR wait...

Also note that the source server is a standalone SQL Server instance and the target server is an Active/Passive SQL Server Cluster. I did notice that MSDTC services were started on both servers, but MSDTC is NOT a cluster resource on the target...

Thanks in advance...

yosiasz · January 2, 2017, 9:12pm

Yeah that is super crazy long.

Show us your update statement. table structure and all
Where is the linked server located at? on premise? cloud? azure?

dmaxj · January 3, 2017, 4:55pm

The linked server is in the same datacenter - gigabit network. Here is the update statement... Please advise as to what is needed regarding table structures... thanks...

UPDATE A
SET TEXTSNAM = CAST(xpx.pxp as CHAR(10))
, NAMECUST = xpx.pxp
, TEXTSTRE1 = CASE WHEN xpx.BillingAddress1 <> ''
THEN BillingAddress1
ELSE xpx.AddressLine1
END
, TEXTSTRE2 = CASE WHEN xpx.BillingAddress1 <> ''
THEN ISNULL(xpx.BillingAddress2,'')
ELSE ISNULL(xpx.AddressLine2,'')
END
, NY = CASE WHEN xpx.BillingAddress1 <> ''
THEN [By] ELSE xpx.City
END
, CODESTTE = CASE WHEN xpx.BillingAddress1 <> ''
THEN BillingState
ELSE xpx.State
END
, CODEPSTL = CASE WHEN xpx.BillingAddress1 <> ''
THEN BillingZip
ELSE LEFT(xpx.Zip, 5)
END
, NAMECTAC = ISNULL(xpx.ContactName,'')
, TEXTPHON1 = ISNULL(xpx.Phone1,'')
, TEXTPHON2 = ISNULL(xpx.Fax,'')
, EMAIL1 = ISNULL(xpx.PEmail,'')
, WEBSITE = xpx.PID
, CTACPHONE = ISNULL(xpx.Phone1,'')
, CTACFAX = ISNULL(xpx.Fax,'')

                   FROM
                      LinkedServer.DBNAME.dbo.AEAxjn A
                    JOIN
                      CCMX.dbo.xpx xpx on A.IdCt = xpx.CNo
                    WHERE
                      xpx.ChId = 1
                    AND
                      LEN(xpx.CNo) = 4
                    AND
                      xpx.BAct = 1
                    AND
                      A.IdGrp = 'LLJ'
                    AND
                      xpx.CNo IN
                      (
                        SELECT
                          CNo
                        FROM
                          CCMX.dbo.xpx
                        GROUP BY
                          CNo
                        HAVING
                          COUNT(*) = 1 --5000 rows
                      );

yosiasz · January 3, 2017, 4:59pm

question. You do a join as follows

JOIN
                     CCMX.dbo.xpx xpx on A.IdCt = xpx.CNo

but then you also do a subquery in the same table you joined to above. why?

xpx.CNo IN
                      (
                        SELECT
                          CNo
                        FROM
                          CCMX.dbo.xpx
                        GROUP BY
                          CNo
                        HAVING
                          COUNT(*) = 1 --5000 rows
                      );

yosiasz · January 3, 2017, 5:03pm

How many rows do you get when you do the following

select count(*) 
from LinkedServer.DBNAME.dbo.AEAxjn A
where A.IdGrp = 'LLJ'

jeffw8713 · January 3, 2017, 6:43pm

This is one of the nasty surprises you can get when using linked servers. In this case - SQL Server cannot determine the fastest method for getting the data because of the join across the servers - so instead it is doing the job row by row (hence the FETCH API_CURSOR waits.

To fix this you should put your procedure on the destination server so you are updating data locally. Then - instead of joining directly across the linked server - pull that data across the linked server and join to the local table. Now, whether or not you use a temp table - or a permanent table needs to be determined also...if you are using that data for other processes and do not need up to the minute data then you can build a permanent table and build a separate process to update that table on a scheduled (usually daily works).

If this is the only process requiring that data - then in your procedure pull the data into a temp table and join to the temp table.

Note: if you refresh the data on a daily basis it will be much better to use SSIS, BULK INSERT or BCP to refresh the data instead of a linked server. In all three of these there are options available to control the batch and commit sizes so you don't bloat your transaction logs and the load processes will take less time.

dmaxj · January 3, 2017, 9:22pm

Yosiasz - I get 2818 records for this query... thanks

yosiasz · January 3, 2017, 9:27pm

what data type is column IdCt. is it int or varchar? and does it allow null values?

dmaxj · January 3, 2017, 9:29pm

Jeff, we have the same train of thought

However, the query is a part of an overnight job that runs from the source server. I have considered a CTE or temp table... I am testing, but moving the code to run on the target server is my last resort, due to all of the other dominoes that would fall related to moving the code...

Also, yosiasz: the sub-select is in the query so that records are excluded when more than one CNo exists in the xpx table... thanks

dmaxj · January 3, 2017, 9:45pm

IdCt is a PK with a clustered index of type CHAR(12) - (NO NULLs allowed)
xpx.CNo (NULLS allowed) is a VARCHAR(10) column with a nonclustered index on this column too...

thanks

yosiasz · January 3, 2017, 10:20pm

try the following. sometimes it is best to break things down in batches to get some performance improvement and troubleshoot. You add this temp table creation in your stored procedure. if you want you can create a sample sproc to test it out, if you do not want to mess yoru current sproc and/or you do not have a pre-prod environment loaded with same data.

create table #tmp_AEAxjn(IdCt CHAR(12) not null primary key)

insert into #tmp_AEAxjn
select distinct IdCt
 from LinkedServer.DBNAME.dbo.AEAxjn A
where A.IdGrp = 'LLJ'

UPDATE A
SET TEXTSNAM = CAST(xpx.pxp as CHAR(10))
, NAMECUST = xpx.pxp
, TEXTSTRE1 = CASE WHEN xpx.BillingAddress1 <> '' 
THEN BillingAddress1 
ELSE xpx.AddressLine1 
END
, TEXTSTRE2 = CASE WHEN xpx.BillingAddress1 <> '' 
THEN ISNULL(xpx.BillingAddress2,'') 
ELSE ISNULL(xpx.AddressLine2,'') 
END
, NY = CASE WHEN xpx.BillingAddress1 <> '' 
THEN [By] ELSE xpx.City 
END
, CODESTTE = CASE WHEN xpx.BillingAddress1 <> '' 
THEN BillingState 
ELSE xpx.State 
END
, CODEPSTL = CASE WHEN xpx.BillingAddress1 <> '' 
THEN BillingZip 
ELSE LEFT(xpx.Zip, 5) 
END
, NAMECTAC = ISNULL(xpx.ContactName,'')
, TEXTPHON1 = ISNULL(xpx.Phone1,'')
, TEXTPHON2 = ISNULL(xpx.Fax,'')
, EMAIL1 = ISNULL(xpx.PEmail,'')
, WEBSITE = xpx.PID
, CTACPHONE = ISNULL(xpx.Phone1,'')
, CTACFAX = ISNULL(xpx.Fax,'')

FROM CCMX.dbo.xpx xpx  
WHERE
    xpx.ChId = 1
AND
    LEN(xpx.CNo) = 4
AND
    xpx.BAct = 1
AND
    xpx.CNo IN
    (
    SELECT
        CNo
    FROM
        CCMX.dbo.xpx
    GROUP BY
        CNo
    HAVING
        COUNT(*) = 1 --5000 rows
    )
and exists (select 1 from #tmp_AEAxjn src where src.IdCt = xpx.CNo)	;

dmaxj · January 4, 2017, 6:15pm

So, I thought about the question about datatypes and decided to make the change to convert varchar to char in the UPDATE statement... the job step went from ~2 hours to 6 secs... I will still have to dig into my trace to see what went on during the 6 secs execution. I guess that I took it for granted that char and varchar would not make a difference, but I was wrong!

Thanks to everyone for offering their input and assistance... I will watch this for a few more days, just to make sure that the code change sticks.

Regards!

yosiasz · January 4, 2017, 6:28pm

Are you sure it worked in 6 seconds or it failed in 6 seconds? Because below code does not compile properly.

CCMX.dbo.xpx xpx on A.IdCt = CONVERT(CHAR12), xpx.CNo)

dmaxj · January 4, 2017, 7:35pm

Oops- I mistyped -

CCMX.dbo.xpx xpx on A.IdCt = CONVERT(CHAR(12), xpx.CNo)

But yes, the job step completed successfully - I checked the agent log, plus no one is yelling

dmaxj · January 5, 2017, 4:25pm

An "odd" thing on this morning run of the job step - it took 45 mins to run... SMH... I will have to check to see if there was any blocking... From 2+ hours to 6 secs to 45 mins ???

yosiasz, I think that I will have to explore the code that you posted... thanks

yosiasz · January 5, 2017, 4:50pm

I guess someone is yelling? please have a read of this. Execution Plan Caching and Reuse

https://technet.microsoft.com/en-us/library/ms181055(v=sql.105).aspx

dmaxj · January 5, 2017, 10:02pm

Well, no one yelled at me, but I did some yelling... I was totally surprised by the 45 min execution time. I took a look at the link that you posted. I can only conclude that my daily statistics update could be the problem...
It's hard to tell at this moment... My trace didn't show any blocking at the time that the code was executing.... I checked my WaitsReport and noticed that the ASYNC_IO_COMPLETION wait type was nonexistent on the day that the code executed in 6 secs, but that ASYNC_IO_COMPLETION wait type was at 11755 secs (195 mins) on today's run...
I am pretty sure that this process is covered in todays wait stats... The question is 'what was it' - I am beginning to think that the target linked server may need more attention, too....

I am going to look at implementing the code that you posted also...