What happens when you join an indexed table to a non indexed linked server table

sapator · April 5, 2018, 8:46am

Hi.
I have an indexed local table and I am thinking of joining it to a linked server table since I do not have the information I need to the local table.
Either that or create a query that brings data to a local stage table from linked server and the join that local table. This will need maintenance however because is not a day to day table join so I do not know what data I might use and also the date range. So the stage table could well become huge.

I'm thinking of the complication this join will produce.
I will try to use filter to the linked table so I can avoid full data read and I am using indexed columns on local table. The problem is that some of the linked table joined columns are not indexed.
What are the complications that may be created in that case? Will the local table indexes be used or since the are missing from linked table, be ignored?
Also I am using read uncommitted transactions. Is this actually working when using linked tables?
If the linked table is slowing the select down, does this mean that the local database is also slowed and not using uncommitted reads?
Any other tips you might think to make performance better?
thanks.

jeffw8713 · April 5, 2018, 3:29pm

Joining across a linked server can and will cause performance issues. Based on what you have stated - it is entirely possible that SQL Server will determine that executing a cursor over the linked server is the better option.

That would set this up to pull one row across the linked server - validate the row - pull the next row, etc... until all data from the linked server has been processed.

SQL Server may also decide to just pull all the data from the linked server to a local worktable - then perform the join on the local server.

Or - you may get lucky and SQL Server can filter out the expected data from the linked server and only pull across the data that is needed.

The better option will be to setup a scheduled process to populate a local table with the data you need - and only the columns and rows that you need. You can do this through a linked server but be aware that you cannot control the batch/commit sizes and may blow out your transaction log. If there is a lot of data you would be better off building an SSIS package and scheduling that...

If the requirements change - then you would have to change the extract/load process. There is no reason to avoid that process because you don't know today what the requirements will be tomorrow. If you don't need certain columns today for your reporting requirements - then pulling them across would just be a waste of time and effort.

sapator · April 10, 2018, 8:00am

Thanks.