Find match in different server

razaf · June 26, 2023, 2:10am

Have records to match in 2 different sql servers.
Large number of records around 80,000 is there to
compare.

Took batch wise match and compared with in
clause .it gave parameter limit of in around 2000.

What is the ideal way to compare huge records in different server?

robert_volk · June 26, 2023, 4:28am

If this is a follow up to this topic:

It would help to know what column(s) are you JOINing or comparing between the local SQL instance and the remote/linked server.

If you only have 80 thousand rows on the linked server, and you're only comparing 1 column, or even up to 4 or 5 columns, it would be most efficient to simply insert that data from the linked server into a temp table on the local server. Index the column(s) in the temp table that you're going to JOIN to data in the local server.

CREATE TABLE #LinkedData(column1 int not null, column 2 int not null
--, PRIMARY KEY(column1, column2) WITH (IGNORE_DUP_KEY=ON) -- this is optional
)
INSERT INTO #LinkedData(column1,column2)
SELECT * FROM OPENQUERY([linkedServer], 'SELECT col1, col2 FROM myDatabase.mySchema.myTable')  -- OPENQUERY will generally perform better than 4-part references

SELECT *
FROM myLocalTable A
INNER JOIN #LinkedData Z ON A.col1=Z.column1  -- or however you'd join the local table to the temp table data

If you can provide the exact table schema DDL for both local and linked server tables, with some example data, and the expected results, it will make it easier to help you.

The PRIMARY KEY definition I commented out in the example would be useful to prevent duplicate rows from the linked server being populated in the temp table. If all you're trying to accomplish is "find me data on the local server that matches in the linked server", then that would be a useful optimization. Again, we'd need more specific details and examples to know for certain.

Ifor · June 26, 2023, 8:41am

It depends!

For an application another approach to consider, especially if a lot of the data is already cached, is joining the tables in memory on an application server. Two advantages for this are:

It keeps the load off the database servers. ie It is easier and cheaper to scale out application servers rather than scale up database servers.
It is more flexible if you think you will ever need to use something like Azure SQL Dababase.

eg With Entity Framework:

How to Join Two Tables From Different Databases Using Entity Framework (c-sharpcorner.com)

razaf · June 27, 2023, 1:09am

Thanks, can we use table variables instead of temp tables?
What is the capacity of temp variable yo hold records

robert_volk · June 27, 2023, 2:14pm

For 50K or more rows, that you're going to JOIN to another table, use temp tables instead, and index the JOINed columns.

Table variables do not generate statistics on columns, so the query optimizer will use a fixed row count/cardinality that will be incorrect. Temp tables will have statistics like any other table would.

Indexing will allow the optimizer to consider different JOIN types that would not be viable with an unindexed column.