Inner join on 2 identical tables on 2 different servers causing cartisian product. If distinct used omitting samt amount records

We have 2 identical tables on 2 different servers. Details as below

For table Sales on first server

InvoiceNo Date Account Amount
1 20-04-2022 A 10
2 20-04-2022 B 10
3 20-04-2022 C 20

For table Sales on second server

InvoiceNo Date Account Amount
1 20-04-2022 A 10
2 20-04-2022 B 10

Output needed as below

CountA SumofAmtA CountB SumofAmountB
3 40 2 20

To check if records have synced correctly from server A to server B;I use inner join on invoice number; which causes cartesian product; however to overcome it if I use distinct of (Invoice); count comes correct; however distinct of Amount causes the 2nd bill's amount to be not considered i.e. same amount bills are left out. I need to compare count of Invoices and sum of Amount of each server to be compared. Thanks in advance.

Would be helpful if you posted the actual query you are using.

Are these linked servers, accessed using four-part naming convention for the remote server?

Since you just want summary results, do the COUNT()s and the SUM()s before joining the tables.


;WITH cte_sales_a AS (
    SELECT COUNT(*) AS CountA, SUM(Amount) AS SumOfAmtA, 
        MAX(InvoiceNo) AS InvoiceNoMaxA
    FROM dbo.sales
),
cte_sales_b AS (
    SELECT COUNT(*) AS CountB, SUM(Amount) AS SumOfAmtB, 
        MAX(InvoiceNo) AS InvoiceNoMaxB
    FROM remoteserver.remotedb.dbo.sales
)
SELECT a.*, b.*
FROM cte_sales_a a
CROSS JOIN cte_sales_b b
2 Likes

Hi ScottPletcher,

Thanks for your revert.

Hi James,

Yes they are linked servers.

Query as below

select count(distinct(a.InvoiceNo)),Sum(distinct(a.Amount),count(distinct(b.InvoiceNo)),Sum(distinct(b.amount)
from Sales A inner join remoteserver.remotedatabase.dbo.Sales b
on a.InvoiceNo=b.InvoiceNo

On doing this; it gets correct count of invoice numbers on both servers; however if there are 2 invoices by same amount it skips one. If I do not give distinct in amount; it sums up 'amount' column; but makes cartesian product of sum of amount of first table with second; which is actually incorrect sum. Please guide.

Please consider below query; sorry for typo error; 2nd bracket after amount was missing.

select count(distinct(a.InvoiceNo)),Sum(distinct(a.Amount)),count(distinct(b.InvoiceNo)),Sum(distinct(b.amount))
from Sales A inner join remoteserver.remotedatabase.dbo.Sales b
on a.InvoiceNo=b.InvoiceNo

On doing this; it gets correct count of invoice numbers on both servers; however if there are 2 invoices by same amount it skips one. If I do not give distinct in amount; it sums up 'amount' column; but makes cartesian product of sum of amount of first table with second; which is actually incorrect sum. Please guide.

It is to be expected that if you use distinct in SUM, and if there are two invoices with the same amount, it will be included only once.

But, if you don't include DISTINCT in SUM, I don't know how that can result in a cartesian product. See the example below.

DROP TABLE IF EXISTS #A;
DROP TABLE IF EXISTS #B;

CREATE TABLE #A(invoiceNo INT, amount INT);
CREATE TABLE #B(invoiceNo INT, amount INT);

INSERT INTO #A VALUES (1,10),(2,20), (3,10),(4,25);
INSERT INTO #B VALUES (1,10),(2,20), (3,10),(5,30);



SELECT
    COUNT(DISTINCT (A.InvoiceNo)),
    SUM((A.Amount)),
    COUNT(DISTINCT (b.InvoiceNo)),
    SUM((b.amount))
FROM
    #A a
    INNER JOIN #B b ON 
        A.InvoiceNo = b.InvoiceNo;

Hi JamesK,
Thanks for your revert. In above script; only the common invoice numbers will be counted and summed; need is to have count of all distinct invoices and sum of amount of "these distinct" invoices only. Example as per below as already stated.

We have 2 identical tables on 2 different servers. Details as below

For table Sales on first server

InvoiceNo Date Account Amount
1 20-04-2022 A 10
2 20-04-2022 B 10
3 20-04-2022 C 20

For table Sales on second server

InvoiceNo Date Account Amount
1 20-04-2022 A 10
2 20-04-2022 B 10

Output needed as below

CountA SumofAmtA CountB SumofAmountB
3 40 2 20

Thanks in advance.

more of a design question that I wanted to ask that should be thought about? why do you have 2 identical tables on 2 different servers with differing data? Are you going in the direction of finding the difference in order to move forward with going with one source of truth or is this a continuous thing you will be doing on some cadence?

Hi,

Thank you for revert. We need to monitor transfer of data from source server to destination server; so it is needed.

Try using a union query in the from clause. It will eliminate duplicates.