SQLTeam.com | Weblogs | Forums

Join on columns creating too many instances (duplicates)


#1

JOIN ON COLUMNS CREATING TOO MANY INSTANCES (DUPLICATES)

I have an orders table that i want to display with some aggregate derived columns , and i want to count the instances a value shows up per customerid. One derived column [DOCKED] will have a join into another table via the customerid

TABLE dbo.orders --o
(
OrderID (pk, int, NOT NULL),
CustomerID varchar(5) NOT NULL,
Sourceid int NOT NULL,

Status varchar(50) null,
ordershipped datetime NULL,
orderarrived datetime NULL
CONSTRAINT PK_OrderID PRIMARY KEY CLUSTERED (OrderID ASC)
);

TABLE dbo.Ordermessages --om
(
OmID (pk,varchar(5), NOT NULL),
Sourceid int NOT NULL,
dockcode varchar(5),
dockdate datetime NULL,
location nvarchar(50) not null
CONSTRAINT PK_customerID PRIMARY KEY CLUSTERED (OmID ASC)
);

SELECT coalesce (Customerid,'Total') AS CUSTID

[READY] = sum (case when o.status = 'Confirmed' then 1 else 0 end)
,[DOCKED] = sum (case when o.status = 'Confirmed' and om.dockcode ='z' AND om.dockdate >= GETDATE() -1 then 1 else 0 end)

FROM Orders o
join Ordermessages om on o.sourceid = om.sourceid
where o.Customerid <> ''

GROUP BY ROLLUP (Customerid);

Thanksfor all help
S


#2

The [DOCKED] total should be fine.

For [READY], try this:

,[READY] = COUNT(DISTINCT case when o.status = 'Confirmed' then o.OrderID end)


#3

Thanks, but this creates a wrong calculation for the ROLLUP ; I need to find a way to join the two tables to get rid of the duplicates as there are many sourceids associated with customerids , some sort of condition with the ON clause for both sourceids or a subquery involving the JOIN
Regards
S


#4

It looks like you only have sourceid as the column to relate the tables - which doesn't make sense. The OrderMessage table should relate to the Order based on the OrderID - or the Order should have a FK to the OmID...

You cannot get rid of 'duplicates' unless you can identify why they are considered duplicate entries. If there are multiple order messages per source ID - which one do you want to keep? Once you have that defined you can use a CROSS APPLY to get the TOP order message.

CROSS APPLY (
     SELECT TOP 1
            *  -- {list the actual columns you want returned}
       FROM dbo.OrderMessages om
      WHERE om.SourceID = o.SourceID
        AND {other criteria}
      ORDER BY
            {sort so the row you want is on top}
            ) AS t