SQLTeam.com | Weblogs | Forums

Struggling to remove duplicates


#1

Hello,
I have a table of data. I want to return distinct order id's and the highest item id associated with the order. With the example table attached, I would want to return 3 records with item ids 7202, 7205, 7208. I then want to use the results to join to another table. Basically, I want the highest item id associated with the order id.
Example Table

Auto Id Order Id Item Id
1 100 7200
2 100 7201
3 100 7202
4 101 7203
5 101 7204
6 101 7205
7 102 7206
8 102 7207
9 102 7208

Result I want

Auto Id Order Id Item Id
3 100 7202
6 101 7205
9 102 7208

And then with the results
Select Auto Id, Order Id, Item Id
FROM Table A
INNER JOIN Table B
ON A.Auto ID = B. Auto Id

Any help would be much appreciated.


#2

See if something like this works for you.

/* Setup Test Data Table */
DECLARE @MyTable TABLE
(
	AutoId INT NOT NULL PRIMARY KEY,
	OrderId INT NOT NULL,
	ItemId INT NOT NULL,
	UNIQUE (OrderId, ItemId)
);

/* Insert into Test Data Table */
INSERT INTO @MyTable (AutoId, OrderId, ItemId)
SELECT * 
  FROM 
	(
		VALUES 
		(1, 100, 7200),
		(2, 100, 7201),
		(3, 100, 7202),
		(4, 101, 7203),
		(5, 101, 7204),
		(6, 101, 7205),
		(7, 102, 7206),
		(8, 102, 7207),
		(9, 102, 7208)
	) AS MyList (AutoId, OrderId, ItemId);

/* Generate RN to remove duplicates */
;WITH cteNoDuplicates As
(
	SELECT mt.AutoId, OrderId, ItemId, ROW_NUMBER() OVER(PARTITION BY OrderId ORDER BY ItemId DESC) As RN
	  FROM @MyTable As mt
)
SELECT d.AutoId, d.OrderId, d.ItemId
  FROM cteNoDuplicates As d
 WHERE d.RN = 1;

#3

Thanks James. Although this removes the duplicates it is not returning the 3 records I want. This will return item id 7200, 7203 and 7206. It is giving me the first result in the partition when what I need is the last result in the partition.
Can you tweak to get the desired result?


#4

The query provided by @James does exactly as you requested (did you forget to the "desc" in order by clause?)

AutoId  OrderId ItemId
3	100	7202
6	101	7205
9	102	7208

#5

Thanks got it


#6

Hi

Here's another way to do it

Don't know if its better or not

Query

;WITH abc_cte
AS (SELECT orderid,
Max(itemid) AS max
FROM @MyTable
GROUP BY orderid)
SELECT b.*
FROM abc_cte
JOIN @MyTable b
ON abc_cte.orderid = b.orderid
AND abc_cte.max = b.itemid


#7

When I run my code and your code together your code is 18% of the batch cost while mine is 82%. Granted that is with few rows but it seems to me your code is far more efficient.

My code requires additional operators to run. Those are: Sort (very expensive), Sequence Project, and Filter.

Great suggestion!