SQLTeam.com | Weblogs | Forums

Finding duplicate groups of records


#1

I have a product structure table and I'm trying to find items that have duplicate product structures.

Table: bmprdstr

item_no --------------- component Item
ABC ---------------------cmp1
ABC ---------------------cmp2
XYZ---------------------cmp1
XYZ----------------------cmp2
EFG----------------------cmp1
EFG-----------------------cmp2
EFG-------------------------cmp3

So what I'm looking for is some way to know that ABC and XYZ have the same product structure.
EFG would not have the same because there is a 3rd component.



#2

Oh, and does anyone know how to add spaces or tab's between words?


#3
;WITH cte_item_components AS (
    SELECT item_no, STUFF(
        (SELECT ',' + [component Item]
         FROM bmprdstr b2
         WHERE b2.item_no = b1.item_no
         ORDER BY [component Item]
         FOR XML PATH('')
        ), 1, 1, '') AS component_item_list
    FROM (
        SELECT DISTINCT item_no
        FROM bmprdstr 
    ) AS b1
)
SELECT *
FROM cte_item_components cic1
INNER JOIN cte_item_components cic2 ON cic2.component_item_list = cic1.component_item_list AND cic2.item_no > cic1.item_no

#4

I put that into a view where I now have the item_no and component_item_list as two fields.

There are hundreds of components in some of the BOM's.

Now is there a way to list the item_no 's that have similar component_item_Lists.

I can select count(component_item_list)
from complist
group by comp_item_list
having count(component_item_list)>1

But I can't seem to find a way to show the item numbers that have the duplicates.


#5

That's what the final SELECT in my query above does -- it joins the combined component rows and lists all matches across item_nos, but only once for each match. That is, if item #1 matches item#11, the query will list:
1 11
but it won't also list:
11 1
as that would be redundant.


#6

So in the pic below. 55988 has the same components as 000-100?


#7

My code puts the first item_no and its list before listing the matching item_no.

I should have put both item_nos first and only listed one of the component lists, since they're known to be the same anyway.

;WITH cte_item_components AS (
    SELECT item_no, STUFF(
        (SELECT ',' + [component Item]
         FROM bmprdstr b2
         WHERE b2.item_no = b1.item_no
         ORDER BY [component Item]
         FOR XML PATH('')
        ), 1, 1, '') AS component_item_list
    FROM (
        SELECT DISTINCT item_no
        FROM bmprdstr 
    ) AS b1
)
SELECT cic1.item_no, cic2.item_no, cic1.component_item_list
FROM cte_item_components cic1
INNER JOIN cte_item_components cic2 ON cic2.component_item_list = cic1.component_item_list AND cic2.item_no > cic1.item_no

#8

Get a copy of my trees and hierarchies in SQL book. I have material on BOM models. It is actually very easy in the nested set model to see if 2 sub-assemblies have identical structure and components. Without posting a few hundred words of code, the basic idea is to take the root of subtree one and the root of subtree two, put their (lft, rgt) pairs into a canonical form. Basically that means, make sure they are all numbered the same way; both subtrees are numbered from 0 or 1 to in by subtracting the minimum value in the subtree. This minimum will be the lfts value of the roots. If both subtrees are identical, then doing an EXCEPT on them will give you an empty result set.

Think sets, not procedures.