I bumped into a problem that in my opinion is really interesting. I found a relatively easy itinerary solution, but it completely lacks performance when applied to a bigger stack of records... and in the end it turned out to be even wrong!
Imagine a list of names with an underlying, BUT NOT KNOWN order: (we make it visible to make it easier to follow )
Peter, Paul, John, George, Mary, Anne, Eric, Thomas, Susan, Liz
The only thing you have are samples with subsets of those names, listed in the correct hierarchical order.
sample1: John, Anne, Susan
sample2: George, Thomas, Susan, Liz
sample3: Peter, Paul, Eric, Liz
sample4: Paul, Susan
sample5: Mary, Eric, Susan
sample6: Paul, George, Mary, Anne, Thomas
sample7: Paul, John, Mary, Eric
The goal is to find the correct order; or at least a possible correct order for the amount of given samples. Just imagine that there will be 400 names and 500'000 samples.
I post what I thought was a solution: Taking the average ranking within the samples as a measure. This will for sure bring #1 in Top position but lower ranked names will be calculated wrong. You'll need a splitter function that returns the name and its position. like this one -> http://www.sqlservercentral.com/articles/Tally+Table/72993/
Create TABLE #Sample
(
ID int,
Sample varchar(100)
)
INSERT into #Sample (ID,Sample) values (1,'John,Anne,Susan'),(2,'George,Thomas,Susan,Liz'), (3,'Peter,Paul,Eric,Liz'),(4,'Paul,Susan'),(5,'Mary,Eric,Susan'),(6,'Paul,George,Mary,Anne,Thomas'), (7,'Paul,John,Mary,Eric')
Create TABLE #Names
(
ID int,
Name varchar(10),
rank int
)
Insert into #Names
(
ID,
Name,
rank
)
SELECT ID, Item, ItemNumber
FROM #Sample
CROSS APPLY
(
SELECT Item, Itemnumber FROM dbo.Split(Sample, ',')
) AS Y
select * from #Names
select name, 1.0*sum(rank)/count(rank) as ord
from #Names
Group by name
order by ord asc
drop Table #Sample
drop Table #Names