Selecting people in groups of 3 with unique nationality

harlingtonthewizard · December 18, 2019, 5:11am

Hi,

Selecting people in groups of 3 with unique nationality, so that each group cannot have people with the same nationality.

With the following example dataset:

INSERT INTO StudyOverSeas (id, name, nationality)
VALUES ('1234567', 'steve', 'Australia'),
('1234568', 'stuart', 'Japan'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'China'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa')

I am looking for an output that looks like this:

id, name, nationality, group

1234567, steve, Australia, 1
1234568, stuart, Japan, 1
1234561, heather, China, 1
1234569, alix, Australia, 2
1234562', john, Sweden, 2
1234563, julie, Africa, 2
1234564, tracy, Africa, 3

id is unique and it doesn't matter what order they are in, for example here is another example:

1234567, steve, Australia, 1
1234568, stuart, Japan, 1
1234563, julie, Africa, 1
1234564, tracy, Africa, 2
1234562', john, Sweden, 2
1234561, heather, China, 2
1234569, alix, Australia, 3

Regards

Steve

Wim_Leys · December 18, 2019, 9:42pm

Hi Harish,

Have you tried your query with other data? Give it a try with this:

INSERT INTO StudyOverSeas (id, name, nationality)
VALUES ('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa')
go

Th result set does not conform to the requirement.

Selecting people in groups of 3 with unique nationality, so that each group cannot have people with the same nationality.

I was curious about your solution, cuz I couldn't think one right away.

ScottPletcher · December 18, 2019, 10:14pm

Agreed, this is much more complex than it first appears.

@harlingtonthewizard / Steve

How much total data do you have for this? I think a recursion of some type might work, but the performance would not be good for large amounts of data.

ScottPletcher · December 18, 2019, 10:54pm

For now, I'd use a loop on this, to avoid too complex a logic in a strict relational solution. I wasn't sure how you wanted to handle left-over rows, so I just list them at the end. Btw, I did test my solution against Wim_Leys' well-considered data, and it worked there too.

Note that this code copies the original table to a temp table so that rows can be deleted as they are matched.

DECLARE @ID int
DECLARE @ID2 int
DECLARE @ID3 int
DECLARE @name varchar(30)
DECLARE @name2 varchar(30)
DECLARE @name3 varchar(30)
DECLARE @nationality varchar(30)
DECLARE @nationality2 varchar(30)
DECLARE @nationality3 varchar(30)
DECLARE @group# int

IF OBJECT_ID('tempdb.dbo.#StudyOverSeas') IS NOT NULL
    DROP TABLE #StudyOverSeas
SELECT TOP (0) *
INTO #StudyOverSeas
FROM dbo.StudyOverSeas
EXEC('CREATE UNIQUE NONCLUSTERED INDEX IX1 ON #StudyOverSeas ( ID )')

INSERT INTO #StudyOverSeas WITH (TABLOCK)
SELECT *
FROM dbo.StudyOverSeas
ORDER BY NEWID()

IF OBJECT_ID('tempdb.dbo.#Results') IS NOT NULL
    DROP TABLE #Results
SELECT TOP (0) ID, name, nationality, CAST(NULL AS int) AS group#
INTO #Results
FROM #StudyOverSeas

SET @group# = 0

WHILE 1 = 1
BEGIN
    SELECT @ID = NULL, @ID2 = NULL, @ID3 = NULL
    SELECT TOP (1) @ID = ID, @name = name, @nationality = nationality
    FROM #StudyOverSeas
    IF @ID IS NULL
        BREAK;
    SELECT TOP (1) @ID2 = ID, @name2 = name, @nationality2 = nationality
    FROM #StudyOverSeas
    WHERE nationality NOT IN (@nationality)
    IF @ID2 IS NULL
        BREAK;
    SELECT TOP (1) @ID3 = ID, @name3 = name, @nationality3 = nationality
    FROM #StudyOverSeas
    WHERE nationality NOT IN (@nationality, @nationality2)
    IF @ID3 IS NULL
        BREAK;
    SET @group# = @group# + 1
    INSERT INTO #Results VALUES
        (@ID,  @name,  @nationality,  @group#),
        (@ID2, @name2, @nationality2, @group#),
        (@ID3, @name3, @nationality3, @group#)
    DELETE FROM #StudyOverSeas
    WHERE ID IN (@ID, @ID2, @ID3)
END /*WHILE*/

SELECT *
FROM #Results
ORDER BY group#, ID

SELECT 'LeftOverRows', *
FROM #StudyOverSeas

Wim_Leys · December 19, 2019, 1:30am

It's great that Scott came with a solution.
I tried a recursive solution. But it's broken somewhere. Beefed it up to recursion depth of 1000 and it still complains. And that with a mere 7 records.

The statement terminated. The maximum recursion 1000 has been exhausted before statement completion.

Click for sample data and table

drop table if exists StudyOverSeas;

create table StudyOverSeas(
	id		INT	NOT NULL	PRIMARY KEY,
	name	varchar(100)	NOT NULL,
	nationality	varchar(100)	NOT NULL
);

INSERT INTO StudyOverSeas (id, name, nationality)VALUES 
(1, 'steve', 'Australia'),
(2, 'stuart', 'Japan'),
(3, 'alix', 'Australia'),
(4, 'heather', 'China')
,(5, 'john', 'Sweden')
,(6, 'julie', 'Africa')
,(7, 'tracy', 'Africa');

;WITH CTE AS (--this CTE returns all unique combinations of people and their nations. 
-- By unique, we mean if the combination "John, Mary, Steve" occurs, all other possible 
-- combinations of those three, like "John, Steve, Mary" are not part of the result set
SELECT SO1.id as id1, SO1.name as Name1, SO1.nationality as Nat1,
	SO2.id as id2, SO2.name as Name2, SO2.nationality as Nat2,
	SO3.id as id3, SO3.name as Name3, SO3.nationality as Nat3
FROM StudyOverSeas as SO1
	LEFT OUTER JOIN StudyOverSeas as SO2
		ON SO1.nationality <> SO2.nationality
		AND SO1.id < SO2.id
	LEFT OUTER JOIN StudyOverSeas as SO3
		ON SO1.nationality <> SO3.nationality
		AND SO2.nationality <> SO3.nationality
		AND SO2.id < SO3.id
),
CTE2 AS (--add a unique groupNr to each combination
SELECT *, ROW_NUMBER() OVER (ORDER BY id1) as GroupNr 
FROM CTE
)
,RecursiveCTE AS(--start with each unique 3-people combination as the start of a new group
SELECT id1, Name1, Nat1, id2, Name2, Nat2, id3, Name3, Nat3, GroupNr
FROM CTE2
UNION ALL
SELECT CTE2.id1, CTE2.Name1, CTE2.Nat1, CTE2.id2, CTE2.Name2, CTE2.Nat2, CTE2.id3, CTE2.Name3, CTE2.Nat3, RCTE.GroupNr
FROM RecursiveCTE AS RCTE
	INNER JOIN CTE2	--find another combination of 3 "new" people of whom nobody is present in the group so far
		ON RCTE.id1 <> CTE2.id1
		AND RCTE.id2 <> CTE2.id1
		AND RCTE.id3 <> CTE2.id1
		AND RCTE.id1 <> CTE2.id2
		AND RCTE.id2 <> CTE2.id2
		AND RCTE.id3 <> CTE2.id2
		AND RCTE.id1 <> CTE2.id3
		AND RCTE.id2 <> CTE2.id3
		AND RCTE.id3 <> CTE2.id3
)
SELECT *
FROM RecursiveCTE
ORDER BY GroupNr, 1, 4, 7
--option ( MaxRecursion 1000 );

The query is definitely broken. I'm not even sure the logic behind it is sane.
I'm curious how a recursive solution would be like.

Bedtime here.

harishgg1 · December 19, 2019, 6:09am

Hi Wim

Thanks for pointing it out !!!

I noticed what you are saying now !!!
Looks a lot more complicated!!!

I will try to see if i can solve it !!

Thanks

harishgg1 · December 19, 2019, 7:24am

hi Wim

According to halringtonthewizard who is the poster
Since the order of the data does not matter !!!!

We can create row number for each country
and group them together

please click arrow to the left for drop create sample data ...

drop table StudyOverSeas
go 

create table StudyOverSeas
(
id varchar(100),
name varchar(100),
nationality varchar(100)
)
go 


INSERT INTO StudyOverSeas (id, name, nationality)
VALUES 
('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa'),
('1234570', 'Pam', 'Australia'),
('1234570', 'Sam', 'Sweden'),
('1234570', 'Sue', 'Australia')
go

select 'data',* from StudyOverSeas
go

; with cte as 
(
select ROW_NUMBER() over(partition by nationality order by nationality) as rn 
          , *
from StudyOverSeas
)
select * from cte order by rn 
go

Wim_Leys · December 19, 2019, 9:44am

Hi Harish,

You nailed it.
What an elegant and efficient solution!

Wim_Leys · December 19, 2019, 9:56am

Hi Harish,

I got bad news.
Your solution doesn't work with the following data set.

INSERT INTO StudyOverSeas (id, name, nationality)
VALUES 
('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa'),
('1234565', 'Pam', 'Australia'),
('1234566', 'Sam', 'Sweden'),
('1234567', 'Ukuele', 'UK'),
('1234568', 'Ursula', 'USA'),
('1234569', 'Max', 'Mexico'),
('1234570', 'Bill', 'Brasil')
go

I think a combination of your first solution (chop result set in groups of 3 records) and your last one will work.

harlingtonthewizard · December 19, 2019, 10:37am

Thanks @ScottPletcher I had something very similar but clearly had a bug in my code. There's only a few thousand rows of data and performance is not an issue as such.

harishgg1 · December 19, 2019, 12:23pm

hi Wim

I tried with 1st way and 2nd way combined .. but ran into roadblocks !!!!

I tried something else ...

please click arrow to the left for drop create data ...

drop table StudyOverSeas
go 

create table StudyOverSeas
(
id varchar(100),
name varchar(100),
nationality varchar(100)
)
go 


INSERT INTO StudyOverSeas (id, name, nationality)
VALUES 
('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa'),
('1234565', 'Pam', 'Australia'),
('1234566', 'Sam', 'Sweden'),
('1234567', 'Ukuele', 'UK'),
('1234568', 'Ursula', 'USA'),
('1234569', 'Max', 'Mexico'),
('1234570', 'Bill', 'Brasil')
go

select 'data',* from StudyOverSeas
go

I was able to create groups of 3
but in a row ... instead of in a column

please click arrow to the left for SQL ...

;with cte as 
(
 select 
      ROW_NUMBER() over(order by (select null)) as rn
      ,* 
 from StudyOverSeas
) 
select 'SQL 3 in a Row Way',
       a.id as aid,a.name as aname,a.nationality as anationality,
       b.id as bid,b.name as bname,b.nationality as bnationality,
       c.id as cid,c.name as cname,c.nationality as cnationality
from cte a 
          join cte b 
              on a.rn = b.rn + 1 
          join cte c 
              on a.rn = c.rn + 2 
                   and a.nationality <> b.nationality 
                   and b.nationality <> c.nationality
                   and a.nationality <> c.nationality
go

Wim_Leys · December 19, 2019, 1:06pm

Hi Harish,

I don't think this solution is correct.

a.rn = b.rn + 1
and
a.rn = c.rn + 2

are far too strict. You can not assume that the row numbers of the people are magically ordered in the correct manner. (I doubt they can be after further tweaking the PARTITION BY and/or ORDER BY part of ROW_NUMBER().)

I think you will have to replace them by something like
a.rn < b.rn
a.rn < c.rn
or
a.rn <> b.rn
a.rn <> c.rn
(I don't have access to a db atm).

Also the INNER JOINS will have to be replaced by LEFT OUTER JOINs. For those cases when there are far more people of one nation.

yosiasz · December 19, 2019, 1:51pm

What real world problem are you trying to solve here?

harishgg1 · December 19, 2019, 2:59pm

Hi Wim

I tried recursive cte and came up with a solution ...
Please have a look ...

Jeff Moden
.. Please forgive me !!!!
i committed the ultimate sin using Recursive CTE however small
How about demo purpose As a Reason !!!

please click arrow to the left for Drop Create Data ..

drop table StudyOverSeas
go 

create table StudyOverSeas
(
id varchar(100),
name varchar(100),
nationality varchar(100)
)
go 


INSERT INTO StudyOverSeas (id, name, nationality)
VALUES 
('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa'),
('1234565', 'Pam', 'Australia'),
('1234566', 'Sam', 'Sweden'),
('1234567', 'Ukuele', 'UK'),
('1234568', 'Ursula', 'USA'),
('1234569', 'Max', 'Mexico'),
('1234570', 'Bill', 'Brasil')
go


select 'data',* from StudyOverSeas
go

please click arrow to the left for SQL ...

; with rn_cte  as 
( 
	select ROW_NUMBER() over(order by (select null)) as rn 
		   , * 
	from StudyOverSeas
) , rec_cte as 
(
	select 0 as grp 
		   , * 
	from rn_cte 
	where rn = 1 
		union all 
	select case when a.nationality <> b.nationality then b.grp +1 else b.grp end 
	      ,a.* 
	from rn_cte a 
	      join rec_cte b 
	         on a.rn = b.rn + 1
) , rn1_cte as 
(
	select row_number() over(order by  grp%3)-1 as rn1 
		   , * 
	from rec_cte  
) 
select 'SQL Output'
	   ,rn1%3+1
       ,id
       ,name
       ,nationality 
from rn1_cte

ScottPletcher · December 19, 2019, 3:20pm

Let me also point out this statement in my code. It means that you get random matches for each run. I was afraid that with a ROW_NUMBER() approach, you would tend to get the same results for a given input set. I thought it more likely you wanted random matches, but that was just a guess, of course.

ScottPletcher · December 19, 2019, 3:40pm

You've exceeded 3 people in each group. I think there might be a way to do it recursively, but I really think it would be much more overhead and complexity than just looping thru the rows. The loop code is thus much easier to verify and troubleshoot.

harishgg1 · December 19, 2019, 3:42pm

i am having a lot of free time Scott

killing time !!!

As you very rightly said ...
Looping seems to be best Way !!!

I am trying though ( breaking my head for many hours ) ..

Wim_Leys · December 19, 2019, 4:23pm

Hi Harsih,

I agree 100% with Scott.

I considered finding a recursive solution as a nice brain teaser and challenge to sharpen my SQL skills.
But I've already spend way too much time on this one.
Hope you will find a non-cursor solution though. Every time your solutions are so close to a breakthrough.
I'm curious what you will come up with next time.

harishgg1 · December 19, 2019, 4:31pm

Hi Wim

It requires THINKING SKILLS ..

I could do it really quickly if I applied thinking skills
but i am NOT doing it quickly

i am taking my time ... enjoying my ups and downs

i mean in the sense WRITING down on Paper , analyzing , Notes at every point
what you are trying to make sense ..

harlingtonthewizard · December 19, 2019, 8:24pm

Thanks @ScottPletcher, it doesn't matter if the outcome is the same or random for a given data set, although I can see some advantage of random. The real world problem is to buddy students from different countries in groups of three.