Selecting people in groups of 3 with unique nationality

harlingtonthewizard · December 19, 2019, 8:28pm

Thank you @ScottPletcher, @harishgg1 and @Wim_Leys for your input. I think I was close with my code but it had a bug and was going my head in

ScottPletcher · December 19, 2019, 8:43pm

Yeah, this was an interesting and complex issue. I came up with several approaches before I settled on the one above.

I do think this ultimately could be done using recursion, but the performance would be truly hideous, so it's not worth the trouble to do.

yosiasz · December 19, 2019, 9:21pm

Similar (sort of) to @ScottPletcher

use sqlteam
go

create table dbo.StudyOverSeas(id varchar(50), 
name nvarchar(50), nationality nvarchar(50))
create table groups(groupid int, groupname varchar(50));


INSERT INTO StudyOverSeas (id, name, nationality)
VALUES 
('1234567', 'steve', 'Nigeria'),
('1234568', 'stuart', 'Nigeria'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Nigeria'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Nigeria'),
('1234564', 'tracy', 'Nigeria'),
('1234565', 'Pam', 'Australia'),
('1234566', 'Sam', 'Sweden'),
('1234567', 'Ukuele', 'UK'),
('1234568', 'Ursula', 'USA'),
('1234569', 'Max', 'Mexico'),
('1234570', 'Bill', 'Brasil')
go

/*
	create multi groups then assign one group of people 
to the other where it is not already assigned.
*/

declare @counter int, @jedi int = 1 ;

create table dbo.grouper(id varchar(50), name nvarchar(50), 
nationality nvarchar(50), groupid int)

;with src
as
(
  select count(nationality) _count
  from StudyOverSeas 
  group by nationality
)  
select @counter = max(_count)
from src;

WHILE (@jedi) <=   @counter
BEGIN  
	insert into dbo.groups(groupid, groupname)
	select distinct @jedi, nationality from StudyOverSeas;
	

   IF (@jedi = @counter)
		begin
			BREAK  
		end
   ELSE  
		begin
			insert into grouper
			select distinct top 3  so.id, so.name, so.nationality , g.groupid 
			  from dbo.StudyOverSeas so
			  join dbo.groups g on so.nationality = g.groupname
			  where groupid = @jedi
			  and not exists(select 1 
from grouper tgt 
where tgt.id = so.id and tgt.name = so.name 
and so.nationality = tgt.nationality and tgt.groupid <> @jedi)

			set @jedi = @jedi + 1

			CONTINUE  
		end
END 

select * from grouper;
     

drop table StudyOverSeas
drop table groups
drop table grouper

harishgg1 · December 20, 2019, 5:10pm

hi Wim

I tried and i tried and i tried and i tried and i tried and i tried
not able to come up non recursive solution

if somebody gave me a million dollars
I could keep trying till i find a solution

so could any one else !!!

i give up !!!

Wim_Leys · December 20, 2019, 5:12pm

I think we have spend enough time on this. Let's move on.

jeffw8713 · December 20, 2019, 6:18pm

Okay - deleted the original post because I found an issue, here is an updated version:

Declare @StudyOverSeas Table (StudentId int, StudentName varchar(20), Nationality varchar(20));

 Insert Into @StudyOverSeas (StudentId, StudentName, Nationality)
 Values ('1234567', 'steve', 'Australia')
      , ('1234568', 'stuart', 'Japan')
      , ('1234569', 'alix', 'Australia')
      , ('1234561', 'heather', 'China')
      , ('1234562', 'john', 'Sweden')
      , ('1234563', 'julie', 'Africa')
      , ('1234564', 'tracy', 'Africa')
      , ('1112212', 'dave', 'United States')
      , ('1113312', 'david', 'United States')
      , ('2221121', 'jack', 'England')
      , ('3334333', 'jason', 'Germany')
      , ('2221122', 'jackson', 'England')
      , ('4443433', 'mike', 'Spain')
      , ('4423242', 'scott', 'Italy');

Declare @totalStudents int = (Select count(*) From @StudyOverSeas sos);

   With studentGroups
     As (
 Select *
      , GroupID = ntile(@totalStudents / 3) over(Order By sos.StudentId, sos.Nationality)
   From @StudyOverSeas          sos
        )
      , studentRanks
     As (
 Select *
      , RankID = dense_rank() over(Partition By GroupID Order By Nationality)
   From studentGroups
        )
      , uniqueGroups
     As (
 Select *
      , rn = row_number() over(Partition By GroupID, Nationality Order By StudentId)
   From studentRanks
        )
 Select *
   From uniqueGroups
  Where rn = 1;

Needs further testing and validation - this will cut short a group if 2 of the students placed in that group have the same nationality and only 3 students were identified in the group.

In this method - the StudentId will cause students with id's that are close together to be grouped together...and if they have the same nationality then one (or more) of them would be removed. To get those to be moved to another group we would need some other value that will group differently....

ScottPletcher · December 20, 2019, 7:10pm

You have the same nationality in the same group. The core rule is that each member of the group must be a different nationality.

harishgg1 · December 21, 2019, 2:12pm

hi Jeff

Many thanks for your Trying

Two things have to happen !!!!

It must be divided into groups of 3
In each group all nationalities must be different

I did not see it in your SQL Output
Unless I am missing something

harishgg1 · December 21, 2019, 4:15pm

i used a different data set ..
that could be the reason WHY !!!!
..... My different data set

drop table StudyOverSeas
go 

create table StudyOverSeas
(
Studentid varchar(100),
name varchar(100),
nationality varchar(100)
)
go 


INSERT INTO StudyOverSeas (Studentid, name, nationality)
VALUES 
('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa'),
('1234565', 'Pam', 'Australia'),
('1234566', 'Sam', 'Sweden'),
('1234567', 'Ukuele', 'UK'),
('1234568', 'Ursula', 'USA'),
('1234569', 'Max', 'Mexico'),
('1234570', 'Bill', 'Brasil'),
('1234572', 'Sil', 'Brasil'),
('1234575', 'Kill', 'Brasil'),
('1234577', 'Chill', 'Brasil'),
('1234579', 'Lill', 'Brasil')
go

femiolan · December 22, 2019, 2:14pm

Hello - I think this meets your requirements..

if object_id('tempDb..#StudyOverSeas' , N'U') is not null drop table #StudyOverSeas;

create table #StudyOverSeas
(
id int primary key,
name varchar(100),
nationality varchar(20)

)


INSERT INTO #StudyOverSeas (id, name, nationality)
VALUES ('1234567', 'steve', 'Australia'),
('1234568', 'stuart', 'Japan'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'China'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa');

Declare @cnt int  = (select count(*) from #StudyOverSeas);

;with cte as
(
select * 
,ROW_NUMBER() over(partition by a.nationality order by newId()) idx
,ROW_NUMBER() over(order by newId()) grp

from #StudyOverSeas a
)

,cte_2 
as
(
select *

from cte a
outer apply
(
select distinct top (3)  b.nationality
from cte b
where a.nationality <> b.nationality

) b( grp_nat)
outer apply
(
select  (abs(checksum(newId())) % count(*)) + 1 nat_idx
from #StudyOverSeas c
where c.nationality = b.grp_nat

) c
)

select top (@cnt)  b.id, b.name , b.nationality, a.grp
from cte_2 a
left join cte b on a.grp_nat = b.nationality and b.idx = a.nat_idx
order by a.grp;

Please let me know if this helps

harishgg1 · December 22, 2019, 2:34pm

hi Wim

Since I have a lot of free time !!!! I am still on it ...

I have added a little variety to your data set
and created my own data set

Here's how far I got ... with a new approach Please share your opinion and thoughts
A little filtering magic from here should do the job

please click arrow to the left for DROP CREATE data ..

drop table StudyOverSeas
go 

create table StudyOverSeas
(
Studentid varchar(100),
name varchar(100),
nationality varchar(100)
)
go 


INSERT INTO StudyOverSeas (Studentid, name, nationality)
VALUES 
('1234567', 'steve', 'Africa'),
('1234568', 'stuart', 'Africa'),
('1234569', 'alix', 'Australia'),
('1234561', 'heather', 'Africa'),
('1234562', 'john', 'Sweden'),
('1234563', 'julie', 'Africa'),
('1234564', 'tracy', 'Africa'),
('1234565', 'Pam', 'Australia'),
('1234566', 'Sam', 'Sweden'),
('1234567', 'Ukuele', 'UK'),
('1234568', 'Ursula', 'USA'),
('1234569', 'Max', 'Mexico'),
('1234570', 'Bill', 'Brasil'),
('1234572', 'Sil', 'Brasil'),
('1234575', 'Kill', 'Brasil'),
('1234577', 'Chill', 'Brasil'),
('1234579', 'Lill', 'Brasil')
go

please click arrow to the left for SQL ..

; with cte as 
(
select 
    ROW_NUMBER() over(order by (select null)) as rn,
	a.Studentid		as aStudentid ,
	a.name			as aname,
	a.nationality	as anationality,
	b.Studentid		as bStudentid ,
	b.name			as bname,
	b.nationality	as bnationality,
	c.Studentid		as cStudentid ,
	c.name			as cname,
	c.nationality	as cnationality
 from 
      StudyOverSeas a 
		join StudyOverSeas b 
		     on a.nationality <> b.nationality
		join StudyOverSeas c 
		     on  a.nationality <> c.nationality
		     and b.nationality <> c.nationality
)
select rn, aStudentid , aname, anationality   from cte 	
union all 
select rn, bStudentid , bname, bnationality   from cte 	
union all 
select rn, cStudentid , cname, cnationality   from cte 	
order by rn

harishgg1 · December 22, 2019, 2:38pm

hi femiolan

Your solution is doing it ...

Amazing stuff ... i have been breaking my head for hours and hours hours !!!
Will see how you did it and memorize it ...
Many thanks

femiolan · December 22, 2019, 2:40pm

Thanks harishgg1! Appreciate the feedback. I agree that this one was pretty complex, I hope it helps the OP.

Thanks again

jeffw8713 · December 22, 2019, 3:25pm

My solution does separate into groups of 3 with different nationalities. The issue with this approach is that it will also group students into the same group if their student ID is within the same range...

When that occurs - we drop one of the duplicates which means some groups may have less than 3 members. To account for that we need a different mechanism for distributing the students...

Declare @totalStudents int = (Select count(*) From @StudyOverSeas sos) / 3 - 1;

   With studentGroups
     As (
 Select *
      , GroupID = ntile(@totalStudents) over(Order By newid(), sos.Nationality)
   From @StudyOverSeas          sos
        )
      , studentRanks
     As (
 Select *
      , RankID = dense_rank() over(Partition By GroupID Order By Nationality)
   From studentGroups
        )
      , uniqueGroups
     As (
 Select *
      , rn = row_number() over(Partition By GroupID, Nationality Order By StudentId)
   From studentRanks
        )
 Select *
   From uniqueGroups
  --Where rn = 1
  --  And RankID <= 3;

A small change in this - using newid() instead of StudendId in the ntile function will randomly distribute the students across each group...however, even this will sometimes place a student with the same nationality into the same group.

How you approach that depends on the source set of data and how many groupings you need. At some point you won't have enough data that can be evenly distributed - for example, if the set of data has 10 students from Australia - and another 10 students split across five other nationalities - there is no way to evenly distribute across groups of 3 without duplicates...

With the above change we can reduce the number of groups to 1 less than the total available...this will allow 'extra' rows to be assigned to each group...we can then use the row number and the RankID to remove the 'duplicate/extra' rows from the results.

If you want the 4 groups with the possibility that one or more groups will have less than 3 members - change the @totalStudents to:

Declare @totalStudents int = (Select count(*) From @StudyOverSeas sos) / 3;

Again - depending on the set of data you may end up with groups that have less than the total number of members requested. Even in a cursor approach you will end up with extra rows that cannot be placed into any grouping - so you either end up with groups with less than the total or you eliminate those rows from the results.

jeffw8713 · December 22, 2019, 3:28pm

I am not sure this meets the requirements...although not explicitly stated by the OP I assumed the requirement was that a student/member could only be listed one time and this result will place members into multiple groups.

harishgg1 · December 22, 2019, 3:59pm

TOTALLY agree with you Jeff

If you don't have the count of rows exactly a mutiple of 3
then what do you do with the extra rows !!

All depends on what is the requirement !!!
N number of things can be done

Many thanks for your efforts !!!

jeffw8713 · December 22, 2019, 4:10pm

We can simplify the query to this:

Declare @totalStudents int = (Select count(*) From @StudyOverSeas sos) / 3; --- 1;

   With studentGroups
     As (
 Select *
      , GroupID = ntile(@totalStudents) over(Order By newid(), sos.Nationality)
   From @StudyOverSeas          sos
        )
      , studentRanks
     As (
 Select *
      , RankID = dense_rank() over(Partition By GroupID Order By Nationality)
      , rn = row_number() over(Partition By GroupID, Nationality Order By StudentId)
   From studentGroups
        )
 Select *
   From studentRanks             sr
  --Where rn = 1
  --  And RankID <= 3;

Wim_Leys · December 23, 2019, 10:41am

I ran the code, using Harish's test data (post #31) and Jeff's query (post #37) , including the WHERE clause (otherwise groups of more than 3 people are formed).

  Where rn = 1
    And RankID <= 3;

There are 17 students, only 14 end up in the resultset.
The query should not discriminate people based on their nationality

I just ran the query, now going to have a better look at how it works.
And I learned something new: NTILE().

Harlington most likely never imagined his question would result in such a lengthy and vivid discussion.