# Random numbers are not random in t-sql with large volume

This has been far more complicated than I thought. I have tried every known way I could find online to create a random value.
(CONVERT(INT, CRYPT_GEN_RANDOM(3)) % 1000000),
(CONVERT(INT, RAND()*1000000) % 1000000),
(ABS(CHECKSUM(NEWID())) % 1000000);

etc, and all these ways look to be random... But they are not. An example of this is if I use a sample set of 100,000 plus rows, and have a number between 1 and 1,000,000,000 (Yes a billion) I will get multiple collisions on that 100k rows... At first I thought this might be the "BIRTHDAY PROBLEM" example, but with a 1 billion sample pool, this should not have collision if it is trully picking 1 to 1 billion, correct?

Here is my sample test code that will load 100,000 records

``````/*
DROP TABLE NumbersTest
CREATE TABLE NumbersTest (Number  int  not null)
;WITH Nums(Number) AS
(SELECT 1 AS Number
UNION ALL
SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
select Number from Nums option(maxrecursion 32767)
--ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)

;WITH Nums(Number) AS
(SELECT 1 AS Number
UNION ALL
SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
select Number from Nums option(maxrecursion 32767)

;WITH Nums(Number) AS
(SELECT 1 AS Number
UNION ALL
SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
select Number from Nums option(maxrecursion 32767)

;WITH Nums(Number) AS
(SELECT 1 AS Number
UNION ALL
SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
select Number from Nums option(maxrecursion 32767)

*/

select * from
(
select ROW_NUMBER() over (partition by myNum order by myNum) as rowId,*
from
(

SELECT Cast(RAND(convert(varbinary, newid()))*(1000000000-1)+ 1 as bigint) as myNum
--select *
from NumbersTest
) a
) aa

where aa.rowid = 2``````

It appears it is "partially" the birthday paradox... I'd expect zero matches 99.999% of the time, but reality is I should get a match about 99% Mind blown;)

if you have a list of 100000 unique random numbers and you are randomly picking #'s 1 to 1 billion your odds of two collisions are 99%, how... well it hurts my brain...

Try

``````select * from
(
select ROW_NUMBER() over (partition by myNum order by myNum) as rowId,*
from
(

SELECT cast(CRYPT_GEN_RANDOM(6) as bigint) as myNum
--select *
from NumbersTest
) a
) aa

where aa.rowid = 2``````

Thanks, but i posted the reason. It is the same results. The issue is in the probability, not the function