SQLTeam.com | Weblogs | Forums

Random numbers are not random in t-sql with large volume

This has been far more complicated than I thought. I have tried every known way I could find online to create a random value.
(CONVERT(INT, CRYPT_GEN_RANDOM(3)) % 1000000),
(CONVERT(INT, RAND()*1000000) % 1000000),
(ABS(CHECKSUM(NEWID())) % 1000000);

etc, and all these ways look to be random... But they are not. An example of this is if I use a sample set of 100,000 plus rows, and have a number between 1 and 1,000,000,000 (Yes a billion) I will get multiple collisions on that 100k rows... At first I thought this might be the "BIRTHDAY PROBLEM" example, but with a 1 billion sample pool, this should not have collision if it is trully picking 1 to 1 billion, correct?

Here is my sample test code that will load 100,000 records

/*
DROP TABLE NumbersTest
CREATE TABLE NumbersTest (Number  int  not null)  
;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<32767
)
    insert into NumbersTest(Number)
        select Number from Nums option(maxrecursion 32767)
    --ALTER TABLE NumbersTest ADD CONSTRAINT PK_NumbersTest PRIMARY KEY CLUSTERED (Number)



;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 32767)

	;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 32767)

	;WITH Nums(Number) AS
(SELECT 1 AS Number
 UNION ALL
 SELECT Number+1 FROM Nums where Number<32767
)
insert into NumbersTest(Number)
    select Number from Nums option(maxrecursion 32767)

	--preload numbers table
*/

select * from
(
select ROW_NUMBER() over (partition by myNum order by myNum) as rowId,*
from
(

SELECT Cast(RAND(convert(varbinary, newid()))*(1000000000-1)+ 1 as bigint) as myNum
--select * 
from NumbersTest 
) a
) aa

where aa.rowid = 2

It appears it is "partially" the birthday paradox... I'd expect zero matches 99.999% of the time, but reality is I should get a match about 99% Mind blown;)

if you have a list of 100000 unique random numbers and you are randomly picking #'s 1 to 1 billion your odds of two collisions are 99%, how... well it hurts my brain...

https://betterexplained.com/articles/understanding-the-birthday-paradox/

Try

select * from
(
select ROW_NUMBER() over (partition by myNum order by myNum) as rowId,*
from
(

SELECT cast(CRYPT_GEN_RANDOM(6) as bigint) as myNum 
--select * 
from NumbersTest 
) a
) aa

where aa.rowid = 2

Thanks, but i posted the reason. It is the same results. The issue is in the probability, not the function