Thursday, January 30, 2020

Generate unique random integers without collisions in sql server


Here is the solution to generate unique number  in sql server. We may use one of these calculations to generate a number in this set:

SELECT
  1000000 + (CONVERT(INT, CRYPT_GEN_RANDOM(3)) % 1000000),
  1000000 + (CONVERT(INT, RAND()*1000000) % 1000000),
  1000000 + (ABS(CHECKSUM(NEWID())) % 1000000);


But think again, Is number unique ?

Generating a random number on its own is not difficult, using methods like RAND() or CHECKSUM(NEWID()). The problem comes when you have to detect collisions. Let's take a quick look at a typical approach, assuming we want CustomerID values between 1 and 1,000,000: 

DECLARE @rc INT = 0,
  @CustomerID INT = ABS(CHECKSUM(NEWID())) % 1000000 + 1;
              -- or ABS(CONVERT(INT,CRYPT_GEN_RANDOM(3))) % 1000000 + 1;
              -- or CONVERT(INT, RAND() * 1000000) + 1;
WHILE @rc = 0
BEGIN
  IF NOT EXISTS (SELECT 1 FROM dbo.Customers WHERE CustomerID = @CustomerID)
  BEGIN
    INSERT dbo.Customers(CustomerID) SELECT @CustomerID;
    SET @rc = 1;
  END
  ELSE
  BEGIN
    SELECT @CustomerID = ABS(CHECKSUM(NEWID())) % 1000000 + 1,
      @rc = 0;
  END
END


Wait, As the table gets larger, not only does checking for duplicates get more expensive (More time required to complete the process)

Different Approach

One idea is to pre-calculate a large number of random number and store it in different table 

CREATE TABLE dbo.RandomIDs
(
  RowNumber INT PRIMARY KEY CLUSTERED,
  NextID INT 
) WITH (DATA_COMPRESSION = PAGE);
-- data compression used to minimize impact to disk and memory

-- if not on Enterprise or CPU is your bottleneck, don't use it
;WITH x AS
(
  SELECT TOP (1000000) rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
    FROM sys.all_objects AS s1
    CROSS JOIN sys.all_objects AS s2
    ORDER BY s1.[object_id]
)
INSERT dbo.RandomIDs(RowNumber, NextID)
  SELECT rn, ROW_NUMBER() OVER (ORDER BY NEWID()) + 1000000
  FROM x;

Now, in order to generate the next ID, we can simply delete the lowest RowNumber available, and output its NextID for use. We'll use a CTE to determine the TOP (1) row so that we don't rely on "natural" order 

DECLARE @t TABLE(NextID INT);
;WITH NextIDGenerator AS
(
  SELECT TOP (1) NextID FROM dbo.RandomIDs ORDER BY RowNumber
)
DELETE NextIDGenerator OUTPUT deleted.NextID INTO @t;
INSERT dbo.Users(UserID /* , other columns */)
  SELECT NextID /* , other parameters */ FROM @t;
















1 comment:

  1. I would suggest to go with newsequentialid() defined as the column dataype. It is the fastest way to generate a unique id. NEWID is based on the current DB timestamp so it does not check for duplicacy, but wouldn't be as fast as newsequenctialid.
    If purpose is to generate random number then we might stick with function made purposefully to generate random rather going with checksum or other.

    ReplyDelete

Kafka setup in window

Here's a step-by-step guide on how to do it : 1. Prerequisites : Before you begin, make sure you have the following prerequisites inst...