RoR - generate an alphanumeric string from the database identifier

In our database, each Person has an identifier that is an auto-incrementing integer generated by the database. Now we want to create a more user-friendly alphanumeric identifier that can be exposed publicly, something like a passport number. Obviously, we do not want to disclose the database identifier to users. For this question I will call what we need to generate the UID.

Note that the UID is not intended to replace the database identifier. You can think of the UID as a prettier version of the database identifier that we can give out to users.

  • I was wondering whether this UID could be a function of the database identifier. That is, we should be able to regenerate the same UID for a given database identifier.
  • Obviously, in addition to the database identifier, the function will take a salt or key.
  • The UID must not be sequential. That is, two adjacent database identifiers should produce visually different UIDs.
  • The UID does not need to be irreversible. That is, it is fine if someone studies the UIDs for several days and manages to reverse-engineer the database identifier. I don’t think that would harm us.
  • The UID should contain only A-Z (uppercase only) and 0-9, nothing else. It should also avoid characters that can be confused with other letters or digits, such as 0 and O, or l and 1. I guess Crockford's Base32 encoding will take care of this.
  • The UID must have a fixed length (10 characters), regardless of the size of the database identifier. We could pad the UID with some constant string to bring it to the required length. The database identifier can grow to any size, so the algorithm should not impose a limit on its input.

I think the way to do this is:

Step 1: Hashing

I read about the following hash functions:

A hash returns a long string. I read here about something called XOR folding, which brings the string down to a shorter length, but I could not find much information about it.
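For what it is worth, XOR folding can be sketched in a few lines. The following is only an illustration under assumptions: the helper name `xor_fold`, the choice of SHA-256, and the 8-byte target size are not from the question.

```ruby
require "digest"

# Fold a binary string down to `size` bytes by XOR-ing every byte into
# position (index % size). A short final chunk only touches the leading bytes.
def xor_fold(bytes, size)
  folded = Array.new(size, 0)
  bytes.each_byte.with_index do |byte, i|
    folded[i % size] ^= byte
  end
  folded.pack("C*")
end

digest = Digest::SHA256.digest("12345") # 32 raw bytes
short  = xor_fold(digest, 8)            # 8 raw bytes
```

Folding keeps every bit of the digest involved in the result, whereas plain truncation simply drops the tail. For a well-behaved hash either approach should be acceptable for this purpose.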

Step 2: Encoding

I read about the following encoding methods:

  • Crockford Base 32 Encoding
  • Z-base32
  • base36

I assume that the encoding output will be the UID string I'm looking for.
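Steps 1 and 2 together might look roughly like the sketch below. It assumes HMAC-SHA256 as the keyed hash and keeps only 50 bits of it, since 10 Base32 characters hold exactly 50 bits. The names `uid_for`, `CROCKFORD`, and `UID_LENGTH` are made up for illustration.

```ruby
require "openssl"

# Crockford's Base32 alphabet: digits plus letters, without I, L, O and U.
CROCKFORD  = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze
UID_LENGTH = 10

# Deterministically map a database identifier and a secret key to a UID.
def uid_for(db_id, key)
  mac = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), key, db_id.to_s)
  # Read the first 8 bytes as a big-endian integer and keep 50 bits of it.
  n = mac[0, 8].unpack1("Q>") % (32**UID_LENGTH)
  # Emit exactly UID_LENGTH base-32 digits, least significant first.
  chars = Array.new(UID_LENGTH) do
    n, remainder = n.divmod(32)
    CROCKFORD[remainder]
  end
  chars.reverse.join
end

uid_for(1001, "some secret key") # same inputs always give the same UID
uid_for(1002, "some secret key") # adjacent identifier, unrelated-looking UID
```

Because the hash is truncated to 50 bits, two identifiers can in principle map to the same UID, so a uniqueness check is still needed. The identifier is converted with `to_s`, so its size does not matter.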

Step 3: Working around collisions

  • To get around collisions, I was wondering whether I could generate a random key during UID generation and use this random key in the function.
  • I can store this random key in a column so that we know which key was used to create that particular UID.
  • Before inserting the newly generated UID into the table, I would check it for uniqueness, and if the check fails, I would generate a new random key and use it to create a new UID. This step can be repeated until a unique UID is found for the given database identifier.
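The retry loop described in Step 3 could be sketched as follows. This is only an illustration: a `Set` stands in for the uniqueness check against the table, and `derive_uid` and `assign_uid` are hypothetical helpers. In a Rails application the check would more likely be a unique index on the column, with a retry on `ActiveRecord::RecordNotUnique`.

```ruby
require "openssl"
require "securerandom"
require "set"

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Hypothetical helper: keyed hash of the identifier, with the first ten
# bytes each mapped onto one of the 32 alphabet characters.
def derive_uid(db_id, salt)
  mac = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), salt, db_id.to_s)
  mac.bytes.first(10).map { |b| ALPHABET[b % 32] }.join
end

# Try fresh random salts until the UID is not in `taken`.
# Returns the UID together with the salt that produced it, so that the
# salt can be stored and the UID regenerated later.
def assign_uid(db_id, taken, max_attempts: 10)
  max_attempts.times do
    salt = SecureRandom.hex(8)
    uid  = derive_uid(db_id, salt)
    return [uid, salt] if taken.add?(uid) # add? returns nil on a duplicate
  end
  raise "no unique UID found for #{db_id} after #{max_attempts} attempts"
end

taken = Set.new
uid, salt = assign_uid(42, taken)
```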

I would like some expert advice on whether I am on the right track and how I should go about actually implementing this.

I am going to implement this in a Ruby on Rails application, so please keep that in mind in your suggestions.

Thanks.

Update

The comments and the answer made me rethink and question one of the requirements I had: the need to be able to regenerate a user's UID after it has been assigned. I guess I was just trying to be safe in case we lose a user's UID, so that we could recover it if it is a function of an existing user property. But I think we can get around this problem simply by using backups.

So, if I drop this requirement, the UID essentially becomes a completely random 10-character alphanumeric string. I am adding an answer containing my proposed implementation plan. If someone else comes up with a better plan, I will mark that as the answer.

2 answers

As I mentioned in the question update, I think we are going to do the following:

  • Pre-generate a sufficiently large number of random, unique ten-character alphanumeric strings. No hashing or encoding.
  • Store them in a table in random order.
  • When creating a user, select the first row and assign its string to the user.
  • Remove the selected identifier from the pool once it has been assigned to the user.
  • When the pool shrinks to a small number, replenish it with new strings, obviously with uniqueness checks. This can be done in a delayed job triggered by an observer.
  • The reason for pre-generation is that we offload all the expensive uniqueness checks into a one-off pre-generation operation.
  • When an identifier is taken from this pool, the new user is guaranteed a unique one. Thus, the user creation operation (which is very common) becomes fast.
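
A minimal in-memory sketch of this plan is shown below. The class name `UidPool`, the alphabet, and the `low_water` and `batch` parameters are assumptions for illustration. In the real application the pool would be a database table, and replenishment would run in a background job rather than inline.

```ruby
require "securerandom"
require "set"

class UidPool
  # Assumed alphabet: uppercase letters and digits without I, L, O and U.
  ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".chars.freeze

  def initialize(low_water: 100, batch: 1_000)
    @low_water = low_water
    @batch     = batch
    @issued    = Set.new # every UID ever generated, assigned or not
    @free      = []      # UIDs still waiting to be assigned
    replenish
  end

  # Hand out one pre-generated UID and remove it from the pool.
  def take
    replenish if @free.size <= @low_water
    @free.pop
  end

  def size
    @free.size
  end

  private

  # Generate `batch` new UIDs, skipping any that were generated before.
  def replenish
    added = 0
    while added < @batch
      uid = Array.new(10) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
      next unless @issued.add?(uid) # uniqueness check happens here, not at user creation
      @free << uid
      added += 1
    end
  end
end

pool = UidPool.new(low_water: 10, batch: 50)
uid  = pool.take
```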

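Would db_id.chr work for you? It takes an integer and returns the corresponding character, so you can build a string of letters from the digits of the identifier. Then you can append the user's initials, last name, or something else. Example:

 user = {:id => 123456, :f_name => "Scott", :l_name => "Shea"}
 # Map each digit to a letter (1 -> "A", 2 -> "B", ...); note that a 0 digit maps to "@".
 letters = user[:id].to_s.chars.map { |d| (d.to_i + 64).chr }.join
 letters.downcase + user[:l_name].downcase # result = "abcdefshea"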

Source: https://habr.com/ru/post/1400205/

