In our database, each Person has an auto-incrementing integer identifier generated by the database. Now we want to create a more user-friendly alphanumeric identifier that can be made public, something like a passport number. Obviously, we do not want to disclose the database identifier to users. For this question, I will call the identifier we need to generate the UID.
Note that the UID is not intended to replace the database identifier. You can think of the UID as a friendlier version of the database identifier that we can show to users.
- I was wondering whether this UID could be a function of the database identifier. That is, we must be able to regenerate the same UID for a given database identifier.
- Obviously, in addition to the database identifier, the function will accept a salt or key.
- The UID should not be sequential. That is, two adjacent database identifiers should produce visually different UIDs.
- The UID does not need to be irreversible. That is, it is fine if someone studies UIDs for several days and manages to work out the database identifier. I don't think that would harm us.
- The UID should contain only A-Z (uppercase only) and 0-9, nothing more. It also should not contain characters that can be confused with one another, such as 0 and O, l and 1, and so on. I guess Crockford Base32 encoding would take care of this.
- The UID must have a fixed length (10 characters), regardless of the size of the database identifier. We could pad the UID with some constant string to bring it to the required length. Since the database identifier can grow arbitrarily large, the algorithm should not impose any limit on the size of its input.
I think the approach should be:
Step 1: Hashing
I read about the following hash functions:
A hash function returns a long string. I read here about something called XOR folding, which reduces it to a shorter length, but I could not find much information about it.
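A minimal sketch of Step 1 with XOR folding. It assumes HMAC-SHA256 as the keyed hash (my choice for illustration, using the secret key from the requirements) and folds the digest down to 50 bits, because 10 Crockford Base32 characters carry exactly 5 bits each. `keyed_hash` and `xor_fold` are hypothetical helper names.

```ruby
require "openssl"

# Assumption: HMAC-SHA256 keyed with our secret, applied to the database id.
# Hypothetical helper; returns a 32-byte binary digest for any size of id.
def keyed_hash(id, key)
  OpenSSL::HMAC.digest("SHA256", key, id.to_s)
end

# XOR folding: cut the digest's bits into `bits`-wide chunks and XOR them
# together, giving an integer in 0...(2**bits).
def xor_fold(bytes, bits)
  value = bytes.unpack1("H*").to_i(16) # digest bytes as one big integer
  mask = (1 << bits) - 1
  folded = 0
  while value.positive?
    folded ^= value & mask # XOR in the lowest chunk
    value >>= bits         # shift the next chunk into place
  end
  folded
end

folded = xor_fold(keyed_hash(42, "secret"), 50) # some integer below 2**50
```

For example, `xor_fold("\x01\x02".b, 8)` XORs the two bytes and returns `3`. Folding is deterministic but not collision-free: two different ids can fold to the same 50-bit value, which is why Step 3 is still needed.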
Step 2: Encoding
I read about the following encoding schemes:
- Crockford Base 32 Encoding
- Z-base32
- base36
I assume that the encoding output will be the UID string I'm looking for.
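A minimal Crockford Base32 encoder for Step 2, assuming the input is a non-negative integer below 2^50 (for example, a 50-bit folded hash). Ruby's built-in `Integer#to_s(32)` uses the digits `0-9a-v`, which is not the Crockford alphabet, so the alphabet is spelled out explicitly. Left-padding with `0` gives the fixed 10-character length. `crockford_encode` is a hypothetical helper name.

```ruby
# Crockford Base32 alphabet: digits plus letters, excluding I, L, O and U
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Hypothetical helper: encode a non-negative integer as exactly `length`
# Crockford Base32 characters, left-padded with "0".
def crockford_encode(n, length = 10)
  raise ArgumentError, "n must be non-negative" if n.negative?
  raise ArgumentError, "n needs more than #{length} characters" if n >= 32**length

  out = +""
  length.times do
    out.prepend(CROCKFORD_ALPHABET[n % 32]) # least significant digit first
    n /= 32
  end
  out
end
```

For example, `crockford_encode(32)` returns `"0000000010"` and `crockford_encode(2**50 - 1)` returns `"ZZZZZZZZZZ"`, so every 50-bit value maps to exactly 10 characters.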
Step 3: Working around collisions
- To handle collisions, I was wondering whether I could generate a random key each time a UID is generated and pass that key to the function.
- I could store this random key in a column so that we know which key was used to create each particular UID.
- Before inserting a newly generated UID into the table, I would check it for uniqueness. If the check fails, I would generate a new random key and use it to create a new UID. This step can be repeated until a unique UID is found for the given database identifier.
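A sketch of the retry loop described above. Two assumptions: an in-memory `Set` stands in for the uniqueness check against the table, and the UID function is passed in as a block taking `(id, key)`, so any hash-and-encode scheme can be plugged in. `assign_uid` and `max_attempts` are hypothetical names.

```ruby
require "securerandom"
require "set"

# Hypothetical helper: try random keys until the block yields a UID not yet taken.
# `taken` stands in for a uniqueness query against the UID column.
# Returns [uid, key] so the key can be stored alongside the UID.
def assign_uid(id, taken, max_attempts: 10)
  max_attempts.times do
    key = SecureRandom.hex(8)   # fresh random key for this attempt
    uid = yield(id, key)        # caller-supplied UID function of (id, key)
    next if taken.include?(uid) # collision: retry with a new key
    taken << uid
    return [uid, key]
  end
  raise "no unique UID found after #{max_attempts} attempts"
end
```

In a Rails app, checking and then inserting can race between two concurrent requests, so a unique index on the UID column is still needed; an insert that violates it raises `ActiveRecord::RecordNotUnique`, which could trigger the same retry.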
I would like some expert advice on whether I am on the right track, and on how to actually implement this.
I am going to implement this in a Ruby on Rails application, so please keep that in mind in your suggestions.
Thanks.
Update
The comments and answers made me rethink one of my requirements: the need to be able to regenerate a user's UID after it has been assigned. I guess I was just trying to be safe: if we ever lost a user's UID, we could recover it as long as it was a function of an existing user attribute. But I think we can get around this problem simply by relying on backups.
So, if I drop this requirement, the UID essentially becomes a completely random 10-character alphanumeric string. I am adding an answer with my proposed implementation plan. If someone comes up with a better plan, I will mark that as the answer.
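Under the revised requirement, a sketch of generating a random 10-character UID from the Crockford Base32 alphabet using Ruby's `SecureRandom`. `random_uid` is a hypothetical helper name.

```ruby
require "securerandom"

# Crockford Base32 alphabet: no I, L, O or U, so no look-alike characters
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Hypothetical helper: pick each character independently and uniformly
# from the 32-character alphabet using a cryptographically secure RNG.
def random_uid(length = 10)
  Array.new(length) { CROCKFORD_ALPHABET[SecureRandom.random_number(32)] }.join
end
```

With 32^10 (about 1.1 × 10^15) possible values, collisions are rare but not impossible, so a unique index on the column and a retry on collision are still worth keeping.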