Algorithms: Random Unique String

I need to create a string that meets the following requirements:

  • it must be a unique string;
  • its length must be 8 characters;
  • it must contain 2 digits;
  • all non-digit characters must be uppercase.

I will store them in the database after generation (they will be assigned to other objects).

I intend to do something like this:

  • generate 2 random values from 0 to 9; they will be used as the digits in the string;
  • generate 6 random values from 0 to 25 and add 64 to each to get a character code; they will be used as the 6 letters;
  • combine everything into one string;
  • check whether the string already exists in the database; if it does, repeat.

My concern with this algorithm is that it does not guarantee a result in finite time (if the database already holds many values).

Question: Could you give some advice on how to improve this algorithm to make it more deterministic?

Thanks.

+4
7 answers
  • it must be a unique string;
  • its length must be 8 characters;
  • it must contain 2 digits;
  • all non-digit characters must be uppercase.

Assuming that:

  • requirements No. 2 and No. 3 are exact (exactly 8 characters, exactly 2 digits), not minimums;
  • the "characters" in requirement No. 4 are the 26 capital letters A to Z;
  • you need a uniformly distributed random string.

Then your proposed method has two problems. The first is that the letters A to Z are ASCII 65–90, not 64–89. The second is that it does not distribute the digits uniformly over the space of possible strings. This can be fixed by following these steps:

  • Generate two distinct integers from 0 to 7 and sort them.
  • Generate 2 random digits from 0 to 9.
  • Generate 6 random letters from A to Z.
  • Use the two distinct integers from the first step as positions and put the 2 digits in those positions.
  • Put the 6 random letters in the remaining positions.
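The steps above can be sketched in Python as follows. This is only a minimal illustration; the function name `random_code` and the choice of the standard `secrets` module as the randomness source are assumptions, not part of the answer.

```python
import secrets
import string

def random_code() -> str:
    """Return 8 characters: 2 digits at random positions, 6 uppercase letters."""
    rng = secrets.SystemRandom()  # assumption: OS-backed randomness is wanted
    # Step 1: two distinct positions out of 0..7
    # (sorting is unnecessary here because the positions are used as a set).
    digit_positions = set(rng.sample(range(8), 2))
    chars = []
    for i in range(8):
        if i in digit_positions:
            chars.append(rng.choice(string.digits))           # steps 2 and 4
        else:
            chars.append(rng.choice(string.ascii_uppercase))  # steps 3 and 5
    return "".join(chars)

if __name__ == "__main__":
    print(random_code())
```

Because the position pair, the digits and the letters are each drawn uniformly and independently, every valid string should be equally likely.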

There are 28 possibilities for the two distinct positions ((8 * 8 - 8 duplicates) / 2 orderings), 26^6 possibilities for the letters and 100 possibilities for the digits, so the total number of possible combinations is N_comb = 28 * 26^6 * 100 = 864964172800 ≈ 8.65 x 10^11.
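The count can be checked with a few lines of Python. The variable names are illustrative only.

```python
import math

position_pairs = math.comb(8, 2)   # unordered pairs of distinct positions
letter_choices = 26 ** 6           # six independent letters A to Z
digit_choices = 10 ** 2            # two independent digits 0 to 9

n_comb = position_pairs * letter_choices * digit_choices

if __name__ == "__main__":
    print(position_pairs, letter_choices, n_comb)
```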


Edit: If you want to avoid using the database for storage, but still guarantee both the uniqueness of the strings and that they are cryptographically secure, the best choice is a cryptographically random bijection from a counter between 0 and N_max <= N_comb to a subset of the space of possible output strings. (A bijection means that there is a one-to-one correspondence between the output string and the input counter.)

This is possible with Feistel networks, which are commonly used in hash functions and symmetric cryptography (DES, for example). You will probably want to select N_max = 2^39, which is the greatest power of 2 <= N_comb, and use a 39-bit Feistel network with a key that you keep secret. You then feed your counter into the Feistel network, and out comes another 39-bit number X, which is converted to the corresponding string as follows:

  • Repeat the next step 6 times:
  • Take X mod 26 to generate an uppercase letter, and set X = X / 26 (integer division).
  • Take X mod 100 to generate the two digits, and set X = X / 100.
  • X will now be between 0 and 17 inclusive (2^39 / 26^6 / 100 = 17.796...). Map this number to two distinct digit positions. It is probably easiest to use a lookup table, since we are only talking about 28 possibilities. If you had more, you could use Floyd's algorithm to generate a unique permutation, driven by the variable-base mod + integer-divide technique instead of a random number generator.
  • Follow the random approach described above, but use the numbers generated by this algorithm instead.
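The decoding steps above can be sketched like this. The function name `decode` and the particular ordering of the lookup table are assumptions; any fixed table of the 28 position pairs would do.

```python
from itertools import combinations
import string

# Lookup table: all 28 unordered pairs of distinct positions in 0..7.
POSITION_PAIRS = list(combinations(range(8), 2))
N_COMB = len(POSITION_PAIRS) * 26 ** 6 * 100

def decode(x: int) -> str:
    """Map an integer 0 <= x < N_COMB to a unique 8-character string."""
    if not 0 <= x < N_COMB:
        raise ValueError("x out of range")
    letters = []
    for _ in range(6):                       # six letters via mod 26
        x, r = divmod(x, 26)
        letters.append(string.ascii_uppercase[r])
    x, two_digits = divmod(x, 100)           # two digits via mod 100
    digits = f"{two_digits:02d}"
    positions = POSITION_PAIRS[x]            # what is left selects the positions
    letter_iter, digit_iter = iter(letters), iter(digits)
    return "".join(
        next(digit_iter) if i in positions else next(letter_iter)
        for i in range(8)
    )

if __name__ == "__main__":
    print(decode(0), decode(N_COMB - 1))
```

With a 39-bit input only the first 18 entries of the table are ever used, since the remaining value never exceeds 17.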

Alternatively, use 40-bit numbers, and if the output of your Feistel network is >= N_comb, increment the counter and try again. This covers the entire string space, at the cost of rejecting invalid numbers and having to re-run the algorithm. (But you do not need a database for this.)

But this is not something to get into if you do not know what you are doing.
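A toy version of the 40-bit variant might look like the sketch below. Everything specific here is an assumption made for illustration: the balanced 20/20-bit split, the HMAC-SHA256 round function, the four rounds and the function names. It shows the structure (a keyed permutation plus rejection) and is not a vetted cryptographic design.

```python
import hashlib
import hmac

N_COMB = 28 * 26 ** 6 * 100          # number of valid strings
HALF_BITS = 20                       # 40-bit block split into two 20-bit halves
HALF_MASK = (1 << HALF_BITS) - 1

def round_fn(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudo-random function on a 20-bit half (assumed construction)."""
    msg = bytes([round_no]) + half.to_bytes(3, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big") & HALF_MASK

def feistel40(key: bytes, value: int, rounds: int = 4) -> int:
    """Permute a 40-bit integer; a Feistel network is invertible by construction."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for r in range(rounds):
        left, right = right, left ^ round_fn(key, r, right)
    return (left << HALF_BITS) | right

def next_code_number(key: bytes, counter: int) -> tuple[int, int]:
    """Return (x, next_counter) with x < N_COMB, skipping rejected outputs."""
    while True:
        x = feistel40(key, counter)
        counter += 1
        if x < N_COMB:
            return x, counter

if __name__ == "__main__":
    secret = b"replace with a real secret key"
    print(next_code_number(secret, 0))
```

Since N_comb is about 79% of 2^40, roughly one output in five would be rejected on average; only the counter has to be persisted between calls.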

+6

Are these user passwords? If so, you need to consider a few things:

  • You should avoid 0 / O and I / 1, which can easily be mistaken for each other.
  • You should avoid too many consecutive letters, which might spell out a rude word.

As for the second point, you can avoid the problem by using LLNLLNLL as your template (L = letter, N = digit).

If you need 1 million passwords from a pool of 2.5 billion, you will certainly run into collisions in your database, so you will have to handle them gracefully. But a simple retry is enough if the random number generator is reliable.
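Both suggestions can be combined in a short sketch. The reduced alphabets and the function name `templated_code` are assumptions made for illustration.

```python
import secrets

LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"   # A to Z without I and O
DIGITS = "23456789"                    # 0 to 9 without 0 and 1
TEMPLATE = "LLNLLNLL"                  # L = letter, N = digit

def templated_code(template: str = TEMPLATE) -> str:
    """Fill each template slot with a random character of the right class."""
    return "".join(
        secrets.choice(LETTERS if slot == "L" else DIGITS)
        for slot in template
    )

if __name__ == "__main__":
    print(templated_code())
```

Fixing the digit positions and shrinking the alphabets reduces the number of possible strings, so collisions become more likely than with the unrestricted format.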
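A simple retry against a unique constraint could look like the following sketch. SQLite, the table name `codes`, the helper names and the attempt limit are all assumptions chosen to keep the example self-contained.

```python
import secrets
import sqlite3
import string

def random_code() -> str:
    """8 characters: 2 digits at random positions, 6 uppercase letters."""
    rng = secrets.SystemRandom()
    digit_positions = set(rng.sample(range(8), 2))
    return "".join(
        rng.choice(string.digits if i in digit_positions else string.ascii_uppercase)
        for i in range(8)
    )

def make_store() -> sqlite3.Connection:
    """In-memory database whose primary key enforces uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE codes (code TEXT PRIMARY KEY)")
    return conn

def insert_unique(conn, make_code=random_code, max_attempts=100) -> str:
    """Insert a fresh code, retrying when the database reports a duplicate."""
    for _ in range(max_attempts):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO codes (code) VALUES (?)", (code,))
            return code
        except sqlite3.IntegrityError:
            continue    # collision: draw another code
    raise RuntimeError("no free code found within max_attempts")

if __name__ == "__main__":
    print(insert_unique(make_store()))
```

Letting the database reject duplicates avoids a separate existence check, and the attempt limit bounds the running time.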

+1

I do not see anything in your requirements that says the string has to be random. You can simply do something like the following pseudocode:

    for letters in ('AAAAAA' .. 'ZZZZZZ') {
        for numbers in (00 .. 99) {
            string = letters + numbers
        }
    }

This will create unique eight-character strings with two digits and six uppercase letters.

If you need randomly generated strings, then you have to keep some record of which strings were previously generated, so you will have to hit the database (or keep them all in memory, or write them to a text file) and check against that list.
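A runnable counterpart to the pseudocode, as a lazy Python generator. The name `sequential_codes` is illustrative.

```python
import itertools
import string

def sequential_codes():
    """Yield AAAAAA00, AAAAAA01, ..., ZZZZZZ99 in order, without repeats."""
    for letters in itertools.product(string.ascii_uppercase, repeat=6):
        prefix = "".join(letters)
        for number in range(100):
            yield f"{prefix}{number:02d}"

if __name__ == "__main__":
    for code in itertools.islice(sequential_codes(), 3):
        print(code)
```

Because this is a generator, only the current position in the sequence has to be remembered, not the codes already handed out.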

0

I think you are fine for the first tens of thousands of such identifiers, and even after that you are most likely OK.

Now, if you need some kind of determinism, you can always force a password after a certain number of failures. Say, after 50 failures, you pick a password at random and increase part of it by 1 until you get a free one.

I'm willing to bet money, though, that you will never see that fallback kick in during your lifetime :)
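The fallback could be sketched as below. The in-memory set standing in for the database, the helper names, and the choice to increment the whole string as a mixed-radix number are assumptions; the answer only says to increase part of the password by 1.

```python
import secrets
import string

def random_code() -> str:
    """8 characters: 2 digits at random positions, 6 uppercase letters."""
    rng = secrets.SystemRandom()
    digit_positions = set(rng.sample(range(8), 2))
    return "".join(
        rng.choice(string.digits if i in digit_positions else string.ascii_uppercase)
        for i in range(8)
    )

def bump(code: str) -> str:
    """Increment by 1 from the right, keeping each slot a digit or a letter."""
    chars = list(code)
    for i in range(len(chars) - 1, -1, -1):
        alphabet = string.digits if chars[i].isdigit() else string.ascii_uppercase
        idx = alphabet.index(chars[i]) + 1
        if idx < len(alphabet):
            chars[i] = alphabet[idx]
            return "".join(chars)
        chars[i] = alphabet[0]           # carry into the next slot to the left
    return "".join(chars)                # wrapped around to the first code

def allocate(used: set, make_code=random_code, max_random_tries: int = 50) -> str:
    """Try random codes first; after too many failures, walk to a free one."""
    for _ in range(max_random_tries):
        code = make_code()
        if code not in used:
            used.add(code)
            return code
    code = make_code()
    while code in used:                  # assumes at least one code is still free
        code = bump(code)
    used.add(code)
    return code

if __name__ == "__main__":
    print(allocate(set()))
```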

0

Firstly, your requirements list does not say that the string has to be random, so you might consider something like a database index.

If "random" is a requirement, you can make several improvements.

  • Store the string as a number in the database. I am not sure how much this improves performance.
  • Do not store used strings at all. You can use the "index" approach above, but convert the integer to a string in a seemingly random way (for example, using a bit shift). Without dedicated analysis, nobody will see the pattern.

For example, if we have the sequence 1, 2, 3, 4, ... and apply a cyclic binary right shift by 1 bit, it turns into 4, 1, 5, 2, ... (assuming we have only 3 bits). It does not have to be a shift either; it can be a permutation or any other "randomization".
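The 3-bit example can be reproduced with a small rotation helper. The function name `rotate_right` is illustrative.

```python
def rotate_right(value: int, bits: int, shift: int = 1) -> int:
    """Cyclically rotate the low `bits` bits of `value` to the right."""
    shift %= bits
    mask = (1 << bits) - 1
    return ((value >> shift) | (value << (bits - shift))) & mask

if __name__ == "__main__":
    print([rotate_right(n, 3) for n in (1, 2, 3, 4)])
```

A rotation is a bijection on the `bits`-bit integers, so distinct counters always map to distinct values. It is easy to reverse, though, so it only hides the pattern from casual inspection.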

0

Do it the other way around: generate one large random number and split it up to get the individual characters:

  long bigRandom = ...;
  int firstDigit = (int) (bigRandom % 10);
  int secondDigit = (int) ((bigRandom / 10) % 10);

etc.

Then you store only the random number in your database, not the string. Since there is a one-to-one relationship between the string and the number, it makes no real difference.

However, when you try to insert a new value and it is already in the database, you can easily find the smallest unallocated number greater than the originally generated one, and use it instead.

What you gain from this method is that you are guaranteed to find an available code relatively quickly, even when most of the codes are already allocated.
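One way to find the next free number with a single query is sketched below. SQLite, the table and column names and the helper names are assumptions, and wrap-around at the top of the range is deliberately left unhandled.

```python
import secrets
import sqlite3

N_CODES = 28 * 26 ** 6 * 100   # assumed size of the code space

def make_store() -> sqlite3.Connection:
    """In-memory table of allocated numbers; the primary key is indexed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE codes (n INTEGER PRIMARY KEY)")
    return conn

def allocate(conn, candidate=None) -> int:
    """Allocate `candidate`, or the smallest free number above it if taken."""
    if candidate is None:
        candidate = secrets.randbelow(N_CODES)
    taken = conn.execute(
        "SELECT 1 FROM codes WHERE n = ?", (candidate,)
    ).fetchone()
    if taken:
        # The first allocated n >= candidate whose successor is free
        # marks the end of the occupied run; n + 1 is the next free number.
        row = conn.execute(
            "SELECT MIN(n) + 1 FROM codes "
            "WHERE n >= ? AND n + 1 NOT IN (SELECT n FROM codes)",
            (candidate,),
        ).fetchone()
        candidate = row[0]
        if candidate >= N_CODES:
            raise RuntimeError("ran off the end; wrap-around not handled here")
    conn.execute("INSERT INTO codes (n) VALUES (?)", (candidate,))
    return candidate

if __name__ == "__main__":
    store = make_store()
    print(allocate(store, 5), allocate(store, 5), allocate(store, 5))
```

The allocated number would then be converted to its string form by the digit and letter splitting shown above.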

0

The problem with your approach is that while you have few records you are unlikely to encounter collisions, but as the number of records grows the probability increases, until a collision becomes more likely than not. Eventually you will hit several collisions before you get a "valid" result. Each attempt requires a table scan to determine whether the code is valid, and the whole thing becomes a mess.

The simplest solution is to pre-compute your codes.

Start with the first code 00AAAA and increment it to create 00AAAB, 00AAAC ... 99ZZZZ. Insert them into a table in random order. When you need a new code, take the top unused entry from the table (then mark it as used). This is not a huge table, as indicated above: just a few million records.

  • You do not need to compute random numbers and generate strings for each user (it is already done).
  • You do not need to check whether something has already been used; just take the next available one.
  • There is no way to hit multiple collisions before finding something usable.

If you ever need more "codes", just generate some more "random" rows and add them to the table.
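A scaled-down, in-memory sketch of the pre-computed pool follows. The function names, the list standing in for the database table and the two-letter default (to keep the demo small) are assumptions.

```python
import itertools
import random
import string

def build_pool(n_letters: int = 2, rng=None) -> list:
    """Generate every code 00AA..99ZZ (for n_letters=2) and shuffle once."""
    rng = rng or random.SystemRandom()
    pool = [
        f"{number:02d}{''.join(letters)}"
        for number in range(100)
        for letters in itertools.product(string.ascii_uppercase, repeat=n_letters)
    ]
    rng.shuffle(pool)   # the random order is fixed here, not at request time
    return pool

def take_code(pool: list) -> str:
    """Hand out the next unused code; removing it marks it as used."""
    return pool.pop()

if __name__ == "__main__":
    codes = build_pool()
    print(take_code(codes), len(codes))
```

Handing out a code is then a constant-time operation with no possibility of a collision; in a real database the same effect would come from a "used" flag or from deleting the row.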

0

Source: https://habr.com/ru/post/1335894/

