How do you generate a number in an arbitrary range using random() in [0..1], while maintaining uniformity and density?

Generate a random number in the range [x..y], where x and y are arbitrary floating-point numbers, using only the random() function, which returns one of P uniformly distributed floating-point numbers in the range [0..1] (call P the density). The uniform distribution must be preserved, and P must be scaled as well.

I suspect there is no simple solution to this problem. To simplify it a little, I ask how to generate a number in the interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], keeping uniformity and density. Thus, for [0, 2], it must generate a random number from P * 2 uniformly distributed numbers.

The obvious simple solution random() * (y - x) + x will not generate all possible numbers, because the density becomes too low whenever abs(y - x) > 1.0: many possible values will be skipped. Remember that random() returns only one of P possible numbers. If you multiply such a number by Q, you get only one of the P possible values scaled by Q, but you should also scale the density P by Q.
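The density loss is easy to see with a toy generator. In this Python sketch, the hypothetical coarse_random() (my name, not from the question) stands in for a low-density random(); scaling its output to a wider range produces no new values, only wider gaps:

```python
import random

def coarse_random(p=8):
    # toy random() with only p evenly spaced values in [0, 1)
    return int(random.random() * p) / p

# Scaling to [0, 2) yields the same p values, now spaced 2/p apart;
# none of the representable floats in between are ever produced.
vals = {coarse_random() * 2 for _ in range(10_000)}
print(sorted(vals))
```

Even after 10,000 draws the set contains at most the original 8 values, just stretched over [0, 2).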

+4
9 answers

If you really want to generate all possible floating-point numbers in a given range with uniform numerical density, you need to take the floating-point format into account. For each possible value of the binary exponent, you have a different density of codes. A direct generation method must deal with this explicitly, and an indirect method still has to take it into account. I will develop the direct method; for simplicity, the following applies exclusively to IEEE 754 single-precision (32-bit) floats.

The hardest case is any interval that includes zero. In that case, to get a truly even distribution, you will need to handle every exponent down to the lowest, plus the denormalized numbers. As a special case, zero must be split into two cases: +0 and -0.

In addition, if you care this much about the result, you need to make sure you are using a good pseudo-random number generator with a state space large enough that you can expect it to reach every value with even probability. This disqualifies the C/Unix library function rand() and probably the *rand48() family; you should use something like the Mersenne Twister instead.


The key is to break the target interval into sub-intervals, each covered by a different combination of binary exponent and sign: within each such sub-interval, the floating-point codes are evenly spaced.

The first step is to select the appropriate sub-interval, with probability proportional to its size. If the interval contains 0 or otherwise covers a large dynamic range, this can require a number of random bits up to the full range of available exponents.

In particular, there are 256 possible exponent values for a 32-bit IEEE-754 number. Each exponent covers a range half the size of the next larger exponent's range, except for the denormalized case, which is the same size as the smallest normal exponent's range. Zero can be considered the smallest denormalized number; as stated above, if the target interval includes zero, the probability of each of +0 and -0 may need to be halved to avoid giving zero double weight.

If the selected sub-interval covers the entire range of a particular exponent, all that is needed is to fill the mantissa with random bits (23 bits for 32-bit IEEE-754 floats). If the sub-interval does not cover the whole range, you need to generate a random mantissa restricted to that interval.

The easiest way to handle both the initial and the secondary random steps may be to round the target interval outward to include the whole of any partially covered exponent ranges, and then reject and retry numbers that fall outside it. This lets you generate the exponent with simple power-of-two probabilities (for example, by counting the number of leading zeros in your random bitstream), and it also provides a simple and accurate way to generate a mantissa that covers only part of an exponent's range. (It is also a good way to handle the +/-0 special case.)

One more special case: to avoid inefficient generation for target intervals much smaller than the exponent regions containing them, note that the "obvious simple" solution actually does generate reasonably uniform numbers for such intervals. If you want an exactly even distribution, you can generate the sub-interval mantissa using only enough random bits to cover that interval, while still using the rejection method above to exclude values outside the target interval.
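As a rough Python illustration of the central idea (a hedged sketch, not the full method: denormals, negative signs, and the +/-0 split described above are deliberately left out), the following picks a binade with probability proportional to its width by counting coin flips, then fills a 23-bit mantissa:

```python
import random

def dense_uniform(hi, max_exp=64):
    """Uniform real in [0, hi) for hi a power of two, reaching every binade.

    The binade [hi/2^(e+1), hi/2^e) is chosen with probability 2^-(e+1)
    by counting leading coin flips, then a 23-bit mantissa in [1, 2)
    fills it uniformly, as for IEEE-754 single precision.
    """
    e = 0
    while e < max_exp and random.getrandbits(1):
        e += 1
    if e == max_exp:
        return 0.0  # lump the vanishing leftover probability at zero
    lo = hi * 2.0 ** -(e + 1)                            # bottom of the binade
    mantissa = 1.0 + random.getrandbits(23) / 2.0 ** 23  # in [1, 2)
    return lo * mantissa
```

Every binade down to 2^-64 is reachable, and each value's probability is proportional to the code spacing in its binade, which is what uniform numerical density over all representable values requires.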

+2

If I understand your problem correctly, here is a solution; note that I would exclude 1 from the range.

 N = numbers_in_your_random   // e.g. [0, 0.2, 0.4, 0.6, 0.8] gives N = 5

 // Turns your random number generator into one that returns
 // integer values in [0..N[
 function randomInt() {
     return random() * N;
 }

 // Extends the integer random number generator to an arbitrary
 // integer upper bound
 function getRandomInt(maxValue) {
     if (maxValue < N) {
         return randomInt() % maxValue;
     } else {
         baseValue = randomInt();
         bRate = maxValue DIV N;
         bMod = maxValue % N;
         if (baseValue < bMod) {
             bRate++;
         }
         return N * getRandomInt(bRate) + baseValue;
     }
 }

 // Returns a random number in [lower, upper[ with the same density as random()
 function extendedRandom(lower, upper) {
     diff = upper - lower;
     ndiff = diff * N;
     baseValue = getRandomInt(ndiff);
     baseValue /= N;
     return lower + baseValue;
 }
+3

ok, [0..1] * 2 == [0..2] (still uniform)

[0..1] - 0.5 == [-0.5..0.5] , etc.

I wonder where you encountered such an interview question?

Update: well, if we want to start worrying about the accuracy lost when multiplying (which is odd, because for some reason you didn't mind it in the original problem and only pretend to care about the "quantity of values"), we can iterate. For this we need another function that returns uniformly distributed random values in [0..1), which can be obtained by discarding the value 1.0 and redrawing. Then we can cut the whole range into equal parts small enough that loss of accuracy no longer matters, pick one bucket at random (we have enough randomness for that), and pick a number within that bucket using the [0..1) function for all buckets but the last.

Or you can work out how many values you need to be able to represent, and just generate enough random bits to encode one of them; in that case you don't care whether your source is [0..1] or just {0, 1}.
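The bucket idea from the update can be sketched in Python (names and the bucket count are my assumptions, not from the answer; random_halfopen() models the "[0..1) by discarding 1.0" trick, even though Python's random.random() is already half-open):

```python
import math
import random

def random_halfopen():
    # [0..1) from a [0..1] source: discard an exact 1.0 and redraw
    while True:
        r = random.random()  # already [0, 1) in Python; redraw kept for the idea
        if r < 1.0:
            return r

def bucketed(lower, upper):
    # Cut [lower, upper) into unit-or-smaller buckets, pick one at
    # random, then place a point inside it with full random() resolution.
    n = max(1, math.ceil(upper - lower))
    width = (upper - lower) / n
    bucket = int(random_halfopen() * n)   # coarse choice of bucket
    return lower + (bucket + random_halfopen()) * width
```

Each bucket is at most 1 wide, so the value inside it is never stretched by a factor greater than 1 and keeps the base generator's density.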

+1

Let me reformulate your question:

Let random() be a random number generator with a discrete uniform distribution over [0, 1). Let D be the number of possible values returned by random(), each exactly 1/D greater than the previous. Create a random number generator rand(L, U) with a discrete uniform distribution over [L, U) such that each possible value is exactly 1/D greater than the previous.

-

A few quick notes.

  • The problem, in the form you posed it, is insoluble in general: if D = 1 we can do nothing.
  • I do not require that 0.0 be one of the possible values of random(). If it is not, the solution below may fail when U - L < 1/D. This doesn't bother me much.
  • I use half-open ranges throughout because they simplify the analysis. Handling your closed ranges would be straightforward but tedious.

Finally, the good stuff. The key insight here is that density can be maintained by selecting the integer and fractional parts of the result independently.

First, note that given random() it is trivial to create randomBit(). That is:

 randomBit() { return random() >= 0.5; } 

Then, if we want to select one of {0, 1, 2, ..., 2^N - 1} uniformly at random using only randomBit(), we just generate each bit. Call this random2(N).

Using random2(), we can choose one of {0, 1, 2, ..., N - 1}:

 randomInt(N) { while ((val = random2(ceil(log2(N)))) >= N); return val; } 

Now, if D is known, the problem is trivial: we can reduce it to simply choosing one of floor((U - L) * D) values uniformly at random, which we can do with randomInt().

So suppose D is unknown. First, here is a function that generates random values in the range [0, 2^N) with the proper density. It's simple:

 rand2D(N) { return random2(N) + random(); } 

rand2D() is where we require that the difference between consecutive possible values of random() be exactly 1/D; if it is not, the possible values here will not have uniform density.

Next, we need a function that selects a value in the range [0, V) with the proper density. This is just like randomInt() above.

 randD(V) { while ((val = rand2D(ceil(log2(V)))) >= V); return val; } 

And finally ...

 rand(L, U) { return L + randD(U - L); } 

Note that the possible values may be offset if L is not a multiple of 1/D, but this is not essential.

-

One last note: you may have noticed that some of these functions may never terminate. That is essentially unavoidable. For example, random() may carry only one bit of randomness; if I ask you to choose uniformly among three values, you cannot do so with a guarantee that the function terminates.
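The whole chain can be sketched in Python (here random() is Python's random.random(), whose density D is 2^53; names are lowercased, otherwise this follows the functions above):

```python
import math
import random

def random_bit():
    # one fair bit from random()
    return 1 if random.random() >= 0.5 else 0

def random2(n):
    # uniform integer in [0, 2**n), built bit by bit
    val = 0
    for _ in range(n):
        val = (val << 1) | random_bit()
    return val

def random_int(n):
    # uniform integer in [0, n), by rejection
    bits = math.ceil(math.log2(n)) if n > 1 else 0
    while True:
        val = random2(bits)
        if val < n:
            return val

def rand2d(n):
    # integer part + fractional part preserves random()'s density
    return random2(n) + random.random()

def rand_d(v):
    # uniform value in [0, v) with the proper density, by rejection
    bits = math.ceil(math.log2(v)) if v > 1 else 0
    while True:
        val = rand2d(bits)
        if val < v:
            return val

def rand(lower, upper):
    return lower + rand_d(upper - lower)
```

For example, rand(2.0, 5.0) rounds the width 3 up to the power of two 4, draws integer-plus-fraction values in [0, 4), and rejects those of 3 or more, exactly as the rejection loops above describe.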

+1

Consider this approach:

I assume the base random number generator for the range [0..1] generates one of the numbers

0, 1/(p-1), 2/(p-1), ..., (p-2)/(p-1), (p-1)/(p-1)

If the length of the target interval is less than or equal to 1, return random() * (y - x) + x.

Otherwise, map each number r from the base RNG to an interval in the target range:

[r*(p-1)*(y-x)/p, (r + 1/(p-1))*(p-1)*(y-x)/p]

(i.e., each of the p possible numbers is assigned one of p intervals, each of length (y-x)/p)

Then recursively generate another random number within this interval and add it to the start of the target range, x.

pseudo code:

 const p

 function rand(x, y)
     r = random()
     if y - x <= 1
         return x + r * (y - x)
     else
         low = r * (p - 1) * (y - x) / p
         high = low + (y - x) / p
         return x + rand(low, high)
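A Python rendering of this approach (a sketch; P is an assumed density of the base generator, and the recursive result is added to x directly, since it already includes the sub-interval offset):

```python
import random

P = 2 ** 16  # assumed number of distinct values the base random() returns

def rand_recursive(x, y):
    # Recursive refinement: pick one of P sub-intervals of [0, y - x],
    # then recurse into it until the remaining width is at most 1.
    r = random.random()
    if y - x <= 1:
        return x + r * (y - x)
    low = r * (P - 1) * (y - x) / P   # start of the chosen sub-interval
    high = low + (y - x) / P          # one P-th of the range wide
    return x + rand_recursive(low, high)
```

Each level of recursion narrows the interval by a factor of P, so for a width of 1000 and P = 2^16 a single recursive step already reaches a sub-interval narrower than 1.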
+1

With exact real arithmetic, the solution would simply be:

 return random() * (upper - lower) + lower 

The problem is that floating-point numbers have only finite resolution. So what you can do is apply the above function and then add another random() value, scaled down to cover the missing part.

If I make a practical example, it will become clear what I mean:

e.g. take random() to return values from 0..1 with a precision of 2 digits, i.e. 0.XY, with a lower bound of 100 and an upper bound of 1100.

Using the above formula you get results of the form 0.XY * (1100 - 100) + 100 = XY0.0 + 100. You will never see a result like 201, since the last digit is always 0.

The solution is to generate another random value and add it scaled by 10, so that you gain one more digit of precision (here you have to make sure you do not exceed the range; if that happens, you must discard the result and generate a new number).

You may need to repeat this; how often depends on how many digits the random() function provides and how many you expect in the end.

In the standard IEEE format, precision is limited (e.g. 53 bits for a double). Therefore, when generating a number this way, you never need to generate more than one additional number.

But you must be careful that adding the extra number does not push you past the upper limit. One solution: if you exceed the limit, start over and generate a new number (do not clip the value or anything similar, as that changes the distribution).

The second possibility is to check how much room remains below the upper bound for the missing low-order range and generate an appropriately bounded value, so the result is guaranteed to fit.
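A toy Python sketch of the scheme (coarse_random() is a hypothetical base generator limited to two decimal digits, as in the worked example above; overshoots are discarded and regenerated, following the first of the two solutions):

```python
import random

DIGITS = 2  # assumed precision of the base random()

def coarse_random():
    # toy random() limited to DIGITS decimal digits in [0..1]
    return round(random.random(), DIGITS)

def rand_range(lower, upper):
    width = upper - lower
    while True:
        value = coarse_random() * width
        scale = width
        # add progressively finer contributions until they fall below
        # the resolution of the base generator itself
        while scale > 1.0:
            scale /= 10.0 ** DIGITS
            value += coarse_random() * scale
        if value <= width:   # overshot the upper bound: retry
            return lower + value
```

For the example's range [100, 1100] the first pass lands on multiples of 10, the second adds tenths, the third thousandths, after which further rounds would be below the base generator's own resolution.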

0

You must consider how much entropy comes from each call to your RNG. Here is some C# code I just wrote that demonstrates how to accumulate entropy from low-entropy sources and end up with a high-entropy random value.

 using System;
 using System.Collections.Generic;
 using System.Security.Cryptography;

 namespace SO_8019589
 {
     class LowEntropyRandom
     {
         public readonly double EffectiveEntropyBits;
         public readonly int PossibleOutcomeCount;
         private readonly double interval;
         private readonly Random random = new Random();

         public LowEntropyRandom(int possibleOutcomeCount)
         {
             PossibleOutcomeCount = possibleOutcomeCount;
             EffectiveEntropyBits = Math.Log(PossibleOutcomeCount, 2);
             interval = 1.0 / PossibleOutcomeCount;
         }

         public LowEntropyRandom(int possibleOutcomeCount, int seed)
             : this(possibleOutcomeCount)
         {
             random = new Random(seed);
         }

         public int Next() { return random.Next(PossibleOutcomeCount); }
         public double NextDouble() { return interval * Next(); }
     }

     class EntropyAccumulator
     {
         private List<byte> currentEntropy = new List<byte>();
         public double CurrentEntropyBits { get; private set; }

         public void Clear()
         {
             currentEntropy.Clear();
             CurrentEntropyBits = 0;
         }

         public void Add(byte[] entropy, double effectiveBits)
         {
             currentEntropy.AddRange(entropy);
             CurrentEntropyBits += effectiveBits;
         }

         public byte[] GetBytes(int count)
         {
             using (var hasher = new SHA512Managed())
             {
                 count = Math.Min(count, hasher.HashSize / 8);
                 var bytes = new byte[count];
                 var hash = hasher.ComputeHash(currentEntropy.ToArray());
                 Array.Copy(hash, bytes, count);
                 return bytes;
             }
         }

         public byte[] GetPackagedEntropy()
         {
             // Returns a compact byte array that represents almost all of the entropy.
             return GetBytes((int)(CurrentEntropyBits / 8));
         }

         public double GetDouble()
         {
             // Returns a uniformly distributed number on [0, 1).
             return (double)BitConverter.ToUInt64(GetBytes(8), 0)
                    / ((double)UInt64.MaxValue + 1);
         }

         public double GetInt(int maxValue)
         {
             // Returns a uniformly distributed integer on [0, maxValue).
             return (int)(maxValue * GetDouble());
         }
     }

     class Program
     {
         static void Main(string[] args)
         {
             var random = new LowEntropyRandom(2); // provides only 1 bit of entropy per call
             var desiredEntropyBits = 64;          // enough for a double
             while (true)
             {
                 var adder = new EntropyAccumulator();
                 while (adder.CurrentEntropyBits < desiredEntropyBits)
                 {
                     adder.Add(BitConverter.GetBytes(random.Next()),
                               random.EffectiveEntropyBits);
                 }
                 Console.WriteLine(adder.GetDouble());
                 Console.ReadLine();
             }
         }
     }
 }

Since I use a 512-bit hash function, 512 bits is the maximum amount of entropy you can get out of the EntropyAccumulator. This could be fixed if necessary.

0

When you create a random number with random(), you get a floating-point number between 0 and 1 that has some fixed but unknown precision (or density, call it what you like).

And when you multiply it by a number NUM, you lose lg(NUM) digits of that precision (logarithm base 10). So if you multiply by 1000 (NUM = 1000), you lose the last 3 digits (lg(1000) = 3).

You can fix this by adding a smaller random number to the original, one that fills in the missing digits. But since you do not know the precision, you cannot tell exactly where they start.

I can imagine two scenarios:

(X = beginning of range, Y = end of range)

1: you fix the precision (PREC, e.g. 20 digits, so PREC = 20) and assume it is enough to generate the random number, giving the expression:

 ( random() * (Y-X) + X ) + ( random() / 10 ^ (PREC - trunc(lg(Y-X))) )

with numbers: (X = 500, Y = 1500, PREC = 20)

 ( random() * (1500-500) + 500 ) + ( random() / 10 ^ (20 - trunc(lg(1000))) )
 ( random() * 1000 + 500 ) + ( random() / 10 ^ 17 )

There are some problems with this:

  • two-phase random generation (how random will the result really be?)
  • if the first random() returns exactly 1, the result may fall out of range.

2: estimate the precision from random numbers themselves

you make a few attempts (e.g. 4) to determine the precision by generating random numbers and counting digits each time:

 0.4663164 -> PREC=7
 0.2581916 -> PREC=7
 0.9147385 -> PREC=7
 0.129141  -> PREC=6 -> 7, corrected by the average of the other tries

This is my idea.

0

If I understand your problem correctly, rand() generates evenly spaced but ultimately discrete random numbers, and if we multiply by (y - x), which is large, these finely spaced values are spread out in a way that skips many of the floating-point values in the range [x, y]. Is that right?

If so, I think the solution was already given by Dialecticus. Let me explain why it is right.

First, we know how to generate a random float and then add another floating-point value to it. This may produce a rounding error in the addition, but only in the last decimal place, so use doubles or something with more digital resolution if you need higher accuracy. With that caveat, the problem is no harder than finding a random float in the range [0, y-x] with uniform density. Say y - x = z. Obviously, since z is a floating-point number, it may not be an integer.

We handle the problem in two steps: first generate the random digits to the left of the decimal point, then generate the random digits to the right of it. Doing both uniformly means their sum is uniformly distributed over [0, z]. Let w be the largest integer <= z. To answer our simplified problem, first pick a random integer from {0, 1, ..., w}. Step two is to add a random float from the unit interval to that integer. The float is not multiplied by any potentially large value, so it keeps the full resolution the numeric type can offer (assuming a perfect floating-point generator).

So what about the corner case where the random integer was the largest (i.e. w) and the random float we added was larger than z - w, so that the sum exceeds the allowed maximum? The answer is simple: throw the whole thing away and repeat, checking each new result, until you get a number in the allowed range. It is simple to prove that a uniformly generated random number, discarded and regenerated whenever it falls outside the acceptable range, yields a uniformly distributed value within the allowed range. Once you make this key observation, you can see that the Dialecticus solution meets all your criteria.

0

Source: https://habr.com/ru/post/1379762/

