Why is uniform_int_distribution's range closed [a, b], but uniform_real_distribution's range half-open [a, b)?

uniform_int_distribution produces values on the closed interval [a, b], but uniform_real_distribution produces values on the half-open interval [a, b). The naive fix is to pass something like b + 0.1, but then you start tiptoeing into epsilons... fortunately, the correct approach is simple:

 std::uniform_real_distribution<> dis(start, std::nextafter(stop, DBL_MAX)); 
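(For reference, a minimal self-contained version of that snippet might look as follows; the bounds and the mt19937 engine are arbitrary choices for illustration.)

    #include <cfloat>   // DBL_MAX
    #include <cmath>    // std::nextafter
    #include <cstdio>
    #include <random>

    int main() {
        double start = 0.0, stop = 1.0;              // arbitrary example bounds
        std::mt19937 gen{std::random_device{}()};
        // Bump the upper bound by one ulp so the nominal range is [start, stop].
        std::uniform_real_distribution<> dis(start, std::nextafter(stop, DBL_MAX));
        std::printf("%f\n", dis(gen));
    }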

But why is this needed? In particular, what is the rationale for the two being different?

+5
2 answers

The uniform real distribution over [a, b) is almost statistically indistinguishable from the uniform distribution over [a, b].

The statistical distance between the two distributions is roughly one divided by the number of floating-point numbers between a and b.

That is, there is no statistical test, returning either 0 or 1 for any given sample, such that the probability of observing a 1 under the first distribution differs from the probability of observing a 1 under the second distribution by more than about 2^{-32}. (Assuming you are driving, say, std::uniform_real_distribution<float> with pure entropy from std::random_device.)

Thus, in most real applications there is no significant difference.
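To get a feel for that magnitude, here is a small sketch (not from the answer) that counts the representable floats between two non-negative floats via their IEEE-754 bit patterns; the statistical distance is on the order of one over that count:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Count the representable floats in [lo, hi) for non-negative finite floats:
    // for such values the IEEE-754 bit patterns are ordered like integers, so
    // the count is just the difference of the patterns.
    std::uint32_t floats_between(float lo, float hi) {
        std::uint32_t a, b;
        std::memcpy(&a, &lo, sizeof a);
        std::memcpy(&b, &hi, sizeof b);
        return b - a;
    }

    int main() {
        // Roughly 2^30 floats lie in [0, 1), so dropping or adding the single
        // endpoint value changes the distribution by about one part in 2^30.
        std::printf("floats in [0, 1): %u\n", floats_between(0.0f, 1.0f));
    }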

If you do want something like the range (a, b], say, to prevent an (extremely unlikely) division-by-zero error in some critical system, you can check whether the sample is exactly a and replace it with b.
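A minimal sketch of that check (sample_open_closed is a hypothetical helper name, not from the answer):

    #include <random>

    // Sample from (a, b] by mapping the rare exact endpoint a onto b.
    double sample_open_closed(std::mt19937& gen, double a, double b) {
        std::uniform_real_distribution<double> dis(a, b);  // yields [a, b)
        double x = dis(gen);
        return (x == a) ? b : x;                           // turn [a, b) into (a, b]
    }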


In addition, FWIW, I suspect your "nextafter b" solution will not always do what you expect: most of the time, the adapter is going to compute b - a as a floating-point number, draw a sample from the entropy source, convert it to a floating-point number in the range 0-1 with a static cast, multiply it by the factor b - a, add a, and return the result.

The one-ulp bump you added to b may be lost in that floating-point subtraction if a and b are not at exactly the same scale, in which case the change has no effect at all.
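The exact algorithm is implementation-defined, but under the b - a scheme described above the bump can disappear in the very first subtraction. A small demonstration with made-up values:

    #include <cfloat>
    #include <cmath>
    #include <cstdio>

    int main() {
        double a = -1000.0, b = 0.5;               // deliberately different scales
        double b_up = std::nextafter(b, DBL_MAX);  // b plus one ulp of b (~1.1e-16)

        // If the adapter starts by computing the width b - a (~1000.5, whose ulp
        // is ~1.1e-13), the one-ulp bump on b is far below the rounding grain of
        // the result and is simply rounded away.
        std::printf("widths identical: %s\n", (b - a) == (b_up - a) ? "yes" : "no");
    }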

The reason this is not a bug is that the standard does not require a distribution adapter to produce the exact distribution specified, only to converge to the target in an approximate sense.

The standard makes a firm guarantee that the distribution will never produce anything outside its stated range:

26.5.8.1 General [rand.dist.general]
1. Each type instantiated from a class template specified in this section 26.5.8 satisfies the requirements of a random number distribution (26.5.1.6) type.
...
3. The algorithms for producing each of the specified distributions are implementation-defined.
4. The value of each probability density function p(z) and of each discrete probability function P(z_i) specified in this section is 0 everywhere outside its declared domain.

But the standard also says this about uniform random number generators:

26.5.3.1 [rand.req.urng]
1. A uniform random number generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned. [Note: The degree to which g's results approximate the ideal is often determined statistically. - end note]

Since [a, b) and [a, b + epsilon) are for all practical purposes statistically indistinguishable, you should not consider it a bug if your b + epsilon trick still never outputs b, even if you carefully try every possible seed.

If the standard did not take this view, all of these adapters would have to be written very carefully so that they produced the exact distribution on every architecture, and they would have to run much slower. In most applications, tiny sampling errors like this are acceptable, and efficiency matters more.

+6

Let me offer one rationale: integers are often used to make a choice among N alternatives, each of which should be equally likely. Sometimes the highest possible value b is the largest value representable in the integer type, so an exclusive upper bound of b + 1 would be out of range and overflow.
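A small illustration of that corner case (the seed and type here are arbitrary):

    #include <cstdint>
    #include <limits>
    #include <random>

    int main() {
        std::mt19937 gen{42};
        // A closed range [a, b] can cover every value of the type. With an
        // exclusive upper bound, the call below would need max() + 1, which
        // does not exist in uint32_t.
        std::uniform_int_distribution<std::uint32_t> pick(
            0, std::numeric_limits<std::uint32_t>::max());
        (void)pick(gen);
    }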

For floating-point values, you often cannot represent the range limits exactly (e.g. 0.1), and since floating-point numbers generally should not be compared for equality anyway (e.g. 1.0/10.0 may or may not compare equal to 0.1), you do not care that the limit value itself never occurs.
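A quick way to see both points (the 0.1 + 0.2 comparison is the classic example, not one from the answer):

    #include <cstdio>

    int main() {
        // 0.1 has no exact binary representation...
        std::printf("0.1 is stored as %.20f\n", 0.1);
        // ...and equality on computed floating-point values is unreliable:
        // 0.1 + 0.2 == 0.3 is false in IEEE-754 double arithmetic.
        std::printf("0.1 + 0.2 == 0.3 ? %s\n", (0.1 + 0.2 == 0.3) ? "yes" : "no");
    }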

+2

Source: https://habr.com/ru/post/1245035/

