How to use random.jumpahead in Python

I have an application that performs a specific experiment 1000 times (multi-threaded, so several experiments run simultaneously). Each experiment requires approximately 50,000 calls to random.random().

What is the best approach to make this really random? I could copy a random object into each experiment and then jump ahead by 50,000 * expid. The documentation suggests that jumpahead(1) already scrambles the state, but is that true?

Or is there another, "better" way to do this?

(No, the random numbers are not used for security, but for the Metropolis-Hastings algorithm. The only requirement is that the experiments be independent; it does not matter whether the random sequence is somehow predictable.)
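For concreteness, this is roughly what I mean (a minimal sketch, assuming Python 2.x where jumpahead still exists; run_experiment stands in for my actual experiment code):

 import random

 BASE_SEED = 12345              # constant while testing
 CALLS_PER_EXPERIMENT = 50000

 for expid in range(1000):
     rng = random.Random(BASE_SEED)                # fresh generator per experiment
     rng.jumpahead(CALLS_PER_EXPERIMENT * expid)   # jump past the earlier experiments' draws
     # run_experiment(rng) would then make its ~50,000 calls to rng.random()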

+3
4 answers

You should not use this function. There is no proof that it works correctly with a Mersenne Twister generator. Indeed, it was removed from Python 3 for this reason.

For more information about pseudo-random number generation in parallel environments, see this article by David Hill.

+3

I could copy a random object into each experiment and then jump ahead by 50,000 * expid.

Approximately correct. Each thread gets its own instance of Random.

Seed them all with the same initial value. Use a constant for testing; use /dev/random when you go "live".

Edit: Outside of Python, and in older implementations, use jumpahead( 50000 * expid ) to avoid the situation where two generators end up producing overlapping runs of values. In any reasonably current (post-2.3) Python, jumpahead is no longer a simple linear jump, and using expid alone is sufficient to scramble the state.

You cannot just do jumpahead(1) in each thread, as that guarantees they stay in sync. Use jumpahead( expid ) to ensure that each thread's generator is scrambled differently, as in the sketch below.
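A minimal sketch of that setup (assuming Python 2.x, where jumpahead is still available; the names are illustrative):

 import random

 SEED = 12345             # a constant while testing; switch to /dev/random or os.urandom when live
 NUM_EXPERIMENTS = 1000

 generators = []
 for expid in range(NUM_EXPERIMENTS):
     rng = random.Random(SEED)   # every generator starts from the same seed...
     rng.jumpahead(expid)        # ...and is then scrambled differently per experiment
     generators.append(rng)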

The documentation suggests that jumpahead(1) already scrambles the state, but is that true?

Yes, jumpahead really does "scramble" the state. Recall that for a given seed you get one thing: a long but fixed sequence of pseudo-random numbers. You jump forward within this sequence. To pass tests of randomness, you must get all of your values from this one sequence.

Edit: Once upon a time, jumpahead(1) was a limited, linear jump. Now jumpahead(1) does a more thorough scrambling. However, the scrambling is deterministic, so you still cannot do the same jumpahead(1) in every thread.

If you have several generators with different seeds, you break the "one sequence from one seed" assumption, and your numbers will not be as random as if you had drawn them all from one sequence.

If you only jump ahead by 1, you can get parallel sequences that may be similar. [The similarity may not be detectable; theoretically, it is there.]

When you jump ahead by 50,000 per experiment, you ensure that you are still following the one-sequence-from-one-seed rule. You also ensure that no two experiments see overlapping runs of numbers.

Finally, you also get repeatability. For a given seed, you get consistent results.

The same jump in every generator: not good.

 >>> y = random.Random(1)
 >>> z = random.Random(1)
 >>> y.jumpahead(1)
 >>> z.jumpahead(1)
 >>> [y.random() for i in range(5)]
 [0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
 >>> [z.random() for i in range(5)]
 [0.99510321786951772, 0.92436920169905545, 0.21932404923057958, 0.20867489035315723, 0.91525579001682567]
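For contrast, a small sketch (again under Python 2.x) with distinct jump arguments; the two generators are expected to end up in different states, so the comparison should print False:

 import random

 y = random.Random(1)
 z = random.Random(1)
 y.jumpahead(1)
 z.jumpahead(2)
 print [y.random() for i in range(5)] == [z.random() for i in range(5)]   # expected: False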
+5

jumpahead(1) is indeed sufficient (and identical to jumpahead(50000) or any other such call in the current random implementation; I believe this changed at the same time the Mersenne Twister implementation was adopted). So use whatever argument fits well with your program's logic. (Of course, do use a separate random.Random instance per thread for thread-safety purposes, as your question already hints.)

(random's generated numbers are not meant to be cryptographically strong, so it's good that you are not using them for security purposes ;-).
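A rough sketch of that per-thread setup (Python 2.x; worker and the experiment body are made-up names for illustration):

 import random
 import threading

 SEED = 12345

 def worker(expid):
     rng = random.Random(SEED)   # one private Random instance per thread
     rng.jumpahead(expid)        # any distinct per-experiment argument will do
     # ... perform this experiment's ~50,000 rng.random() calls here ...

 threads = [threading.Thread(target=worker, args=(expid,)) for expid in range(10)]
 for t in threads:
     t.start()
 for t in threads:
     t.join()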

+3

From the random module docs on python.org:

"You can create your own instances of Random to generate generators that do not share state.

And, as you noticed, there is also a note about jumpahead. But the guarantees there are vague. If the randomness calls provided by the OS are not so expensive as to dominate your running time, I would skip all the subtleties and do something like:

 randoms = [random.Random(os.urandom(4)) for _ in range(num_expts)] 

If num_expts is ~1000, then you are unlikely to encounter any seed collision (the birthday paradox says you would need about 65,000 experiments before there is a 50% chance of a collision). If that is not good enough for you, or if the number of experiments is more like 100k instead of 1k, then I think it is reasonable to follow this up with:

 for idx, r in enumerate(randoms):
     r.jumpahead(idx)

Note that I do not think it will work to simply make your seed longer (e.g. os.urandom(8)), as the random docs indicate that the seed is hashed, so on a 32-bit platform you likely get only 32 bits (4 bytes) of useful entropy from your seed.

This question aroused my curiosity, so I went and looked at the code that implements the random module. I am definitely not a PRNG expert, but it does seem that even slightly different n values in jumpahead(n) lead to noticeably different states of the Random instances. (It is always daunting to contradict Alex Martelli, but the code does use the value of n when scrambling the state.)
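A quick way to check that reading of the code (again Python 2.x): compare the internal states after different jumps; they are expected to differ.

 import random

 a = random.Random(42)
 b = random.Random(42)
 a.jumpahead(1)
 b.jumpahead(2)
 print a.getstate() == b.getstate()   # expected: False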

0

Source: https://habr.com/ru/post/1341245/

