The algorithm of the optimal expected amount in the game with profit / loss

Question


I was recently asked the following question:

"You have a box with G green coins and B blue coins. You draw coins at random, one at a time; a green coin gives a profit of +1 and a blue coin a loss of -1. If you play optimally, what is the expected profit?"

I thought about using a brute-force algorithm that considers every possible sequence of green and blue coins, but I am sure there must be a better solution (B and G each range from 0 to 5000). Also, what does playing optimally mean? Does it mean that once I have drawn all the blue coins, I keep playing until all the green coins are drawn? If so, do I then not need to consider every possible sequence of green and blue coins?

+5

algorithm probability

user2980096 15 sept. '17 at 17:36


3 answers

This answer is incorrect; see Paul Hankin's answer for a counterexample and the proper analysis. I leave this answer here as a learning example for all of us.

Assuming that your only choice is when to stop drawing coins, you continue as long as G > B. This part is simple. If you start with G < B, then you never start to draw, and your payoff is 0. For G = B, no strategy gives you a mathematical advantage; the gain is also 0.

For the expected reward, do this in two steps:

(1) The expected value of any single draw. Compute this recursively: take the probability of getting green or blue on the first draw, then the expected values for the new state (G-1, B) or (G, B-1). You will quickly see that the expected value of any given draw number (for example, over all the possibilities for the 3rd draw) is the same as for the first draw.

Therefore, your expected value for any draw is e = (G - B) / (G + B). The total expected value is e * d, where d is the number of draws you make.

(2) What is the expected number of draws? How many draws do you expect to make before reaching G = B? I will leave this as an exercise for the student, but note that the previous idea applies here recursively as well. It may be easier to describe the state of the game as (extra, total), where extra = G - B and total = G + B.
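The per-draw claim in step (1) can be checked exactly with a short recursion. This is only a sketch: the function name `draw_value` is made up for illustration, and it assumes every coin is drawn in order with no stopping rule, so it checks the value (G - B) / (G + B) of each individual draw and nothing about the total e * d.

```python
from fractions import Fraction

def draw_value(G, B, k):
    """Expected value of the k-th draw (1-based, k <= G + B),
    assuming coins are drawn without replacement and without stopping."""
    total = G + B
    if k == 1:
        # First draw from this state: +1 with prob G/total, -1 with prob B/total.
        return Fraction(G - B, total)
    value = Fraction(0)
    if G > 0:
        # A green coin was drawn first; look at draw k-1 of the new state.
        value += Fraction(G, total) * draw_value(G - 1, B, k - 1)
    if B > 0:
        # A blue coin was drawn first.
        value += Fraction(B, total) * draw_value(G, B - 1, k - 1)
    return value

# Every draw position for G = 4, B = 2 has the same expected value.
print([draw_value(4, 2, k) for k in range(1, 7)])
```

For G = 4, B = 2 every entry equals (4 - 2) / (4 + 2) = 1/3.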

Illustrative exercise: Given G = 4, B = 2, what is the likelihood that you will draw GG in the first two draws (and then stop the game)? What is the profit from this? How does this compare with the advantage of (4-2) / (4 + 2) for each draw?

0

Prune 15 sept. '17 at 19:45


Simple and intuitive answer:

You should start by estimating the proportions of blue and green coins. After each draw, you update this estimate. You should stop at any moment when your estimate says there are more blue coins than green coins.

Example:

You start by drawing a coin. It is green, so you estimate that 100% of the coins are green. You then draw a blue coin, so you estimate that 50% of the coins are green. You draw another blue coin, so you estimate that 33% of the coins are green. At this point, according to your estimate, it is no longer worth playing, so you stop.

0

Theo walton 15 sept. '17 at 21:39


Paul hankin · Accepted Answer · 2017-09-16T14:50:41+0000

The "obvious" answer is to play only while there are more green coins than blue coins. This is actually wrong. For example, if there are 999 green coins and 1000 blue coins, here is a strategy with a positive expected profit:

 Take 2 coins.
 If GG: stop with a profit of 2.
 If BG or GB: stop with a profit of 0.
 If BB: take all the remaining coins, for a total profit of -1.

Since the first and last outcomes each occur with a probability of almost 25%, your overall expectation is approximately 0.25 * 2 - 0.25 * 1 = 0.25.
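The "almost 25%" figures can be made exact with a few lines of arithmetic. The following is a minimal sketch; the function name `two_coin_strategy_ev` is hypothetical, and it evaluates only this one fixed strategy, not optimal play.

```python
from fractions import Fraction

def two_coin_strategy_ev(G, B):
    """Exact expected profit of: draw 2 coins; stop after GG (+2) or a
    mixed pair (0); after BB draw every remaining coin (total G - B)."""
    total = G + B
    p_gg = Fraction(G, total) * Fraction(G - 1, total - 1)
    p_bb = Fraction(B, total) * Fraction(B - 1, total - 1)
    # Mixed pairs (GB or BG) contribute 0, so they drop out of the sum.
    return 2 * p_gg + (G - B) * p_bb

ev = two_coin_strategy_ev(999, 1000)
print(ev, float(ev))
```

With 999 green and 1000 blue coins this gives 498/1999, just under the rounded estimate of 0.25.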

This is just a simple strategy in one extreme example, which shows that the problem is not as simple as it seems at first glance.

In general, the expectation with g green coins and b blue coins is given by the recurrence relation:

 E(g, 0) = g
 E(0, b) = 0
 E(g, b) = max(0, g(E(g-1, b) + 1)/(b+g) + b(E(g, b-1) - 1)/(b+g))

The max in the last line is there because if it is -EV (negative expected value) to play, you are better off stopping.

This recurrence can be solved using dynamic programming in O(gb) time.

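    # Python 3
    from fractions import Fraction as F

    def gb(G, B):
        # E[g][b] = expected profit under optimal play with g green, b blue coins left
        E = [[F(0, 1)] * (B + 1) for _ in range(G + 1)]
        for g in range(G + 1):
            E[g][0] = F(g, 1)  # only green coins left: take them all
        for b in range(1, B + 1):
            for g in range(1, G + 1):
                # Either stop now (0) or draw one more coin and continue optimally.
                E[g][b] = max(0, (g * (E[g-1][b] + 1) + b * (E[g][b-1] - 1)) * F(1, b + g))
        # Print the table: one row per g, one column per b.
        for row in E:
            for v in row:
                print('%5.2f' % float(v), end=' ')
            print()
        print()
        return E[G][B]

    print(gb(8, 10))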

Output:

    0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    1.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    2.00 1.33 0.67 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00
    3.00 2.25 1.50 0.85 0.34 0.00 0.00 0.00 0.00 0.00 0.00
    4.00 3.20 2.40 1.66 1.00 0.44 0.07 0.00 0.00 0.00 0.00
    5.00 4.17 3.33 2.54 1.79 1.12 0.55 0.15 0.00 0.00 0.00
    6.00 5.14 4.29 3.45 2.66 1.91 1.23 0.66 0.23 0.00 0.00
    7.00 6.12 5.25 4.39 3.56 2.76 2.01 1.34 0.75 0.30 0.00
    8.00 7.11 6.22 5.35 4.49 3.66 2.86 2.11 1.43 0.84 0.36

    7793/21879

Rows correspond to g = 0..8 and columns to b = 0..10. From this you can see that the expectation is positive for a game with 8 green and 10 blue coins (EV = 7793/21879 ≈ 0.36), and you even have a positive expectation with 2 green and 3 blue coins (EV = 0.2).
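For the range mentioned in the question (G and B up to 5000), a full table of exact fractions may be heavy on memory and time. One possible variant, sketched here under the assumption that floating-point accuracy is acceptable, keeps only the previous row of the recurrence. The name `expected_profit` is made up for illustration and is not part of the answer's code.

```python
def expected_profit(G, B):
    """E(G, B) from the recurrence, using floats and O(B) memory."""
    # row[b] holds E(g, b) for the current g; for g = 0, E(0, b) = 0.
    row = [0.0] * (B + 1)
    for g in range(1, G + 1):
        prev = row                    # E(g-1, b) for all b
        row = [0.0] * (B + 1)
        row[0] = float(g)             # E(g, 0) = g
        for b in range(1, B + 1):
            # Value of drawing one more coin, then playing optimally.
            play = (g * (prev[b] + 1) + b * (row[b - 1] - 1)) / (g + b)
            row[b] = max(0.0, play)   # stopping is worth 0
    return row[B]

print(expected_profit(8, 10))   # compare with 7793/21879 from the table
```

The running time is still O(GB), so pure Python may be slow at the upper end of the range; the gain here is in memory only.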
