Generate random weighted value

Edit: I rewrote the question in the hope that the goal is a little clearer.

This is a follow-up to this question, and I really like the function presented in this answer.

In the answer above, you can set the probability of hitting an extreme, with higher numbers producing a higher probability of getting lower numbers and vice versa. The issue is that I must set the probabilities for three groups: the lowest value (LV), the highest value (HV), and the middle value (MV). However, to simplify the request, we can assume the extreme-value probability is the same at both ends: EVP = HVP = LVP.

For any range, the HV/LV should appear based on the specified EVP, and as you move through the range from each extreme, the probability of the next value in the range increases or decreases based on the distance between the EVP and the MVP.

Using an example range of 1-6, with 1 and 6 weighted at 5% (EVP), the probability spread would be: 1 and 6 at 5%, 2 and 5 at 15%, and 3 and 4 at 30% (MVP), totalling 100%. The reverse should also be possible; swapping EVP and MVP should produce the inverse of the graph below.

Here's an image that I hope will show the expected results from this example.

Middle weighted:

[image: middle-weighted graph]

Bonus: It would be great if I could set HVP and LVP separately, producing a result similar to the graph below (note: the graph is not accurate to the specification above).

Middle weighted (bonus):

[image: middle-weighted bonus graph]

Thanks!

+10
c# php random
Oct 27 '10 at 6:12
6 answers

Since I'm stuck at home today because of the flu :( I decided to try to figure this out for you. Basically, what you are asking for is some kind of interpolation. I used the simplest kind (linear); here are my results and my code. The code is pretty messy, and I may clean it up in the coming days.

    <?php
    // this function interpolates $a to $b over $steps steps, starting from key $k
    // this can be cleaned up significantly
    function interpolate($a, $b, $steps, $k) {
        // $steps is the key distance and is at least 2 here, so no division by zero
        $per_step = abs($a - $b) / $steps;
        $decreasing = ($a > $b);
        $final = array();
        for ($i = 1; $i <= $steps - 1; ++$i) {
            if ($decreasing) $final[$i + $k] = $a -= $per_step; // linear interpolation
            else $final[$i + $k] = $a += $per_step; // linear interpolation
        }
        return $final;
    }

    // this function combines probability arrays after the interpolation occurs
    // this may happen multiple times, think about 1, 3, 5. interpolation would have to occur
    // from 1 -> 2 -> 3, and from 3 -> 4 -> 5.
    function interpolateProbabilities($nodes) {
        $pNodes = $nodes;
        $keys = array_keys($nodes);
        // stop one short of the end: each pass compares keys $i and $i + 1
        for ($i = 0; $i < count($keys) - 1; $i++) {
            if ($keys[$i + 1] - $keys[$i] != 1) {
                $pNodes += interpolate($nodes[$keys[$i]], $nodes[$keys[$i + 1]], $keys[$i + 1] - $keys[$i], $keys[$i]);
            }
        }
        ksort($pNodes);
        return $pNodes;
    }

    // this generates a weighed random value and is pretty much copy-pasted from:
    // http://w-shadow.com/blog/2008/12/10/fast-weighted-random-choice-in-php/
    // it is robust and re-writing it would be somewhat pointless
    function generateWeighedRandomValue($nodes) {
        $weights = array_values($nodes);
        $values = array_keys($nodes);
        $count = count($values);
        $i = 0;
        $n = 0;
        // interpolated weights can be fractional, so draw a float in [0, sum]
        $num = mt_rand() / mt_getrandmax() * array_sum($weights);
        while ($i < $count - 1) { // never run past the last value
            $n += $weights[$i];
            if ($n >= $num) {
                break;
            }
            $i++;
        }
        return $values[$i];
    }

    // two test cases (the second assignment overrides the first; comment one out)
    $nodes = array( 1 => 12, 5 => 22, 9 => 31, 10 => 35); // test 1
    $nodes = array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10); // test 2

    $probabilities = interpolateProbabilities($nodes); // interpolate once, not per draw
    $export = array();
    // run it 1000 times
    for ($i = 0; $i < 1000; ++$i) {
        $value = generateWeighedRandomValue($probabilities);
        $export[$value] = ($export[$value] ?? 0) + 1; // avoid undefined-key notices
    }
    // for copy-pasting into excel to test out distribution
    print_r($export);
    ?>

The results are, I think, exactly what you are looking for. In the case of:

 $nodes = array( 1 => 12, 5 => 22, 9 => 31, 10 => 35); // test 1 

I got the following (final) array:

 Array ( [5] => 92 [7] => 94 [10] => 162 [8] => 140 [3] => 71 [6] => 114 [2] => 75 [4] => 69 [9] => 131 [1] => 52 ) 

That is, the nodes ask for 1 to occur 12% of the time, 5 22%, 9 31%, and 10 35%. Let's graph it:

[image: graph 1]

It looks promising, but let's try something crazier...

 $nodes = array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10); // test 2 

In this case, 3 should occur 50% of the time, and the weights should drop steeply toward 6. Let's see what happens! This is the array (in retrospect, I should have sorted these arrays):

 Array ( [4] => 163 [7] => 64 [2] => 180 [10] => 47 [1] => 115 [5] => 81 [3] => 227 [8] => 57 [6] => 6 [9] => 60 ) 

And let's look at the image:

[image: graph of the test 2 results]

It looks like it works :)

I hope I was able to solve your problem (or at least point you in the right direction). Please note that my code currently comes with a few conditions. Namely, the initial nodes that you provide MUST have probabilities that add up to 100%, or you may get some odd behavior.

Also, the code is pretty messy, but the concepts are relatively simple. Another interesting thing to try would be a different kind of interpolation instead of linear, which would give you more interesting results!




Algorithm

To avoid confusion, I will just show how the algorithm works. I give PHP a $nodes array in the form integer => frequency in percent, which ends up looking like array( 1 => 22, 3 => 50, 6 => 2, 7 => 16, 10 => 10), which is test 2 from above.

Test 2 basically says that you want 5 control nodes placed at 1, 3, 6, 7, and 10 with frequencies of 22%, 50%, 2%, 16%, and 10% respectively. First, I need to see exactly where I need to interpolate. For example, I do not need to interpolate between 6 and 7, but I do need to between 1 and 3 (where we need to interpolate 2), between 3 and 6 (4 and 5), and between 7 and 10 (8 and 9).

Interpolation between 1 -> 3 has (3 - 1) - 1 = 1 step, and its value should be inserted at key 2 of the original array. The per-step change ( % ) for 1 -> 3 is abs($a - $b) / $steps in the code, where $steps is the key distance 3 - 1 = 2. In other words, it is the absolute value of the % of 1 minus the % of 3, divided by steps + 1, which in our case comes to abs(22 - 50) / 2 = 14. We need to see whether the function (hello, calculus) is increasing or decreasing. If it is increasing, we keep adding the per-step % to the new interpolated array until all the empty spots are filled (if it is decreasing, we subtract the per-step % instead). Since we only need to fill one spot, we return 2 => 36 (22 + 14 = 36).

We combine the arrays, and the result is (1 => 22, 2 => 36, 3 => 50, 6 => 2, 7 => 16, 10 => 10). The program interpolated 2, a percentage value that we did not explicitly declare.

In the case of 7 -> 10 there are 2 steps, and the per-step percentage is 2, which comes from (16 - 10) / (2 + 1) = 2. The function is decreasing, so we subtract 2 repeatedly. The final interpolated array is (8 => 14, 9 => 12). We combine all the arrays and voila.
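The walkthrough above can be sketched compactly in Python. This is an illustrative sketch, not the answer's PHP: the name `interpolate_nodes` is made up here, and it uses a signed step instead of the separate increasing/decreasing branches, which gives the same values.

```python
def interpolate_nodes(nodes):
    """Fill the gaps between control nodes by linear interpolation.

    nodes maps an integer value to its weight; keys need not be contiguous.
    (Hypothetical helper name, not part of the PHP answer.)
    """
    keys = sorted(nodes)
    filled = dict(nodes)
    for lo, hi in zip(keys, keys[1:]):
        gap = hi - lo                          # key distance, i.e. steps + 1
        step = (nodes[hi] - nodes[lo]) / gap   # signed: negative when decreasing
        for i in range(1, gap):
            filled[lo + i] = nodes[lo] + i * step
    return dict(sorted(filled.items()))


test2 = {1: 22, 3: 50, 6: 2, 7: 16, 10: 10}
# Fills 2 with 36, and 8 and 9 with 14 and 12, matching the walkthrough.
print(interpolate_nodes(test2))
```

Adjacent keys such as 6 and 7 have a gap of 1, so the inner loop adds nothing for them.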

The following figure shows green (initial values) and red (interpolated values). You may need to “view the image” to see all of this clearly. You will notice that I am using ± , as the algorithm must figure out whether we should increase or decrease over a certain period.

[image: interpolation diagram, initial values in green and interpolated values in red]




This code should probably be written in a more OOP style. I play around a lot with array keys (for example, I need to pass $k so that the arrays returned from interpolate($a, $b, $steps, $k) are easier to combine, because they automatically have the correct keys). This is just a PHP idiosyncrasy, and in retrospect I should probably have gone with a more readable OOP approach from the start.




This is my last edit, I promise :) Since I like playing with Excel, here is how the percentages normalize once the numbers are interpolated. This is important to see, especially considering that what you show in your first picture is something of a mathematical impossibility.

Test 1: [image: normalized percentages for test 1]

Test 2: [image: normalized percentages for test 2]

You will notice that the percentages are dampened significantly once interpolation is taken into account. Your second graph would actually look like this:

[image: graph for the weights 1 => 1, 5 => 98, 10 => 1]

In this graph, I weighted 1 => 1, 5 => 98, 10 => 1, and you can see the dampening effect at its most extreme. After all, percentages, by definition, must add up to 100! It is just important to realize that the dampening effect is directly proportional to the number of steps between the extremes.
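To put a number on the dampening, here is a minimal Python sketch that rescales the interpolated weights of test 2 so they sum to 100. The `normalize` helper is a hypothetical name; the weights for 4 and 5 are not listed in the answer and are derived here from the same linear rule as 2, 8 and 9.

```python
def normalize(weights):
    """Convert raw weights into percentages that sum to 100."""
    total = sum(weights.values())
    return {value: 100 * w / total for value, w in weights.items()}


# Test 2 after interpolation (4 and 5 are assumed from the same linear rule).
interpolated = {1: 22, 2: 36, 3: 50, 4: 34, 5: 18, 6: 2, 7: 16, 8: 14, 9: 12, 10: 10}
shares = normalize(interpolated)

# The node declared as "50%" ends up with 50 / 214 of the total, roughly 23%.
print(round(shares[3], 1))
```

That figure lines up with the simulated run above, where 3 came up 227 times out of 1000.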

+17
Oct 31 '10 at 8:27

Assuming you can cope with whole numbers for the percentages, simply assign each value between 0 and 99 a result. For example, 0-9 could have a result of 1 and 95-99 could have a result of 6 (to give a scenario with 10% for 1 and 5% for 6). Once you have this translation function (however you implement it; there are various approaches you could use), you just need to generate a random number in the range 0-99 and translate it into the result.
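One possible way to build such a translation function is a 100-slot lookup table. The Python sketch below is only an illustration: `build_lookup` is a hypothetical name, and the percentages for 2 through 5 are invented to make the total 100.

```python
import random


def build_lookup(percentages):
    """Expand {result: whole-number percentage} into a 100-slot table."""
    table = []
    for result, pct in percentages.items():
        table.extend([result] * pct)
    if len(table) != 100:
        raise ValueError("percentages must be whole numbers summing to 100")
    return table


# Slots 0-9 map to 1 and slots 95-99 map to 6, as in the text;
# the middle percentages are made up for the example.
table = build_lookup({1: 10, 2: 15, 3: 30, 4: 25, 5: 15, 6: 5})
roll = table[random.randrange(100)]  # random slot 0-99, translated to a result
print(roll)
```

The table costs 100 entries of memory but makes each draw a single index lookup.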

Your question is not entirely clear in terms of the code you want (or even which language, C# or PHP?), but I hope this helps.

Here is some C# code that will let you get any bias you like, within reason. You don't have to express it as percentages, but you can:

    // requires: using System; using System.Linq;
    static int BiasedRandom(Random rng, params int[] chances)
    {
        int sum = chances.Sum();
        int roll = rng.Next(sum); // 0 <= roll < sum
        for (int i = 0; i < chances.Length - 1; i++)
        {
            if (roll < chances[i])
            {
                return i;
            }
            roll -= chances[i];
        }
        return chances.Length - 1;
    }

So for example you can use

 int roll = BiasedRandom(rng, 10, 10, 10, 10, 10, 50) + 1; 

which will give a 10% chance for each of 1-5 and a 50% chance to get 6.

+4
Oct 27 '10 at 6:22

Quick and dirty way in C#:

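    // the weights (Item2) are expected to sum to 1
    T PickWeightedRandom<T>(IEnumerable<Tuple<T,double>> items, Random r)
    {
        var sum = 0.0;
        var rand = r.NextDouble();
        // returns the first item whose running total exceeds rand
        return items.First(x => { sum += x.Item2; return rand < sum; }).Item1;
    }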

Test code:

    var values = new [] {
        Tuple.Create(1, 0.05),
        Tuple.Create(2, 0.15),
        Tuple.Create(3, 0.3),
        Tuple.Create(4, 0.3),
        Tuple.Create(5, 0.15),
        Tuple.Create(6, 0.05),
    };

    const int iterations = 1000;

    var counts = new int[values.Length];
    var random = new Random();

    for (int i = 0; i < iterations; i++)
    {
        counts[PickWeightedRandom(values, random)-1]++;
    }

    foreach (var item in counts)
    {
        Console.WriteLine(item/(double)iterations);
    }

Output (with iterations = 1,000,000):

    0.050224
    0.150137
    0.300592
    0.298879
    0.150441
    0.049727

Looks like: [image]

+2
Nov 01 '10 at 4:52

A general technique for generating non-uniform random numbers is rejection sampling. Even though it may be inefficient in this case, it is still worth knowing, because it works for any density function you provide.

    function random($density, $max) {
        do {
            $rand = lcg_value();
            $rand2 = lcg_value() * $max;
        } while ($density($rand) < $rand2);
        return $rand;
    }

$density here is a density function that takes a floating-point number between zero and one as its argument and returns a value no greater than $max. For your example, this density function could be:

    $density = function($x) {
        static $values = array(
            1 => 0.05,
            2 => 0.15,
            3 => 0.30,
            4 => 0.30,
            5 => 0.15,
            6 => 0.05,
        );
        return $values[ceil($x * 6)];
    };

An example call would then be:

    // 0.3 is the greatest value returned by $density
    // ceil and * 6 are used to map a 0 - 1 float to a 1 - 6 int
    ceil(random($density, 0.3) * 6);

Rejection sampling is especially useful if you cannot easily compute the inverse of the distribution. Since in this case the inverse is easy to compute, inverse transform sampling is probably the better choice. But that is already covered in Jon's answer.
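As a rough check on how inefficient rejection sampling is for this distribution, the following Python sketch works out the acceptance rate. It assumes the six probabilities from the question and a ceiling of 0.3; the variable names are illustrative.

```python
values = {1: 0.05, 2: 0.15, 3: 0.30, 4: 0.30, 5: 0.15, 6: 0.05}
ceiling = 0.30  # the $max passed to random() in this answer

# Each trial picks a value uniformly, then accepts it with probability p / ceiling.
acceptance = sum(p / ceiling for p in values.values()) / len(values)
expected_trials = 1 / acceptance

print(f"{acceptance:.3f} accepted, about {expected_trials:.1f} trials per sample")
```

Under these assumptions a little over half of the trials are accepted, so each sample costs just under two loop iterations on average.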

PS: the implementation above is generic and therefore uses a random value from 0 to 1. Creating a function that works only for your approach makes things simpler:

    function random() {
        static $values = array(
            1 => 0.05,
            2 => 0.15,
            3 => 0.30,
            4 => 0.30,
            5 => 0.15,
            6 => 0.05,
        );

        do {
            $rand = mt_rand(1, 6);
            $rand2 = lcg_value() * 0.3;
        } while ($values[$rand] < $rand2);
        return $rand;
    }

    random();
+1
Oct 31 '10 at 8:19

First you need to characterize your random number generator. In the case of PHP, the rand() function returns a nice flat profile, so no preprocessing is needed.

Rescale the output distribution function so that the area under it is one and the range starts at zero. Then calculate its integral and store it (e.g. as an array of values). Then, when you need a random number matching the profile, first get a random number from 0 to 1 from the built-in generator, then find the X coordinate at which the integral's Y coordinate equals the value you generated. Finally, scale the value to the desired range (for example, if you are looking for a value from 0 to 10, multiply by 10; if you are looking for a value from -8 to +8, multiply by 16 and subtract 8).

If your random number generator does not produce a flat profile, then the simplest approach is to convert it to a flat profile using the reverse of the method above.
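One way to read this recipe for a discrete distribution is as a stored running total plus a binary search. The Python sketch below is an interpretation, not the answerer's code: `make_sampler` and its optional `u` parameter are hypothetical, and it scales by the total instead of requiring the weights to sum to one.

```python
import bisect
import itertools
import random


def make_sampler(weights):
    """Return a function that draws keys of `weights` with matching probabilities."""
    values = list(weights)
    cumulative = list(itertools.accumulate(weights.values()))  # the stored "integral"
    total = cumulative[-1]

    def sample(u=None):
        if u is None:
            u = random.random()  # flat random number in [0, 1)
        # Find the first position whose running total exceeds u * total.
        index = bisect.bisect_right(cumulative, u * total)
        return values[min(index, len(values) - 1)]  # guard against rounding at the top

    return sample


sample = make_sampler({1: 0.05, 2: 0.15, 3: 0.30, 4: 0.30, 5: 0.15, 6: 0.05})
print(sample())
```

The optional `u` argument exists only so the mapping can be checked deterministically; in normal use `sample()` draws its own random number.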

0
Oct 27 '10 at 11:28

I haven't tried it, but I think this might work:

    function random($probability) {
        $rnd = rand() / getrandmax();
        foreach ($probability as $num => $prob) {
            $rnd -= $prob;
            if ($rnd <= 0) return $num;
        }
        return -1; // this should never happen
    }

And call it like this (using your second example):

    $distribution = array(
        1 => 0.10,
        2 => 0.15,
        3 => 0.30,
        4 => 0.27,
        5 => 0.14,
        6 => 0.04);
    $number = random($distribution);
0
Nov 01 '10 at 8:33


