Random numbers that add to 1 with minimal increment: Matlab

After carefully reading the previous question Random numbers that add to 100: Matlab, I am trying to solve a similar but slightly more complicated problem.

I would like to create an array of n elements that sums to 1, but with the added restriction that the minimum increment (or, if you like, the number of significant figures) of each element is fixed.

For example, if I need 10 numbers that add up to 1 without any restrictions, the following works fine:

 num_stocks=10;
 num_simulations=100000;

 temp = [zeros(num_simulations,1),sort(rand(num_simulations,num_stocks-1),2),ones(num_simulations,1)];
 weights = diff(temp,[],2);

I foolishly thought that by scaling this I could add the restriction as follows:

 num_stocks=10;
 min_increment=0.001;
 num_simulations=100000;
 scaling=1/min_increment;

 temp2 = [zeros(num_simulations,1),sort(round(rand(num_simulations,num_stocks-1)*scaling)/scaling,2),ones(num_simulations,1)];
 weights2 = diff(temp2,[],2);

However, although this works for small values of n and small values of the increment, if, for example, n = 1000 and the increment is 0.1%, then over a large number of trials the first and last numbers have an average value that is consistently lower than 0.1%.

I am sure there is a logical explanation/solution for this, but I have been tearing my hair out trying to find it, and I wondered if someone would be kind enough to point me in the right direction. To put the problem in context, I am creating random stock portfolios (hence the sum to 1).

Thanks in advance

Thanks for the answers so far; just to clarify (as I think my initial question may have been poorly worded), these are weights with a fixed increment of 0.1%, so 0%, 0.1%, 0.2%, etc.

I tried using integers first

 num_stocks=1000;
 min_increment=0.001;
 num_simulations=100000;
 scaling=1/min_increment;

 temp = [zeros(num_simulations,1),sort(randi([0 scaling],num_simulations,num_stocks-1),2),ones(num_simulations,1)*scaling];
 weights = (diff(temp,[],2)/scaling);
 test=mean(weights);

but it was worse: the average value of the first and last weights is much lower than 0.1%.

Edited to reflect Floris' excellent answer and to clarify:

The source code I used to solve this problem (before finding this forum) was

 function x = monkey_weights_original(simulations,stocks)

 stockmatrix=1:stocks;
 base_weight=1/stocks;
 r=randi(stocks,stocks,simulations);
 x=histc(r,stockmatrix)*base_weight;

 end
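
For context, a quick usage sketch (my own illustration, not part of the original post): each column of the output is one simulated portfolio, every weight is a multiple of 1/stocks, and the per-stock means should all be close to 1/stocks.

 % Example call (illustration only): 10,000 portfolios of 1,000 stocks.
 x = monkey_weights_original(10000, 1000);   % 1000 x 10000, each column sums to 1
 mean_weights = mean(x, 2);                  % per-stock means, all close to 0.001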

This runs very quickly, which is important considering that I want to run a total of 10,000,000 simulations; 10,000 simulations on 1,000 stocks takes just over 2 seconds on a single core, and I am running the whole code on an 8-core machine using the Parallel Computing Toolbox.

It also gives exactly the distribution I was looking for in terms of means, and I think it is just as likely to produce a portfolio that is 100% in one stock as one that is 0.1% in every stock (although I am happy to be corrected).

My problem is that although it works for 1,000 stocks with a 0.1% increment, and I think it works for 100 stocks with a 1% increment, as the number of stocks decreases each pick becomes a very large percentage (in the extreme case, with 2 stocks you will always get a 50/50 portfolio).

Essentially, I think this solution is similar to the binomial solution proposed by Floris (but more limited)

However, my question arose because I would like to make my approach more flexible, for example to handle 3 stocks with a 1% increment, which my current code will not do correctly; that is how I came across the original StackOverflow question.

Floris's recursive approach will come up with the right answer, but speed will be a serious problem, given the scale of the problem.

An example of the original research is here:

http://www.huffingtonpost.com/2013/04/05/monkeys-stocks-study_n_3021285.html

I am currently working on extending it with more flexibility in the portfolio weights and the number of stocks in the index, but it seems that my programming and probability theory skills are the limiting factor.

3 answers

In the end, I solved this problem!

I found a paper by two academics at Johns Hopkins University, "Sampling Uniformly from the Unit Simplex": http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf

In the paper they describe how naive algorithms fail, much as Floris describes in his answer to the Random numbers that add to 100 question. They then show that the method proposed by David Schwartz can also be slightly biased, and propose a modified algorithm that appears to work.

If you want x numbers that sum to y:

  • Sample uniformly x-1 random numbers from the range 1 to x+y-1 without replacement
  • Sort them
  • Add zero at the beginning and x+y at the end
  • Take the differences and subtract 1 from each value
  • If you want to scale them as I do, then divide by y
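
A small worked example of these steps (my own illustration, not from the paper): take x = 3 and y = 10, and suppose the two values sampled from 1..12 happen to be 4 and 9.

 temp = [0 4 9 13];      % pad the sorted sample with 0 and x+y = 13
 diff(temp) - 1          % gives [3 4 3], which sums to y = 10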

It took me a while to figure out why this works when the initial approach did not, and it comes down to the probability of getting a zero weight (as Floris pointed out in his answer). To get a zero weight in the original version, for all but the first and last weights your random numbers have to include two identical values, but for the first and last weight a single random number equal to zero or to the maximum is enough to produce a zero weight, which is much more likely. In the revised algorithm, zero and the maximum number are not in the set of random choices, and a zero weight only occurs when two consecutive numbers are chosen, which is equally likely for every position.

I coded it in MATLAB as follows:

 function weights = unbiased_monkey_weights(num_simulations,num_stocks,min_increment)

 scaling=1/min_increment;
 sample=NaN(num_simulations,num_stocks-1);

 for i=1:num_simulations
     allcomb=randperm(scaling+num_stocks-1);
     sample(i,:)=allcomb(1:num_stocks-1);
 end

 temp = [zeros(num_simulations,1),sort(sample,2),ones(num_simulations,1)*(scaling+num_stocks)];
 weights = (diff(temp,[],2)-1)/scaling;

 end

Obviously the loop is a bit clunky, and since I am using the 2009 version the randperm function only lets me generate permutations of the whole set; despite this, I can run 10,000 simulations for 1,000 numbers in 5 seconds on my clunky laptop, which is fast enough.
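
As a side note, a loop-free variant is possible even without a two-argument randperm (this is my own sketch, not part of the original answer): sorting one uniform random matrix row by row yields, in each row, a random permutation of 1..N, and the first k column indices of that permutation form k distinct values sampled without replacement.

 % Minimal vectorized sketch (assumption: enough memory for a
 % num_simulations-by-(scaling+num_stocks-1) random matrix).
 function weights = unbiased_monkey_weights_vec(num_simulations,num_stocks,min_increment)
 scaling = 1/min_increment;
 N = scaling + num_stocks - 1;                          % population 1..N
 [sortedvals,idx] = sort(rand(num_simulations,N),2);    % each row of idx is a random permutation of 1..N
 sample = idx(:,1:num_stocks-1);                        % num_stocks-1 distinct values per row
 temp = [zeros(num_simulations,1), sort(sample,2), ones(num_simulations,1)*(scaling+num_stocks)];
 weights = (diff(temp,[],2)-1)/scaling;
 end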

The mean weights are now correct, and as a quick test I replicated Floris' graph for 3 numbers that add up to 1 with a minimum increment of 0.01%, and it also looks right.

[figure: replication of Floris' plot for 3 numbers summing to 1 with a 0.01% increment]

Thanks to everyone for your help, and I hope that this solution will be useful to someone else in the future.


One of the problems I can see is that your formula allows numbers to be zero - when the rounding operation results in two consecutive numbers being the same after sorting. I am not sure whether you consider that a problem - but I suggest you think about it (it would mean your model portfolio has fewer than N stocks in it, since the contribution of one of the stocks would be zero).

Another thing is that the probability of getting the extreme values in your distribution is half of what you want it to be: if you have uniformly distributed numbers from 0 to 1000 and you round them, the numbers that round to 0 came from the interval [0 0.5>; those that round to 1 came from [0.5 1.5> - twice as wide an interval. The last number (rounding to 1000) again comes from a smaller interval: [999.5 1000]. Thus you will not get the first and the last numbers as often as you think. If you use floor instead of round, I think you will get the answer you expect.
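
To make this concrete, here is a quick sanity check (my own sketch, not part of the answer): with rounding, the value 0 shows up roughly half as often as an interior value, whereas flooring a slightly rescaled uniform draw gives (approximately) equal counts across 0..1000.

 r = rand(1e6,1);
 x = round(r*1000);        % the question's scheme: 0 and 1000 come from half-width intervals
 y = floor(r*1001);        % uniform over 0..1000
 [sum(x==0) sum(x==500)]   % roughly [500 1000]
 [sum(y==0) sum(y==500)]   % roughly [999 999]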

EDIT

I thought about this some more and came up with a slow but (I think) accurate method of doing this. The basic idea is:

  • Think in integers; instead of dividing the interval 0 - 1 in increments of 0.001, divide the interval 0 - 1000 in integer steps
  • If we try to divide N into m intervals, the mean interval size should be N/m; but being integer, we would expect the intervals to be binomially distributed
  • This suggests an algorithm in which we choose the first interval as a binomially distributed variate with mean (N/m) - call the first value v1; then divide the remaining interval N - v1 into m-1 steps; we can do so recursively

The following code implements this:

 % random integers adding up to a definite sum
 function r = randomInt(n, limit)
 % returns an array of n random integers
 % whose sum is limit
 % calls itself recursively; slow but accurate
 if n>1
   v = binomialRandom(limit, 1 / n);
   r = [v randomInt(n-1, limit - v)];
 else
   r = limit;
 end

 function b = binomialRandom(N, p)
   b = sum(rand(1,N)<p); % slow but direct

To get 10,000 instances, you run it like this:

 tic
 portfolio = zeros(10000, 10);
 for ii = 1:10000
   portfolio(ii,:) = randomInt(10, 1000);
 end
 toc

This ran in 3.8 seconds on a modest machine (single thread) - of course, the method for obtaining a binomially distributed random variate is the thing slowing it down; there are statistics toolboxes with more efficient functions, but I don't have one. If you increase the granularity (for example, by setting limit=10000) it slows down further, since you increase the number of random number samples that are generated; with limit=10000 the loop above took 13.3 seconds.
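
For what it's worth, if the Statistics Toolbox happens to be available (an assumption - the answer notes it is not here), the explicit sum of Bernoulli draws could be swapped for the toolbox generator; whether that is actually faster will depend on the release.

 % Possible drop-in replacement for the subfunction (sketch, assumes the Statistics Toolbox):
 function b = binomialRandom(N, p)
   b = binornd(N, p);   % binomial random number from the toolbox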

As a test, I computed mean(portfolio)' and std(portfolio)', with the following results (with limit=1000):

   mean     std
 100.20   9.446
  99.90   9.547
 100.09   9.456
 100.00   9.548
 100.01   9.356
 100.00   9.484
  99.69   9.639
 100.06   9.493
  99.94   9.599
 100.11   9.453

This looks like a pretty convincingly "flat" distribution to me. We would expect the numbers to be binomially distributed with a mean of 100 and a standard deviation of sqrt(p*(1-p)*n). In this case p=0.1, so we expect s = 9.4868. The values I actually got are again very close to that.

I realize this is inefficient for large values of limit, and I made no attempt at efficiency. I find that clarity trumps speed when you are developing something new. But you could, for instance, pre-compute the cumulative binomial distributions for p=1./(1:10) and then do a random lookup; if you are only going to do this once, for 100,000 instances it will run in under a minute; unless you intend to do it many times, I wouldn't bother. But if anyone wants to improve this code, I'd be happy to hear from them.
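
As an illustration of that lookup idea (my own sketch, not from the answer): for one fixed pair (N, p) the binomial CDF can be built once using gammaln, so no toolbox is needed, and each draw is then a single find. In the recursive randomInt above, N changes at every level of the recursion, so a full pre-computation needs more bookkeeping; this only shows the basic mechanism.

 % Inverse-transform lookup for a single fixed (N, p); assumes 0 < p < 1.
 N = 1000; p = 0.1;
 k = 0:N;
 logpmf = gammaln(N+1) - gammaln(k+1) - gammaln(N-k+1) + k*log(p) + (N-k)*log(1-p);
 cdf = cumsum(exp(logpmf));                                    % binomial CDF, built once
 draws = arrayfun(@(u) find(u <= cdf, 1) - 1, rand(1,10000));  % 10,000 binomial samples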


The simple answer is to use schemes that work well with NO minimum increment, then transform the problem. As always, be careful: some methods do NOT yield uniform sets of numbers.

Thus, suppose I want 11 numbers that sum to 100, with a constraint of a minimum increment of 5. I would first find 11 numbers that sum to 45, with no lower bound on the samples (other than zero). I could use a tool from the MATLAB File Exchange for this, but the simplest way is just to sample 10 numbers in the interval [0,45], sort them, and then take the differences.

 X = diff([0,sort(rand(1,10)),1]*45); 

The vector X is a sample of numbers that sums to 45. But the vector Y sums to 100, with a minimum value of 5:

 Y = X + 5; 

Of course, this is trivially vectorized if you wish to find multiple sets of numbers with a given constraint.
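
For example, a vectorized version along the lines the answer hints at might look like this (my own sketch; each row is one set of 11 numbers):

 nsets = 10000;
 X = diff([zeros(nsets,1), sort(rand(nsets,10),2), ones(nsets,1)]*45, [], 2);
 Y = X + 5;               % every row of Y sums to 100, every entry is >= 5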


Source: https://habr.com/ru/post/1492986/

