How to generate a pseudo-random positive definite matrix with restrictions on off-diagonal elements?

I want to impose nontrivial upper and lower bounds on the correlation between each pair of variables in a variance/covariance matrix.

For example: I need a variance matrix in which 0.9 > |rho(x_i, x_j)| > 0.6 for every pair of variables, where rho(x_i, x_j) is the correlation between the variables x_i and x_j.
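In code, the requirement amounts to the following check. This is a Python/NumPy sketch of my own (the thread itself uses Matlab and R, and the function name is mine):

```python
import numpy as np

def satisfies_bounds(S, lower=0.6, upper=0.9):
    """Check that S is a valid correlation matrix whose off-diagonal
    absolute values all lie strictly between `lower` and `upper`."""
    S = np.asarray(S, dtype=float)
    off = np.abs(S[~np.eye(len(S), dtype=bool)])
    pos_def = np.all(np.linalg.eigvalsh(S) > 0)   # positive definite?
    unit_diag = np.allclose(np.diag(S), 1.0)      # unit diagonal?
    return bool(pos_def and unit_diag and np.all((off > lower) & (off < upper)))

# A 3x3 matrix with all off-diagonal correlations equal to 0.7 is
# positive definite and meets the 0.6 < |rho| < 0.9 requirement.
A = np.full((3, 3), 0.7)
np.fill_diagonal(A, 1.0)
print(satisfies_bounds(A))  # True
```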

Thanks.


Well, something like a quick-and-dirty solution has been found, but if someone knows a more rigorous way to get there, I will be happy to hear it.


I lost my original login, so I am re-asking the question under a new name. The previous iteration of the question received the following comments:

* You mean pseudo-random; that is the correct terminology for semi-random. - Robert Gould

* A good point, though I think he had semi-random numbers in mind (they are called pseudo-random when it comes to computer randomness :-p) - fortran

* By "correlation", do you mean "covariance"? - Svante

* No, I really do mean correlation. I want to create a positive definite matrix such that all correlations lie within nontrivial bounds. - vak

* See my answer. Do you insist that the sample correlations lie within the specified bounds, or only the population correlations that generate the sample? I suggested an idea that might work if your problem is the former. - woodchips

* woodchips: no, I am afraid your solution will not work; see my answer in the original thread (link above). Thanks. - vak

+4
4 answers

Here is your reply to my answer in the original thread:

"Come on, people, there must be something simpler."

Sorry, but no. Wishing to win the lottery is not enough. Demanding to win it is not enough either. Likewise, you cannot simply demand the solution to a mathematical problem and suddenly find that it is easy.

The problem of generating pseudo-random deviates whose sample statistics fall in a given range is nontrivial, at least if the deviates are to be truly pseudo-random in any sense. Depending on the range, you may get lucky. I proposed a rejection scheme, but also said that it is unlikely to be a good solution. With many dimensions and tight ranges on the correlations, the probability of success is poor. Sample size matters as well, since it drives the sampling variance of the resulting correlations.

If you really want a solution, you need to sit down and state the goal precisely. Do you need a random sample with a nominally specified correlation structure, but with hard limits on the correlations? Would any sample correlation matrix that satisfies the stated bounds be acceptable? Are the variances specified as well?
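To make the point about rejection schemes concrete, here is a Python/NumPy sketch of my own (not from the thread; all names and parameter choices are illustrative) that estimates how often the sample correlation matrix of a finite draw happens to land with every off-diagonal inside the bounds:

```python
import numpy as np

rng = np.random.default_rng(0)

def acceptance_rate(p, n=50, rho=0.75, lower=0.6, upper=0.9, trials=200):
    """Estimate how often the sample correlation matrix of n draws from a
    p-dimensional normal (population correlation rho on all off-diagonals)
    lands with every |off-diagonal| inside (lower, upper)."""
    sigma = np.full((p, p), rho)
    np.fill_diagonal(sigma, 1.0)
    hits = 0
    for _ in range(trials):
        x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
        c = np.corrcoef(x, rowvar=False)
        off = np.abs(c[~np.eye(p, dtype=bool)])
        hits += np.all((off > lower) & (off < upper))
    return hits / trials

# More dimensions means more pairwise constraints, so acceptance drops:
for p in (2, 5, 10):
    print(p, acceptance_rate(p))
```

This illustrates woodchips' warning: with many dimensions and limited ranges, a reject-and-retry approach wastes most of its draws.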

+2

You can create a set of N random vectors, each of length M and with unit variance, and add to each of them a common random vector (also of length M and unit variance) multiplied by some number k. Then take the correlations between all these vectors, which gives a positive definite matrix. If M is very large, there will be almost no spread in the correlation distribution, and every correlation will be k^2 / (1 + k^2). The smaller M is, the wider the distribution of the off-diagonal elements. Alternatively, you can let M be very large and multiply the "common vector" by a different k for each vector. If you play with these parameters carefully, you can get tighter control. Here is the Matlab code for this:

 clear all;
 vecLarg = 10;     % M: length of each vector (number of observations)
 theDim  = 1000;   % N: number of vectors
 k       = 1;      % strength of the shared component (undefined in the original post)
 corrDist = 0*randn(theDim,1);   % per-vector weights for an extra shared term (zero here)
 Baux = randn(vecLarg,theDim) ...
      + (corrDist*randn(1,vecLarg))' ...
      + (k*ones(theDim,1)*randn(1,vecLarg))';
 A = corrcoef(Baux);
 hist(A(:),100);
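As a sanity check of the k^2 / (1 + k^2) claim, here is the same construction rewritten in Python/NumPy (my translation, not the poster's code), with M large so the sample correlations concentrate near the predicted value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each of the `dim` variables is its own unit-variance noise plus k times
# a shared noise vector, so the population correlation between any pair
# is k**2 / (1 + k**2).
k = 1.0
n_obs, dim = 20000, 5
shared = rng.standard_normal(n_obs)
X = rng.standard_normal((n_obs, dim)) + k * shared[:, None]
C = np.corrcoef(X, rowvar=False)

predicted = k**2 / (1 + k**2)          # 0.5 for k = 1
observed = C[np.triu_indices(dim, 1)].mean()
print(predicted, round(observed, 3))
```

With a small number of observations instead (M = 10, as in the Matlab snippet), the off-diagonals spread widely around that value, which is exactly the knob the answer describes.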
+2

Perhaps this answer will help to implement it:

One class of matrices with this non-negative definiteness property is the Wishart distribution. Samples from ~W() in which all off-diagonal entries fall between some bounds [l, u] will fit your question. However, I do not believe that this is the same as the distribution of all positive definite matrices with off-diagonals in [l, u].

The Wikipedia page on the Wishart distribution has an algorithm for computing samples from ~W().
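For reference, the textbook way to draw from a Wishart with identity scale needs no special library at all: the outer product of df standard-normal columns is a W_p(df, I) sample. A minimal Python/NumPy sketch (my own, assuming an identity scale matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

def wishart_identity(p, df):
    """Draw one sample from W_p(df, I): if G is a p x df matrix of
    i.i.d. standard normals, then G @ G.T has a Wishart distribution."""
    G = rng.standard_normal((p, df))
    return G @ G.T

W = wishart_identity(4, 10)
# With df >= p the draw is positive definite (almost surely),
# so all eigenvalues are positive:
print(np.all(np.linalg.eigvalsh(W) > 0))  # True
```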

A simpler, hacky solution (which perhaps approximates this):

(assuming u > l and l > 0)

  • Draw from a multivariate normal whose Sigma has 1 on the diagonal and mean(l, u) on every off-diagonal.
  • Then take the sample and compute its correlation matrix => C.
  • This matrix will have some fuzz, but the math on exactly how much is a bit out of my league. The off-diagonal values in C are bounded by [-1, 1], with mean mean(l, u). Eyeballing it, I would guess something beta- or exponential-like. In any case, since the off-diagonals of C have a continuous distribution, there is no guarantee they will behave and stay within the bounds (l, u), unless (l, u) = [-1, 1].
  • You can adjust the amount of "fuzz" by increasing or decreasing the sample size in step 1. I would submit (without proof) that the spread of the off-diagonals in C is inversely proportional to the square root of the number of samples.

So: it seems nontrivial to answer exactly!

As the other posters suggested, you can generate from the Wishart and then keep only the draws you want; that is true rejection sampling, but you could be rejecting for a long time! If you settle for merely excluding the draws that are not positive definite (is "non-pos-def" a word?), this should work just fine for producing good matrices. However, it is not the true distribution of all pos-def matrices whose off-diags are in [l, u].
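The Wishart-plus-rejection idea sketched above can be written out as follows. This is my own Python/NumPy illustration (identity scale, loose bounds chosen so the demo terminates quickly); with tight bounds like (0.6, 0.9) the acceptance rate collapses, which is the "rejecting for a long time" caveat:

```python
import numpy as np

rng = np.random.default_rng(3)

def rejection_sample_corr(p, df, lower, upper, max_tries=10000):
    """Draw Wishart matrices, convert each to a correlation matrix, and
    return the first whose off-diagonal absolute values all fall inside
    (lower, upper). Returns None if nothing is accepted in max_tries."""
    for _ in range(max_tries):
        G = rng.standard_normal((p, df))
        W = G @ G.T                       # Wishart(df, I) draw
        d = np.sqrt(np.diag(W))
        C = W / np.outer(d, d)            # normalize to a correlation matrix
        off = np.abs(C[~np.eye(p, dtype=bool)])
        if np.all((off > lower) & (off < upper)):
            return C
    return None

C = rejection_sample_corr(3, 5, 0.0, 0.9)
print(C is not None)
```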

Code (in R) for the hacky sampling scheme proposed above:

 sigma1 <- function(n, sigma) {
   out <- matrix(sigma, n, n)
   diag(out) <- 1
   return(out)
 }

 library(mvtnorm)

 sample_around_sigma <- function(size, upper, lower, tight=500) {
   # size: size of matrix
   # upper, lower: bounds on the corr, should be > 0
   # tight: number of samples to use. ideally this
   #   would be calculated such that the off-diags will
   #   be "pretty likely" to fall in [lower, upper]
   sigma <- sigma1(size, mean(c(upper, lower)))
   means <- 0 * 1:size
   samples <- rmvnorm(n=tight, mean=means, sigma=sigma)
   return(cor(samples))
 }

 > A <- sample_around_sigma(5, .3, .5)
 > A
           [,1]      [,2]      [,3]      [,4]      [,5]
 [1,] 1.0000000 0.3806354 0.3878336 0.3926565 0.4080125
 [2,] 0.3806354 1.0000000 0.4028188 0.4366342 0.3801593
 [3,] 0.3878336 0.4028188 1.0000000 0.4085453 0.3814716
 [4,] 0.3926565 0.4366342 0.4085453 1.0000000 0.3677547
 [5,] 0.4080125 0.3801593 0.3814716 0.3677547 1.0000000
 >
 > summary(A[lower.tri(A)]); var(A[lower.tri(A)])
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.3678  0.3808  0.3902  0.3947  0.4067  0.4366
 [1] 0.0003949876
+1

OK, fantastic, Gregg: we are getting somewhere. Combining your idea with woodchips' idea gives the following alternative approach. It is mathematically very dirty, but it seems to work:

 library(MCMCpack)
 library(MASS)

 p <- 10
 lb <- .6
 ub <- .8

 zupa <- function(theta) {
   ac <- matrix(theta, p, p)
   fe <- rwish(100*p, ac %*% t(ac))
   det(fe)
 }

 ba <- optim(runif(p^2, -10, -5), zupa, control=list(maxit=10))
 ac <- matrix(ba$par, p, p)
 fe <- rwish(100*p, ac %*% t(ac))
 me <- mvrnorm(p+1, rep(0, p), fe)
 A <- cor(me)
 bofi <- sqrt(diag(var(me))) %*% t(sqrt(diag(var(me))))
 va <- A[lower.tri(A)]
 l1 <- 100
 while (l1 > 0) {
   r1 <- which(va > ub)
   l1 <- length(r1)
   va[r1] <- va[r1] * .9
 }
 A[lower.tri(A)] <- va
 A[upper.tri(A)] <- va
 vari <- bofi * A
 mk <- mvrnorm(10*p, rep(0, p), vari)
 pc <- sign(runif(p, -1, 1))
 mf <- sweep(mk, 2, pc, "*")
 B <- cor(mf)
 summary(abs(B[lower.tri(B)]))

In principle this is the idea (here with upper bound = .8 and lower bound = .6). It has a fairly good acceptance rate, which is not 100%, but it will do at this stage of the project.

+1
source

Source: https://habr.com/ru/post/1286707/

