How to generate a pseudo-random positive definite matrix with restrictions on off-diagonal elements?


The goal is to impose nontrivial upper and lower bounds on the correlation between each pair of variables in a variance/covariance matrix.

For example: I need a covariance matrix in which every pair of variables satisfies 0.6 < |rho(x_i, x_j)| < 0.9, where rho(x_i, x_j) is the correlation between the variables x_i and x_j.

Thanks.

+3
3 answers

There are a lot of problems here.

First of all, are the pseudo-random deviates supposed to be normally distributed? I will assume they are, since any discussion of correlation matrices gets unpleasant once we allow non-normal distributions.

Next, it is quite simple to generate pseudo-random normal deviates given a covariance matrix: generate standard normal (independent) deviates, then transform them by multiplying by the Cholesky factor of the covariance matrix. Add in the mean at the end if the mean is not zero.
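For concreteness, a minimal MATLAB sketch of that recipe (the covariance matrix, mean vector, and sample size here are made-up values of mine, not from the question):

    % Sketch only: Sigma, mu, and n are illustrative, not from the thread.
    Sigma = [1 0.7; 0.7 2];      % a valid covariance matrix
    mu    = [5 -3];              % desired mean vector
    n     = 1000;                % number of deviates to generate
    Z = randn(n, 2);             % independent standard normal deviates
    X = Z * chol(Sigma);         % impose the covariance via the Cholesky factor
    X = X + repmat(mu, n, 1);    % add the mean at the end, since it is nonzero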

Likewise, a covariance matrix is quite simple to create from a correlation matrix: just pre- and post-multiply the correlation matrix by a diagonal matrix of standard deviations. This scales the correlation matrix up into a covariance matrix.
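In MATLAB that scaling looks like the following (the correlation matrix and standard deviations are arbitrary placeholders of mine):

    % Sketch only: C and sd are made up for illustration.
    C  = [1 0.8; 0.8 1];         % correlation matrix
    sd = [2 5];                  % standard deviations of the two variables
    D  = diag(sd);
    Sigma = D * C * D            % pre- and post-multiplied: the covariance matrix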

I'm still not sure where the difficulty lies in this question, since it would seem easy enough to create a "random" correlation matrix with elements uniformly distributed in the desired range.
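Taken literally, that would look something like the sketch below (my illustration, not part of the original answer); note that for more than two variables such a draw is not guaranteed to be positive definite:

    % Draw each off-diagonal element uniformly with 0.6 < |rho| < 0.9,
    % then check positive definiteness; for p > 2 the check can fail,
    % so this alone does not solve the problem.
    p = 4;
    C = eye(p);
    for i = 1:p-1
        for j = i+1:p
            C(i,j) = (0.6 + 0.3*rand) * sign(randn);
            C(j,i) = C(i,j);
        end
    end
    isPD = all(eig(C) > 0)       % may well be false for p > 2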

So all of the above is quite trivial by any reasonable standard, and there are many tools for generating pseudo-random normal deviates given the above information.

Perhaps the problem is that the user insists that the resulting matrix of random deviates must have sample correlations in the specified range. You must recognize that a set of random numbers will only have the specified distribution parameters in an asymptotic sense: as the sample size goes to infinity you should expect to see the specified parameters, but any small set of samples will not necessarily have correlations in the desired ranges.

For example, here (in MATLAB) is a simple positive definite 3x3 matrix. As such, it makes a perfectly good covariance matrix.

    S = randn(3);
    S = S'*S
    S =
        0.78863    0.01123   -0.27879
        0.01123    4.9316     3.5732
       -0.27879    3.5732     2.7872

I convert S to a correlation matrix.

    s = sqrt(diag(S));
    C = diag(1./s)*S*diag(1./s)
    C =
        1            0.0056945   -0.18804
        0.0056945    1            0.96377
       -0.18804      0.96377      1

Now I could sample from a multivariate normal distribution using the Statistics Toolbox (mvnrnd should do the trick), but it is just as easy to use the Cholesky factor directly.

    L = chol(S)
    L =
        0.88805    0.012646   -0.31394
        0          2.2207      1.6108
        0          0           0.30643

Now create pseudo-random deviates, then transform them as desired.

    X = randn(20,3)*L;
    cov(X)
    ans =
        0.79069   -0.14297   -0.45032
       -0.14297    6.0607     4.5459
       -0.45032    4.5459     3.6549
    corr(X)
    ans =
        1         -0.06531   -0.2649
       -0.06531    1          0.96587
       -0.2649     0.96587    1

If your desire was that the correlations should ALWAYS be greater than -0.188, then this sampling method fails, because the numbers are pseudo-random. In fact, that goal will be difficult to achieve unless your sample size is large enough.

You could use a simple rejection scheme, whereby you sample and then resample repeatedly until the sample has the required properties, with correlations in the required ranges. This can get tedious.
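A rough sketch of such a rejection loop (my own illustration: the target here is a 3-variable compound-symmetric correlation matrix with population correlation 0.75, unit variances, and a sample size of 50, none of which come from the question):

    % Resample until every off-diagonal sample correlation falls in (0.6, 0.9).
    p = 3;  n = 50;
    C = 0.75*ones(p) + 0.25*eye(p);   % population correlations inside the target range
    L = chol(C);
    for k = 1:1e4                     % give up after 10^4 attempts
        X = randn(n,p)*L;
        R = corr(X);
        r = abs(R(triu(true(p),1)));  % off-diagonal sample correlations
        if all(r > 0.6 & r < 0.9)
            break                     % accept this sample
        end
    end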

An approach that may work (but one that I haven't completely thought through at this point) is to use the standard scheme, as described above, to create a random sample. Compute the correlations. If they do not lie in the proper ranges, then determine the perturbation that would need to be applied to the actual (measured) covariance matrix of your data so that the correlations become what you want. Then find a zero-mean random perturbation of your sample data that moves the sample covariance matrix in the desired direction.

It might work, but since I don't know what the real question here is, I won't dig into it further. (Edit: I've thought about this problem some more, and it appears to be a quadratic programming problem with quadratic constraints: find the smallest perturbation of the matrix X such that the resulting covariance (or correlation) matrix has the desired properties.)

+4

This is not a complete answer, but a suggestion of a possible constructive method:

Looking at the characterizations of positive definite matrices ( http://en.wikipedia.org/wiki/Positive-definite_matrix ), I think one of the most accessible approaches would be to use Sylvester's criterion.

You could start with a trivial random 1x1 matrix with a positive determinant and gradually expand it by one row and column at a time, ensuring that each new matrix also has a positive determinant (how you achieve that is up to you ^_^).
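A sketch of that growth process in MATLAB (my own reading of the suggestion, combined with the question's 0.6 < |rho| < 0.9 range; the brute-force retries may be slow for large matrices):

    % Grow a correlation matrix one row/column at a time; by Sylvester's
    % criterion it stays positive definite as long as each new leading
    % determinant is positive (the smaller leading minors already are).
    % If a partial matrix cannot be extended within the range, start over.
    p = 5;                              % target size (assumed)
    C = 1;                              % 1x1 case: trivially positive definite
    while size(C,1) < p
        k = size(C,1) + 1;
        extended = false;
        for attempt = 1:1000            % brute-force retries for the new row
            r = (0.6 + 0.3*rand(k-1,1)) .* sign(randn(k-1,1));
            Ck = [C r; r' 1];
            if det(Ck) > 0              % Sylvester's criterion for the new matrix
                C = Ck;  extended = true;  break
            end
        end
        if ~extended
            C = 1;                      % dead end: restart from the 1x1 case
        end
    end
    C                                   % p-by-p positive definite correlation matrix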

0

Woodship,

"First of all, are the pseudo-random deviates supposed to be normally distributed?"

Yes.

"Perhaps the problem is that the user insists that the resulting random deviation matrix should have correlations in the specified range."

Yes, that is exactly the difficulty.

"You have to admit that a set of random numbers will only have the desired distribution parameters in the asymptotic sense."

True, but that is not the problem: your strategy works for p = 2, but it does not work for p > 2, regardless of the sample size.

"If your desire was for correlations to ALWAYS be above -0.188, then this sampling method failed because the numbers are pseudo-random. In fact, this goal will be difficult if you do not reach the sample size large enough."

It is not a sample-size problem: for p > 2 you don't even see convergence to the correct range for the correlations as the sample size grows. I tried the technique you suggest before posting here; it clearly has shortcomings.

"You could use a simple rejection scheme, whereby you sample and then resample repeatedly until the sample has the required properties, with correlations in the required ranges. This can get tedious."

Not an option: for large p (say, more than 10) this approach is unbearably slow.

"Calculate the correlations: I cannot lie in the correct ranges, and then determine the perturbation that would have to be added to the actual (measured) covariance matrix of your data so that the correlations are just as necessary."

Also

Regarding the QP, I understand the constraints, but I'm not sure how you would define the objective function: using the "least perturbation" from some initial matrix, you will always get the same matrix (solution), with all the off-diagonal entries equal to one of the two boundaries (i.e., not pseudo-random at all); plus it seems like overkill, doesn't it?

Come on, folks, there must be something simpler.

0
