Is there a quick way to sample from a subset of GLn?

The rules of this problem are quite specific, because I am really looking at a subset of GLn whose row and column vectors must have a certain form (call these vectors valid; examples below), so please bear with me. Here are the rules:

  • You can choose a valid nonzero vector of length n uniformly at random.

  • If you have valid vectors v1, v2, ..., vk, you can determine whether the partial columns they form are prefixes of valid vectors (for example, can [v1_1 v2_1 ... vk_1] occur as the first k entries of a valid vector?), using the isPrefix function.

  • If you have valid vectors v1, v2, ..., vk, you can determine whether they are linearly independent using the areIndependent function.
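
For concreteness, here is roughly how I think of these three operations; the Python names below (sample_valid_vector, is_prefix, are_independent) are just placeholders for whatever the real implementations turn out to be:

  from typing import Protocol, Sequence

  Vector = Sequence[int]

  class ValidityOracle(Protocol):
      # Placeholder interface for the three operations listed above.

      def sample_valid_vector(self, n: int) -> Vector:
          # Return a valid nonzero vector of length n, chosen uniformly at random.
          ...

      def is_prefix(self, vectors: Sequence[Vector]) -> bool:
          # Given v1, ..., vk, decide whether every partial column
          # [v1_i, v2_i, ..., vk_i] can be extended to a valid vector.
          ...

      def are_independent(self, vectors: Sequence[Vector]) -> bool:
          # Decide whether v1, ..., vk are linearly independent.
          ...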

The goal is to sample from this subset of GLn uniformly at random. Here is a naive solution that fails:

  Step 1: Select a valid v1 uniformly at random. If isPrefix(v1), go to Step 2. Otherwise, repeat Step 1.
  Step 2: Select a valid v2 uniformly at random. If areIndependent(v1, v2) & isPrefix(v1, v2), go to Step 3. Otherwise, repeat Step 2.
  ...
  Step n: Select a valid vn uniformly at random. If areIndependent(v1, v2, ..., vn) & isPrefix(v1, v2, ..., vn), then END. Otherwise, repeat Step n.
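
In code, the naive procedure amounts to this (a sketch only, written against the placeholder interface above):

  def naive_sample(oracle: ValidityOracle, n: int) -> list:
      # Column-by-column rejection sampling, exactly as in Steps 1..n above.
      chosen = []
      while len(chosen) < n:
          candidate = oracle.sample_valid_vector(n)
          extended = chosen + [candidate]
          if oracle.are_independent(extended) and oracle.is_prefix(extended):
              chosen = extended   # accept and move on to the next step
          # otherwise retry the same step -- this is where the infinite loop can occur
      return chosen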

The problem with this proposed solution is that it can get stuck in an infinite loop in the not-so-unlikely case where areIndependent(v1, v2, ..., vk) & isPrefix(v1, v2, ..., vk) is TRUE, but there is no way to complete this k-tuple to a linearly independent n-tuple of valid vectors. A common example is k = n-1, when there is a single valid vector vn for which isPrefix(v1, v2, ..., vn) is TRUE, but that vector is linearly dependent on the previous n-1 vectors.

Therefore, I need some way to back up a step or two when I am stuck in such a loop, even though I do not necessarily know when I am in one. I am looking for a solution along these lines: if step k fails f(k) times, for some explicit function f which may depend on the distribution of valid vectors, return to step k-1 (or even further back, in some principled way).
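
To make the shape of the question concrete, here is the kind of thing I mean, with f as the unknown retry budget (this is a sketch of the question, not a working answer, and I suspect naive backtracking like this can itself bias the distribution):

  def sample_with_backtracking(oracle, n, f):
      # Like the naive loop, but step k gives up after f(k) failures and backs up one step.
      # How to choose f -- and whether backing up one step is even enough -- is the question.
      chosen = []
      failures = 0
      while len(chosen) < n:
          candidate = oracle.sample_valid_vector(n)
          extended = chosen + [candidate]
          if oracle.are_independent(extended) and oracle.is_prefix(extended):
              chosen = extended
              failures = 0
          else:
              failures += 1
              if failures > f(len(chosen) + 1) and chosen:
                  chosen.pop()      # return to step k-1 and try again from there
                  failures = 0
      return chosen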

Any suggestions, comments, links, etc. will be gratefully received! Thanks!

EXAMPLES

I'm actually looking for a general approach, or a recommendation on how to proceed with the sampling. I have several examples of valid-vector rules on which I would like to use it, and it would be more useful for me to be able to work these out on my own, rather than listing every example and crowd-sourcing a solution for each one on SO. However, to be specific and to give an idea of the difficulties involved, here is one example:

Alternating sign matrices. A vector is valid if its entries are all 0, 1, or -1, its nonzero entries alternate between 1s and -1s, and its entries sum to 1. For example, every permutation matrix consists of valid vectors, and so does the following:

  0  1  0
  1 -1  1
  0  1  0
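
For what it's worth, the per-vector condition is easy to check; the harder part is that the finished matrix must satisfy it along columns as well (a sketch, the function name is mine):

  def is_valid_asm_vector(v):
      # Entries in {-1, 0, 1}, nonzero entries alternate in sign, entries sum to 1.
      if any(x not in (-1, 0, 1) for x in v):
          return False
      nonzero = [x for x in v if x != 0]
      alternates = all(a == -b for a, b in zip(nonzero, nonzero[1:]))
      return alternates and sum(v) == 1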

Distinct entries. A vector is valid if its entries are all distinct. This one is especially annoying because the condition applies to the rows as well as the columns.

Thanks again to everyone who looked at this!

1 answer

I suspect that you may have to move to a Markov chain Monte Carlo algorithm - http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm - not necessarily for speed, but to make sure your random samples are reasonably distributed.

One definition of random sampling is that it produces the same distribution as if you generated matrices from your original column distribution at random and then kept only the valid ones.

If you generate elements by walking down a tree, and not all nodes have the same degree, the leaves will not be visited with equal probability. Consider a simple tree with two non-leaf nodes, one of which has a single leaf child and the other of which has one million leaf children. If you sample by walking down from the root, making a uniform choice at each node, the single leaf will be visited far more often than any particular leaf from the set with a million siblings.
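
A tiny simulation of that tree makes the bias concrete (the million is scaled down to a thousand just to keep it quick; the numbers are illustrative only):

  import random
  from collections import Counter

  # root has two children: "a" with a single leaf, "b" with a thousand leaves
  tree = {"root": ["a", "b"], "a": ["lonely_leaf"], "b": ["leaf_%d" % i for i in range(1000)]}

  def sample_leaf():
      node = "root"
      while node in tree:                  # walk down, choosing uniformly at each node
          node = random.choice(tree[node])
      return node

  counts = Counter(sample_leaf() for _ in range(100000))
  print(counts["lonely_leaf"])             # about 50000 -- half of all samples
  print(counts["leaf_0"])                  # about 50 -- for each of the thousand siblings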

Since you can get stuck in the infinite loop above, you have found a case where a node has no children. Assuming there are any leaves at all, that means you have a tree whose nodes do not all have the same degree.

You may need to code different approaches for different validity rules, and worry about how long your Markov chain takes to "burn in" and become reasonably random. There is one (sort of) exception to the latter point: if you are trying to estimate a tail probability in order to rule out the possibility that an observed configuration arose by chance, you can start your Markov chain from that very configuration, because - under the null hypothesis - that point is already a random sample, so if all of your generated values have a more extreme statistic than this starting point, something suspicious is going on.
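
To illustrate the Metropolis-Hastings suggestion (only a sketch; the proposal, the validity test and the starting matrix are all placeholders, and you still have to check that the chain can actually reach every valid matrix): with a uniform target over valid matrices and a symmetric proposal, the acceptance rule collapses to "keep the proposal if it is valid, otherwise stay put".

  def mh_sample(start_matrix, propose, is_valid_matrix, steps):
      # Metropolis-Hastings with a uniform target over valid matrices.
      # `propose` must be symmetric (the chance of suggesting B from A equals A from B);
      # then a valid proposal is always accepted and an invalid one is rejected,
      # and on rejection the chain stays at the current state (this matters).
      current = start_matrix
      for _ in range(steps):
          candidate = propose(current)
          if is_valid_matrix(candidate):
              current = candidate
      return current

For permutation-like rules the proposal might swap two rows or make some other small local change; how large `steps` has to be before the result is "reasonably random" is exactly the burn-in question above.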


Source: https://habr.com/ru/post/888282/

