Drawing a random nonzero element from a sparse matrix

I have a sparse logical matrix that is quite large. I would like to draw random nonzero elements from it without storing all nonzero elements in a separate vector (for example, by using the find command). Is there an easy way to do this?

I am currently using rejection sampling: draw a random element and check whether it is nonzero. But this is inefficient when the fraction of nonzero elements is small.

4 answers

A sparse logical matrix is not a very practical representation of your data if you want to pick random locations. Rejection sampling and find are the only two approaches that make sense to me. Here is how to do each efficiently (assuming you want 4 random locations):

    %# using find
    idx = find(S);
    %# draw 4 without replacement
    fourRandomIdx = idx(randperm(numel(idx),4));
    %# draw 4 with replacement
    fourRandomIdx = idx(randi(numel(idx),4,1));
    %# get row, column values
    [row,col] = ind2sub(size(S),fourRandomIdx);

    %# using rejection sampling
    density = nnz(S)/numel(S);
    %# on average 4/density draws are needed to get 4 hits;
    %# multiply by 2 (or 3) as a safety margin
    n = ceil(4/density) * 2;
    %# random linear indices w/ replacement
    randIdx = randi(numel(S),n,1);
    %# keep the first four draws that hit a non-zero element
    hits = randIdx(find(S(randIdx),4,'first'));
    [row,col] = ind2sub(size(S),hits);

An m-by-n sparse matrix with nnz nonzero entries requires nnz + n + 1 integers to store the locations of its nonzero entries (one row index per entry plus n + 1 column pointers). For a logical matrix, there is no need to store the values of the nonzero entries: they are all true. Accordingly, you are best off converting your sparse logical matrix into a list of the linear indices of its nonzero entries, together with m and n, which requires only nnz + 2 integers of storage. From these (and ind2sub) you can easily recover the subscripts of any nonzero entry that you select at random using randi in the range 1..nnz.
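A minimal sketch of this idea, assuming a small example matrix built with sprand in place of the real data. The names nzIdx, sz, and k are illustrative and do not come from the answer:

```matlab
% Example sparse logical matrix (assumption: stands in for your data)
S = sprand(1000, 800, 0.001) > 0;

% One-time conversion: keep only the linear indices and the size
nzIdx = find(S);      % nnz-by-1 vector of linear indices of the nonzeros
sz    = size(S);      % [m n]

% Draw one nonzero entry uniformly at random
k = nzIdx(randi(numel(nzIdx)));
[row, col] = ind2sub(sz, k);
```

After the conversion, S itself is no longer needed for sampling; each draw costs one randi call and one ind2sub call.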


find is the standard interface for getting the nonzero elements of a sparse matrix. Have a look here: http://www.mathworks.se/help/techdoc/math/f6-9182.html#f6-13040

 [i,j,s] = find(S) 

find returns the row indices of the nonzero values in vector i, the column indices in vector j, and the nonzero values themselves in vector s.

There is no need to request s. Just pick a random position k and read off the pair i(k), j(k).
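As a rough illustration, assuming an example matrix generated with sprand (the variable names k, row, and col are not from the answer):

```matlab
S = sprand(500, 500, 0.01) > 0;   % example data (assumption)

[i, j] = find(S);                 % row and column subscripts of the nonzeros
k   = randi(numel(i));            % uniform position in the list
row = i(k);
col = j(k);
```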


If you represent the entries in three-column format, that is, as a coordinate list (i, j, value), you can simply pick elements from the list. To get this list, you can either reuse the data from which you originally built the sparse matrix (that is, the input to sparse()), or use find, a la [i,j,s] = find(S);

If you don't need the values, and it seems you won't, you can simply extract i and j.

If for some reason your matrix is massive and the RAM restrictions are severe, you can divide the matrix into regions and choose a submatrix with probability proportional to its number of nonzero elements (obtained with nnz). You can go so far as to divide the matrix into individual columns, and the rest of the calculation is trivial. NB: by applying sum to the matrix, you get the per-column counts (assuming your entries are 1s).
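The column-wise variant could be sketched as follows. This is one possible reading of the suggestion, not the answerer's own code; the example matrix and the names colCounts, cumCounts, t, and rowsInCol are assumptions made for illustration:

```matlab
S = sprand(2000, 300, 0.005) > 0;     % example data (assumption)

colCounts = full(sum(S, 1));          % nonzeros per column, 1-by-n
cumCounts = cumsum(colCounts);        % running total across columns

t   = randi(cumCounts(end));          % rank of the chosen nonzero, 1..nnz(S)
col = find(cumCounts >= t, 1);        % first column whose running total reaches t

rowsInCol = find(S(:, col));          % row subscripts within that one column
offset    = t - (cumCounts(col) - colCounts(col));  % position inside the column
row       = rowsInCol(offset);
```

Because t is uniform over 1..nnz(S), each column is selected with probability proportional to its count and every nonzero entry is equally likely. The only extra storage is two vectors of length n plus the row indices of a single column.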

This way, you don't even have to bother with rejection sampling (which seems pointless to me in this case, since MATLAB already knows where all the nonzero entries are).


Source: https://habr.com/ru/post/1396533/

