How do you edit cells in a sparse matrix using scipy?

I am trying to manipulate some data in a sparse matrix. Once I have created it, how do I add, change, or update the values in it? It seems very simple, but I cannot find it in the documentation for the sparse matrix classes or anywhere on the Internet. I think I am missing something important.

This is my unsuccessful attempt to do this in the same way as a regular array.

    >>> from scipy.sparse import bsr_matrix
    >>> A = bsr_matrix((10,10))
    >>> A[5][7] = 6
    Traceback (most recent call last):
      File "<pyshell#11>", line 1, in <module>
        A[5][7] = 6
      File "C:\Python27\lib\site-packages\scipy\sparse\bsr.py", line 296, in __getitem__
        raise NotImplementedError
    NotImplementedError
2 answers

There are several sparse matrix formats, and some of them are better suited to indexing than others. One that implements it is lil_matrix.

    Al = A.tolil()
    Al[5,7] = 6          # the normal 2d matrix indexing notation
    print(Al)
    print(Al.A)          # same as Al.toarray()
    A1 = Al.tobsr()      # if it must be in bsr format

The documentation for each format says what it is good at and what it is bad at, but it does not give a clear list of which operations each format defines.

    Advantages of the LIL format
        supports flexible slicing
        changes to the matrix sparsity structure are efficient
    ...
    Intended Usage
        LIL is a convenient format for constructing sparse matrices
    ...

dok_matrix also implements indexing.
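As a minimal sketch of the same kind of editing with dok_matrix (the values and the second position here are made up for illustration), entries can be added, overwritten, and updated with ordinary 2d indexing, and the result converted once editing is done:

```python
from scipy.sparse import dok_matrix

# Start from an empty 10x10 matrix in DOK (dictionary-of-keys) format.
D = dok_matrix((10, 10))

D[5, 7] = 6      # add a new entry
D[5, 7] = 8      # overwrite the same entry
D[2, 4] += 10    # update an entry that was implicitly zero

# Convert once the edits are finished, if bsr format is required.
B = D.tobsr()
```

Doing the edits in DOK (or LIL) and converting at the end avoids changing the sparsity structure of a compressed format one element at a time.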

The underlying data structure of coo_matrix is easy to understand. It is essentially the arguments of the constructor, coo_matrix((data, (i, j)), [shape=(M, N)]). To create the same matrix, you can use:

    from scipy import sparse

    sparse.coo_matrix(([6], ([5], [7])), shape=(10, 10))

If you have more assignments, build up larger data, i, j lists (or 1d arrays), and construct the sparse matrix at the end.
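A rough sketch of that accumulate-then-build approach follows. The `assignments` list and its values are hypothetical; only the coo_matrix call comes from the text above.

```python
from scipy.sparse import coo_matrix

# Hypothetical list of edits collected up front: (row, column, value).
assignments = [(5, 7, 6), (2, 4, 10), (0, 0, 1)]

i = [r for r, _, _ in assignments]     # row indices
j = [c for _, c, _ in assignments]     # column indices
data = [v for _, _, v in assignments]  # values

# Build the matrix once, at the end.
M = coo_matrix((data, (i, j)), shape=(10, 10))

# Convert to a format suited to arithmetic or row slicing if needed.
M_csr = M.tocsr()
```

Note that if the same (row, column) pair appears more than once, the values are summed when the COO matrix is converted to another format.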

+5
source

The bsr format is documented under bsr matrix and the csr format under csr matrix in the SciPy documentation. It may be worthwhile to understand csr before moving on to bsr. The only difference is that bsr has entries that are themselves matrices (dense blocks), while the basic unit in csr is a scalar.
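To make that difference concrete, here is a small sketch that stores the same single value in both formats. The 2x2 block size is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# One nonzero value, 6, at row 5, column 7 of a 10x10 matrix.
C = csr_matrix((np.array([6]), (np.array([5]), np.array([7]))), shape=(10, 10))

# The same matrix stored as 2x2 dense blocks.
B = C.tobsr(blocksize=(2, 2))

print(C.data.shape)  # csr stores scalars: one value per stored entry
print(B.data.shape)  # bsr stores blocks: one 2x2 array per stored block
print(B.data[0])     # the block covering rows 4-5 and columns 6-7
print(B.indices)     # block-column index of that block
```

In bsr, indices and indptr refer to block columns and block rows rather than to individual columns and rows.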

I don’t know if there are super easy ways to manipulate matrices after they are created, but here are some examples of what you are trying to do:

    import numpy as np
    from scipy.sparse import bsr_matrix, csr_matrix

    row = np.array([5])
    col = np.array([7])
    data = np.array([6])
    A = csr_matrix((data, (row, col)))

This is a simple syntax in which you list all the values you want in the matrix in the data array, and then specify where each value should go using row and col. Note that this makes the matrix just large enough to hold the element in the largest row and column (in this case, a 6x8 matrix). You can see the matrix in standard form using the todense() method.
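If the inferred size is not what you want, the constructor also accepts an explicit shape. A short sketch comparing the two (the variable names `inferred` and `explicit` are just for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([5])
col = np.array([7])
data = np.array([6])

# Shape inferred from the largest row and column index.
inferred = csr_matrix((data, (row, col)))

# Shape given explicitly, matching the 10x10 matrix from the question.
explicit = csr_matrix((data, (row, col)), shape=(10, 10))

print(inferred.shape)
print(explicit.shape)
```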

 A.todense() 

However, you cannot manipulate the matrix on the fly using this pattern. What you can do is change the underlying sparse representation of the matrix directly. It consists of 3 attributes: indices, indptr, and data. First, consider the values of these attributes for the matrix we have already created.

    >>> A.data
    array([6])
    >>> A.indices
    array([7], dtype=int32)
    >>> A.indptr
    array([0, 0, 0, 0, 0, 0, 1], dtype=int32)

data is the same as before: the 1-dimensional array of values that we want in the matrix. The difference is that the position of this data is now determined by indices and indptr instead of row and col. indices is simple enough. It is just a list of the column in which each data item is located, so it always has the same size as the data array. indptr is a bit trickier. It lets the data structure know which row each data item belongs to. To quote the documentation:

column indices for row i are stored in indices[indptr[i]:indptr[i+1]]

From this definition it is clear that the size of indptr is always the number of rows in the matrix + 1. It takes a little time to get used to, but working through the values for each row will give you some intuition. Note that all entries of indptr are zero except the last. This means that the column indices for rows i=0 to 4 are stored in indices[0:0], i.e. an empty array, because these rows are all zeros. Finally, for the last row, i=5, we get indices[0:1]=7, which says that the data value(s) data[0:1] are in row 5, column 7.
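The row-by-row reading described above can be sketched as a small helper. `row_entries` is a hypothetical name, not part of SciPy; it only applies the indices[indptr[i]:indptr[i+1]] rule from the quoted documentation.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The same single-entry matrix as above: value 6 at row 5, column 7.
A = csr_matrix((np.array([6]), (np.array([5]), np.array([7]))))

def row_entries(m, i):
    """Return the (column, value) pairs stored for row i of a csr matrix."""
    start, stop = m.indptr[i], m.indptr[i + 1]
    cols = m.indices[start:stop].tolist()
    vals = m.data[start:stop].tolist()
    return list(zip(cols, vals))

for i in range(A.shape[0]):
    # Rows 0 to 4 give an empty list; row 5 gives [(7, 6)].
    print(i, row_entries(A, i))
```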

Now suppose we wanted to add the value 10 at row 2, column 4. First we put it in the data attribute,

 A.data = np.array( [10,6] ) 

Next, we update indices to indicate which column the value 10 will be in,

 A.indices = np.array( [4,7], dtype=np.int32 ) 

and finally, indicate which row it will be in by changing indptr

 A.indptr = np.array( [0,0,0,1,1,1,2], dtype=np.int32 ) 

It is important that indices and indptr have dtype np.int32. One way to visualize what happens in indptr is that the number changes between positions i and i+1 whenever row i contains data. Also note that arrays like these can be used to build sparse matrices directly:

    B = csr_matrix( (A.data, A.indices, A.indptr) )
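As a self-contained check of the three arrays used in this walkthrough, the sketch below builds the matrix from scratch and confirms where the values land. Passing shape=(6, 8) explicitly is my assumption, chosen to match the matrix size from earlier.

```python
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([10, 6])                                   # stored values
indices = np.array([4, 7], dtype=np.int32)                 # column of each value
indptr = np.array([0, 0, 0, 1, 1, 1, 2], dtype=np.int32)   # row boundaries

B = csr_matrix((data, indices, indptr), shape=(6, 8))

dense = B.toarray()
print(dense[2, 4])  # the value added at row 2, column 4
print(dense[5, 7])  # the original value at row 5, column 7
```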

It would be nice if it were as simple as indexing into an array, just as you tried, but that implementation does not exist yet. This should be enough to get you started, at least.


Source: https://habr.com/ru/post/972000/

