Scipy Sparse CSR Matrix to TensorFlow SparseTensor - Mini-Batch Gradient Descent

I have a Scipy sparse CSR matrix created from a TF-IDF feature matrix in SVM-Light format. The number of features is huge and only a few are non-zero per example, so I have to use a SparseTensor, otherwise it is too slow.

For example, the number of features is 5, and an example file might look like this:

    0 4:1
    1 1:3 3:4
    0 5:1
    0 2:1

After parsing, the training set is as follows:

    trainX = <scipy CSR matrix>
    trainY = np.array([0, 1, 0, 0])
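
As a side note (not from the original post): an SVM-Light file like the one above can be parsed into exactly this pair with scikit-learn; the file name here is hypothetical:

    from sklearn.datasets import load_svmlight_file

    # Parses "label idx:value ..." lines into a CSR matrix plus a label array.
    # SVM-Light indices are usually 1-based, which load_svmlight_file detects
    # automatically via its zero_based="auto" default.
    trainX, trainY = load_svmlight_file("train.svmlight")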

I have two important questions:

1) How can I convert this efficiently to SparseTensor form (sp_ids, sp_weights) to do fast multiplication (WX) using lookups: https://www.tensorflow.org/versions/master/api_docs/python/nn.html#embedding_lookup_sparse

2) How do I shuffle the data set at each epoch and recompute sp_ids and sp_weights, so that I can feed them (feed_dict) for mini-batch gradient descent? (A sketch addressing this follows the graph below.)

Sample code for a simple model, such as logistic regression, would be greatly appreciated. The graph would look like this:

    # GRAPH
    mul = tf.nn.embedding_lookup_sparse(W, X_sp_ids, X_sp_weights, combiner="sum")  # WX
    z = tf.add(mul, b)  # WX + b
    cost_op = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(z, y_true))  # this already applies the sigmoid internally
    train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost_op)  # construct optimizer
    predict_op = tf.nn.sigmoid(z)  # sig(WX + b)
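
Not part of the original post, but a minimal sketch of the per-epoch shuffling and feeding that question 2 asks about, assuming TF 1.x sparse placeholders and that W, b, y_true, train_op, trainX, trainY, and a Session named sess exist as above; batch_size and the epoch count are illustrative:

    import numpy as np
    import tensorflow as tf

    # Placeholders matching X_sp_ids / X_sp_weights in the graph above.
    X_sp_ids = tf.sparse_placeholder(tf.int64)
    X_sp_weights = tf.sparse_placeholder(tf.float32)

    def batch_to_feed(batch_csr, labels):
        # Rebuild the sparse feed values for one mini-batch of CSR rows.
        coo = batch_csr.tocoo()
        indices = np.array([coo.row, coo.col], dtype=np.int64).T
        shape = np.array(coo.shape, dtype=np.int64)
        # ids = column numbers, weights = stored TF-IDF values;
        # both share the same indices and dense shape.
        return {
            X_sp_ids: tf.SparseTensorValue(indices, coo.col.astype(np.int64), shape),
            X_sp_weights: tf.SparseTensorValue(indices, coo.data.astype(np.float32), shape),
            y_true: labels,
        }

    batch_size = 128  # illustrative
    for epoch in range(25):  # illustrative
        perm = np.random.permutation(trainX.shape[0])  # reshuffle every epoch
        for start in range(0, len(perm), batch_size):
            idx = perm[start:start + batch_size]
            sess.run(train_op, feed_dict=batch_to_feed(trainX[idx], trainY[idx]))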
1 answer

I can answer the first part of your question.

    import numpy as np
    import tensorflow as tf

    def convert_sparse_matrix_to_sparse_tensor(X):
        coo = X.tocoo()  # COO exposes row/col indices and values directly
        indices = np.mat([coo.row, coo.col]).transpose()  # N x 2 index array
        return tf.SparseTensor(indices, coo.data, coo.shape)

First, you convert the matrix to COO format. Then you extract the indices, values, and shape, and pass them directly to the SparseTensor constructor.
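
To get the (sp_ids, sp_weights) pair that embedding_lookup_sparse expects (part 1 of the question), a variation on the same idea should work; this helper is hypothetical, not from the answer above, and reuses the imports from the previous snippet:

    def convert_to_ids_and_weights(X):
        coo = X.tocoo()
        indices = np.mat([coo.row, coo.col]).transpose()
        # sp_ids carries the feature (column) numbers, sp_weights the TF-IDF
        # values; both SparseTensors share the same indices and dense shape.
        # Weights are cast to float32 to match a float32 weight matrix W.
        sp_ids = tf.SparseTensor(indices, coo.col.astype(np.int64), coo.shape)
        sp_weights = tf.SparseTensor(indices, coo.data.astype(np.float32), coo.shape)
        return sp_ids, sp_weights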


Source: https://habr.com/ru/post/1012770/

