I have a SciPy sparse CSR matrix created from a sparse TF-IDF feature matrix in SVM-Light format. The number of features is huge and the matrix is sparse, so I have to use a SparseTensor, otherwise it is too slow.
For example, the number of features is 5, and an example file might look like this:
0 4:1
1 1:3 3:4
0 5:1
0 2:1
After parsing, the training set looks like this:
trainX = <scipy CSR matrix>
trainY = np.array( [0,1,0,0] )
I have two important questions:
1) How can I convert trainX to a SparseTensor (sp_ids, sp_weights) efficiently, so that I can do fast multiplication (WX) using lookups: https://www.tensorflow.org/versions/master/api_docs/python/nn.html#embedding_lookup_sparse
2) How do I shuffle the dataset at each epoch and recompute sp_ids and sp_weights, so that I can feed them (via feed_dict) for mini-batch gradient descent?
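To make the two questions concrete, here is roughly what I mean in plain NumPy/SciPy (a sketch, all function names are my own placeholders): one helper that turns a CSR matrix into the arrays a SparseTensor feed would need, and one that reshuffles and slices the CSR matrix per epoch.

```python
import numpy as np
import scipy.sparse as sp

def csr_to_sparse_feed(X):
    """Turn a CSR matrix into the arrays needed to feed the two
    SparseTensors of embedding_lookup_sparse: sp_ids carries the
    feature (column) indices, sp_weights the TF-IDF values, and
    both share the same `indices` and `shape`."""
    coo = X.tocoo()
    indices = np.vstack([coo.row, coo.col]).T.astype(np.int64)
    ids = coo.col.astype(np.int64)          # values of sp_ids
    weights = coo.data.astype(np.float32)   # values of sp_weights
    shape = np.array(X.shape, dtype=np.int64)
    return indices, ids, weights, shape

def minibatches(trainX, trainY, batch_size, rng):
    """Reshuffle once per call (i.e. per epoch) and yield CSR row
    slices; fancy row indexing keeps each batch in CSR format."""
    order = rng.permutation(trainX.shape[0])
    for start in range(0, len(order), batch_size):
        rows = order[start:start + batch_size]
        yield trainX[rows], trainY[rows]
```

Each (X_batch, y_batch) pair would then go through csr_to_sparse_feed before being placed in feed_dict — but I am not sure this is the efficient way to do it.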
Sample code for a simple model, such as logistic regression, would be greatly appreciated. The graph would be something like this:
# GRAPH
mul = tf.nn.embedding_lookup_sparse(W, X_sp_ids, X_sp_weights, combiner="sum")
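For reference, this is my understanding of what embedding_lookup_sparse with combiner="sum" computes, written as a NumPy sketch (names are mine, not TensorFlow's):

```python
import numpy as np

def lookup_sparse_sum(W, ids, weights, row_index, n_rows):
    """NumPy model of tf.nn.embedding_lookup_sparse(W, sp_ids,
    sp_weights, combiner="sum"): for each output row r, sum
    weights * W[ids] over the nonzeros belonging to row r."""
    out = np.zeros((n_rows, W.shape[1]), dtype=W.dtype)
    np.add.at(out, row_index, weights[:, None] * W[ids])
    return out
```

With W of shape [num_features, 1], this is exactly the XW term of logistic regression, which is the multiplication I want to keep sparse.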