Encoding a scipy sparse matrix as TFRecords

I am creating a TensorFlow model where the input is a large scipy sparse matrix. Each row is a sample of dimension > 50k, of which only a few hundred values are nonzero.

Currently, I store this matrix as a pickle, load it entirely into memory, iterate over it in batches, and convert the samples in each batch to a dense array, which I feed into the model. This works fine as long as all the data fits in memory, but it is not feasible once I want to use more data.
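
For context, my current pipeline looks roughly like this (a minimal sketch; the file name, labels handling, and batch_size are placeholders):

import pickle

# load the whole sparse matrix into memory at once
with open("data.pkl", "rb") as f:
    X = pickle.load(f)  # scipy.sparse CSR matrix, shape (n_samples, >50k)

batch_size = 128
for start in range(0, X.shape[0], batch_size):
    # densify only the current batch before feeding it to the model
    batch = X[start:start + batch_size].toarray()
    # ... feed `batch` into the model ...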

I explored TFRecords as a way to serialize my data and read it more efficiently with TensorFlow, but I couldn't find any examples with sparse data.

I found an example for MNIST:

import tensorflow as tf

writer = tf.python_io.TFRecordWriter("mnist.tfrecords")
# construct the Example proto object
example = tf.train.Example(
    # Example contains a Features proto object
    features=tf.train.Features(
        # Features contains a map of string to Feature proto objects
        feature={
            # A Feature contains one of either a int64_list,
            # float_list, or bytes_list
            'label': tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
            'image': tf.train.Feature(
                int64_list=tf.train.Int64List(value=features.astype("int64"))),
        }))
# use the proto object to serialize the example to a string
serialized = example.SerializeToString()
# write the serialized object to disk
writer.write(serialized)
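
For completeness, reading such records back with the TF 1.x queue-based input pipeline would look roughly like this (my own sketch, not part of the example I found):

import tensorflow as tf

# queue that cycles through the TFRecord files
filename_queue = tf.train.string_input_producer(["mnist.tfrecords"])
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
# parse one serialized Example back into tensors
parsed = tf.parse_single_example(serialized_example, features={
    'label': tf.FixedLenFeature([], tf.int64),
    'image': tf.FixedLenFeature([784], tf.int64),
})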

where label is an int, and features is an np.array of length 784, representing each pixel in the image as a float. I understand this approach, but I cannot reproduce it, since converting each row of my sparse matrix into a dense np.array would also be impractical.

It seems to me that I need to create a key for each feature (column) and store only the nonzero values for each example, but I'm not sure whether you can specify a default value (0 in my case) for the "missing" entries.
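
In other words, something along these lines, where I would store the nonzero column indices and their values per example (an untested sketch; the feature names 'indices' and 'values' and the variables X and labels are my own placeholders):

import tensorflow as tf

writer = tf.python_io.TFRecordWriter("sparse.tfrecords")
for i in range(X.shape[0]):
    row = X.getrow(i).tocoo()  # nonzero entries of one sample
    example = tf.train.Example(features=tf.train.Features(feature={
        'label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[labels[i]])),
        # column indices of the nonzero entries
        'indices': tf.train.Feature(
            int64_list=tf.train.Int64List(value=row.col.astype("int64"))),
        # the nonzero values themselves
        'values': tf.train.Feature(
            float_list=tf.train.FloatList(value=row.data.astype("float"))),
    }))
    writer.write(example.SerializeToString())
writer.close()

On the reading side, tf.VarLenFeature can parse the variable-length lists, and tf.sparse_to_dense accepts a default_value, which seems to be the "missing = 0" behavior I'm after:

parsed = tf.parse_single_example(serialized_example, features={
    'label': tf.FixedLenFeature([], tf.int64),
    'indices': tf.VarLenFeature(tf.int64),
    'values': tf.VarLenFeature(tf.float32),
})
# VarLenFeature yields SparseTensors; rebuild one dense row of width 50000,
# filling the missing positions with 0 (indices must be in increasing order)
dense_row = tf.sparse_to_dense(parsed['indices'].values, [50000],
                               parsed['values'].values, default_value=0.0)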

What would be the best way to do this?
