Split a tensor into training and test sets

Say I read in a text file using tf.TextLineReader. Is there a way to break the resulting data into training and test sets in TensorFlow? Something like:

def read_my_file_format(filename_queue):
  reader = tf.TextLineReader()
  key, record_string = reader.read(filename_queue)
  raw_features, label = tf.decode_csv(record_string)
  features = some_processing(raw_features)
  features_train, labels_train, features_test, labels_test = tf.train_split(features,
                                                                            labels,
                                                                            frac=.1)
  return features_train, labels_train, features_test, labels_test

4 answers

Something like the following should work: shuffle the rows once with tf.random_shuffle and then split them with tf.split (named tf.split_v before TensorFlow 1.0).
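
For example, a minimal sketch along those lines (TF 1.x style; the features and labels tensors and the 10% test fraction are illustrative stand-ins, not from the original answer):

import tensorflow as tf

# Stand-ins for the decoded data; in practice these would come from the
# input pipeline in the question.
num_examples = 1000
features = tf.random_normal([num_examples, 8])
labels = tf.random_uniform([num_examples], maxval=2, dtype=tf.int32)

test_size = num_examples // 10  # hold out 10% for testing

# Shuffle row indices once so features and labels stay aligned,
# then split the indices into test and train groups.
indices = tf.random_shuffle(tf.range(num_examples))
test_idx, train_idx = tf.split(indices, [test_size, num_examples - test_size])

features_train = tf.gather(features, train_idx)
labels_train = tf.gather(labels, train_idx)
features_test = tf.gather(features, test_idx)
labels_test = tf.gather(labels, test_idx)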

As Elham said, you can use scikit-learn to do this easily. scikit-learn is an open-source machine-learning library with many tools for preparing data, including the model_selection module, which handles splitting, cross-validation, and parameter search.

Use model_selection.train_test_split():

X_train, X_test, y_train, y_test = train_test_split(features,
                                                    labels,
                                                    test_size=0.33,
                                                    random_state=42)

Here test_size is the fraction of the data held out for testing, and random_state seeds the shuffle so the split is reproducible.

Note that train_test_split only produces two subsets. If you also need a validation set, call it twice: first split the data into (Train + Validation) and Test, then split (Train + Validation) into Train and Validation, as sketched below.
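
A sketch of that two-step split, assuming the same features and labels as above (the fractions are illustrative and give roughly 60/20/20 train/validation/test):

from sklearn.model_selection import train_test_split

# First carve off the test set, then split the remainder into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)  # 0.25 of 80% = 20% overall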

Or, as a self-contained snippet:

import sklearn.model_selection as sk

X_train, X_test, y_train, y_test = sk.train_test_split(
    features, labels, test_size=0.33, random_state=42)

I managed to get a good result using the map and filter functions of the tf.data.Dataset API. Use a map function to randomly assign each example to training or testing: for each example, draw a sample from a uniform distribution and check whether the sampled value falls below the split rate.

def split_train_test(parsed_features, train_rate):
    # Draw one integer uniformly from [0, 100) and mark the example as a
    # training example if it falls below train_rate * 100.
    rand_value = tf.random_uniform([1], maxval=100, dtype=tf.int32)
    threshold = tf.cast(train_rate * 100, tf.int32)
    parsed_features['is_train'] = tf.gather(rand_value < threshold, 0)
    return parsed_features

def grab_train_examples(parsed_features):
    return parsed_features['is_train']

def grab_test_examples(parsed_features):
    return ~parsed_features['is_train']
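
For reference, these helpers could be wired into a pipeline roughly like this (filenames and parse_fn are hypothetical placeholders, not part of the original answer):

# Hypothetical pipeline: `filenames` and `parse_fn` stand in for your own
# TFRecord files and parsing function.
dataset = tf.data.TFRecordDataset(filenames).map(parse_fn)
dataset = dataset.map(lambda feats: split_train_test(feats, train_rate=0.8))
train_dataset = dataset.filter(grab_train_examples)
test_dataset = dataset.filter(grab_test_examples)

Note that each example is assigned independently, so the split fractions are only approximate, and the random assignment is re-drawn every time the dataset is iterated unless the result is cached.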

Source: https://habr.com/ru/post/1667804/

