Split a tensor into training and test sets

Say I read in a text file using tf.TextLineReader. Is there a way to break the resulting data into training and test sets in TensorFlow? Something like:

def read_my_file_format(filename_queue):
  reader = tf.TextLineReader()
  key, record_string = reader.read(filename_queue)
  raw_features, label = tf.decode_csv(record_string)
  features = some_processing(raw_features)
  features_train, labels_train, features_test, labels_test = tf.train_split(features,
                                                                            labels,
                                                                            frac=.1)
  return features_train, labels_train, features_test, labels_test

4 answers

Something like the following should work: shuffle the rows once with tf.random_shuffle and then split them with tf.split (named tf.split_v before TensorFlow 1.0).
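
For example, a minimal sketch along those lines (TF 1.x style; the features and labels tensors and the 10% test fraction are illustrative stand-ins, not from the original answer):

import tensorflow as tf

# Stand-ins for the decoded data; in practice these would come from the
# input pipeline in the question.
num_examples = 1000
features = tf.random_normal([num_examples, 8])
labels = tf.random_uniform([num_examples], maxval=2, dtype=tf.int32)

test_size = num_examples // 10  # hold out 10% for testing

# Shuffle row indices once so features and labels stay aligned,
# then split the indices into test and train groups.
indices = tf.random_shuffle(tf.range(num_examples))
test_idx, train_idx = tf.split(indices, [test_size, num_examples - test_size])

features_train = tf.gather(features, train_idx)
labels_train = tf.gather(labels, train_idx)
features_test = tf.gather(features, test_idx)
labels_test = tf.gather(labels, test_idx)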

As Elham said, you can use scikit-learn to do this easily. scikit-learn is an open-source machine-learning library with many tools for preparing data, including the model_selection module, which handles splitting, cross-validation, and parameter search.

Use model_selection.train_test_split():

X_train, X_test, y_train, y_test = train_test_split(features,
                                                    labels,
                                                    test_size=0.33,
                                                    random_state=42)

Here test_size is the fraction of the data held out for testing, and random_state seeds the shuffle so the split is reproducible.

Note that train_test_split only produces two subsets. If you also need a validation set, call it twice: first split the data into (Train + Validation) and Test, then split (Train + Validation) into Train and Validation, as sketched below.
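
A sketch of that two-step split, assuming the same features and labels as above (the fractions are illustrative and give roughly 60/20/20 train/validation/test):

from sklearn.model_selection import train_test_split

# First carve off the test set, then split the remainder into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42)  # 0.25 of 80% = 20% overall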

Or, as a self-contained snippet:

import sklearn.model_selection as sk

X_train, X_test, y_train, y_test = sk.train_test_split(
    features, labels, test_size=0.33, random_state=42)

I managed to get a good result using the map and filter functions of the tf.data.Dataset API. Use a map function to randomly assign each example to training or testing: for each example, draw a sample from a uniform distribution and check whether the sampled value falls below the split rate.

def split_train_test(parsed_features, train_rate):
    # Draw one integer uniformly from [0, 100) and mark the example as a
    # training example if it falls below train_rate * 100.
    rand_value = tf.random_uniform([1], maxval=100, dtype=tf.int32)
    threshold = tf.cast(train_rate * 100, tf.int32)
    parsed_features['is_train'] = tf.gather(rand_value < threshold, 0)
    return parsed_features

def grab_train_examples(parsed_features):
    return parsed_features['is_train']

def grab_test_examples(parsed_features):
    return ~parsed_features['is_train']
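
For reference, these helpers could be wired into a pipeline roughly like this (filenames and parse_fn are hypothetical placeholders, not part of the original answer):

# Hypothetical pipeline: `filenames` and `parse_fn` stand in for your own
# TFRecord files and parsing function.
dataset = tf.data.TFRecordDataset(filenames).map(parse_fn)
dataset = dataset.map(lambda feats: split_train_test(feats, train_rate=0.8))
train_dataset = dataset.filter(grab_train_examples)
test_dataset = dataset.filter(grab_test_examples)

Note that each example is assigned independently, so the split fractions are only approximate, and the random assignment is re-drawn every time the dataset is iterated unless the result is cached.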

Source: https://habr.com/ru/post/1667804/

