I have a very simple binary classification dataset in a csv file that looks like this:
"feature1","feature2","label"
1,0,1
0,1,0
...
where the "label" column indicates the class (1 is positive, 0 is negative). The number of features is actually quite large, but that does not matter for this question.
Here is how I read the data:
import pandas

train = pandas.read_csv(TRAINING_FILE)
y_train, X_train = train['label'], train[['feature1', 'feature2']].fillna(0)
test = pandas.read_csv(TEST_FILE)
y_test, X_test = test['label'], test[['feature1', 'feature2']].fillna(0)
I want to train tensorflow.contrib.learn.LinearClassifier and tensorflow.contrib.learn.DNNClassifier on this data. For example, I initialize the DNN as follows:
from tensorflow import nn
from tensorflow.contrib.learn import DNNClassifier

classifier = DNNClassifier(hidden_units=[3, 5, 3],
                           n_classes=2,
                           feature_columns=feature_columns,
                           activation_fn=nn.relu,
                           enable_centered_bias=False,
                           model_dir=MODEL_DIR_DNN)
So, how exactly should I create feature_columns when all features are binary as well (0 and 1 are the only possible values)?
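To make the question concrete, here is a minimal, self-contained sketch of the data format described above, using only the Python standard library instead of pandas. The inline sample rows and the helper name `load_binary_dataset` are made up for illustration. It only collects the feature names (every column except "label") and checks that all values are 0 or 1, which is presumably the information feature_columns would have to be built from; it does not create the feature columns themselves.

```python
import csv
import io

# Hypothetical sample in the same format as the real CSV file.
SAMPLE_CSV = '"feature1","feature2","label"\n1,0,1\n0,1,0\n'


def load_binary_dataset(csv_text, label_column="label"):
    """Return (feature_names, X, y) from CSV text holding 0/1 values.

    Empty feature cells are treated as 0, mirroring the fillna(0)
    call used when reading the data with pandas.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every column except the label is a feature.
    feature_names = [name for name in reader.fieldnames if name != label_column]
    X, y = [], []
    for row in reader:
        X.append([int(row[name] or 0) for name in feature_names])
        y.append(int(row[label_column]))
    # Sanity check: every feature value and label must be binary.
    all_values = [v for r in X for v in r] + y
    if any(v not in (0, 1) for v in all_values):
        raise ValueError("expected only 0/1 values")
    return feature_names, X, y


feature_names, X, y = load_binary_dataset(SAMPLE_CSV)
print(feature_names, X, y)
```

With many features, deriving the names from the header this way avoids listing them by hand.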
Here is how I train the model:
classifier.fit(X_train.values,
               y_train.values,
               batch_size=dnn_batch_size,
               steps=dnn_steps)
A solution that replaces the fit() arguments with an input function would also be great.
Thanks!
P.S. I am using TensorFlow version 1.0.1.