I am new to TensorFlow. Today I tried to implement my first model in TF, but it returned strange results. I know something must be missing, but I could not figure out what. Here is the story.
Model
I have a simple multi-layer perceptron with a single hidden layer, trained on the MNIST database. The layers are [input (784), hidden_layer (470), output_layer (10)], with tanh as the non-linearity for the hidden layer and a softmax loss on the output layer. The optimizer is gradient descent with a learning rate of 0.01. My mini-batch size is 1 (I train the model on samples one by one).
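To spell out what I mean, here is a rough per-sample NumPy sketch of the forward pass and loss I intend (just the math, not the TF code itself; the names W1, b1, W2, b2 are only illustrative):

import numpy as np

def forward(x, W1, b1, W2, b2):
    # hidden layer: 784 -> 470 with tanh
    h = np.tanh(x.dot(W1) + b1)
    # output layer: 470 -> 10, then softmax
    logits = h.dot(W2) + b2
    e = np.exp(logits - logits.max())   # shift for numerical stability
    return e / e.sum()

def cross_entropy(probs, one_hot_label):
    # loss for a single sample (my batch size is 1)
    return -np.sum(one_hot_label * np.log(probs))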
My implementations:
- First, I implemented the model in C++ and got about 96% accuracy. Here is the repository: https://github.com/amin2ros/Artificog
- Then I implemented the exact same model in TensorFlow, but surprisingly it did not converge at all.
Code:
import sys
import input_data
import matplotlib.pyplot as plt
from pylab import *
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf

# hyperparameters
learning_rate = 0.1
training_epochs = 1
batch_size = 1
display_step = 1

# network architecture: 784 -> 470 -> 10
n_hidden_1 = 470
n_input = 784
n_classes = 10

x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

def multilayer_perceptron(_X, _weights, _biases):
    # hidden layer with tanh activation, linear output layer (logits)
    layer_1 = tf.tanh(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']))
    return tf.matmul(layer_1, _weights['out']) + _biases['out']

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

pred = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax(pred))   # this cost definition turned out to be the problem (see edit below)
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        avg_cost = 0.
        m = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        counter = 0
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            label = tf.argmax(batch_ys, 1).eval()[0]
            counter += 1
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # track a running error count over the samples seen so far
            wrong_prediction = tf.not_equal(tf.argmax(pred, 1), tf.argmax(y, 1))
            missed = tf.cast(wrong_prediction, "float")
            m += missed.eval({x: batch_xs, y: batch_ys})[0]
            print "Sample #", counter, " - Label : ", label, " - Prediction :", tf.argmax(pred, 1).eval({x: batch_xs, y: batch_ys})[0], \
                "- Missed = ", m, " - Error Rate = ", 100 * float(m) / counter
    print "Optimization Finished!"
I am very curious why this is happening. Any help is appreciated.
Edit:
As indicated below, the cost function was defined incorrectly; it should be:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
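If I understand it correctly, the problem with the original cost is that tf.reduce_mean(tf.nn.softmax(pred)) averages probabilities that always sum to 1 per sample, so the cost is a constant 1/n_classes and its gradient is zero, meaning nothing ever gets trained. The fused softmax_cross_entropy_with_logits op above is the numerically stable way to compute the usual cross-entropy; a hand-written equivalent (just as a sketch, it is less stable) would be:

cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.nn.softmax(pred)), reduction_indices=[1]))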
Now the model is converging :)