I am new to TensorFlow. Today I tried to implement my first model in TF, but it returned strange results. I know something must be missing, but I could not figure out what. Here is the story.
Model
I have a simple multi-layer perceptron with a single hidden layer, trained on the MNIST database. The layers are [input (784), hidden_layer (470), output_layer (10)], with tanh as the non-linearity for the hidden layer and a softmax loss on the output layer. The optimizer is gradient descent with a learning rate of 0.01. My mini-batch size is 1 (I train the model on samples one by one).
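To spell out what I mean, here is a rough per-sample NumPy sketch of the forward pass and loss I intend (just the math, not the TF code itself; the names W1, b1, W2, b2 are only illustrative):

import numpy as np

def forward(x, W1, b1, W2, b2):
    # hidden layer: 784 -> 470 with tanh
    h = np.tanh(x.dot(W1) + b1)
    # output layer: 470 -> 10, then softmax
    logits = h.dot(W2) + b2
    e = np.exp(logits - logits.max())   # shift for numerical stability
    return e / e.sum()

def cross_entropy(probs, one_hot_label):
    # loss for a single sample (my batch size is 1)
    return -np.sum(one_hot_label * np.log(probs))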
My implementations:
- First, I implemented the model in C++ and got about 96% accuracy. Here is the repository: https://github.com/amin2ros/Artificog
- Then I implemented the exact same model in TensorFlow, but surprisingly it did not converge at all.
Code:
import sys
import input_data
import matplotlib.pyplot as plt
from pylab import *
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf

# hyperparameters
learning_rate = 0.1
training_epochs = 1
batch_size = 1
display_step = 1

# network architecture: 784 -> 470 -> 10
n_hidden_1 = 470
n_input = 784
n_classes = 10

x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

def multilayer_perceptron(_X, _weights, _biases):
    # hidden layer with tanh activation, linear output layer (logits)
    layer_1 = tf.tanh(tf.add(tf.matmul(_X, _weights['h1']), _biases['b1']))
    return tf.matmul(layer_1, _weights['out']) + _biases['out']

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

pred = multilayer_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.nn.softmax(pred))   # this cost definition turned out to be the problem (see edit below)
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        avg_cost = 0.
        m = 0
        total_batch = int(mnist.train.num_examples / batch_size)
        counter = 0
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            label = tf.argmax(batch_ys, 1).eval()[0]
            counter += 1
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # track a running error count over the samples seen so far
            wrong_prediction = tf.not_equal(tf.argmax(pred, 1), tf.argmax(y, 1))
            missed = tf.cast(wrong_prediction, "float")
            m += missed.eval({x: batch_xs, y: batch_ys})[0]
            print "Sample #", counter, " - Label : ", label, " - Prediction :", tf.argmax(pred, 1).eval({x: batch_xs, y: batch_ys})[0], \
                "- Missed = ", m, " - Error Rate = ", 100 * float(m) / counter
    print "Optimization Finished!"
I am very curious why this is happening. Any help is appreciated.
Edit:
As indicated below, the cost function was defined incorrectly; it should be:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred,y))
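If I understand it correctly, the problem with the original cost is that tf.reduce_mean(tf.nn.softmax(pred)) averages probabilities that always sum to 1 per sample, so the cost is a constant 1/n_classes and its gradient is zero, meaning nothing ever gets trained. The fused softmax_cross_entropy_with_logits op above is the numerically stable way to compute the usual cross-entropy; a hand-written equivalent (just as a sketch, it is less stable) would be:

cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.nn.softmax(pred)), reduction_indices=[1]))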
Now the model is converging :)