TensorFlow MLP does not train XOR

I created an MLP with Google TensorFlow. The network runs, but for some reason it refuses to learn properly. It always converges to an output of almost 1.0, no matter what the input actually is.

The full code can be seen here.

Any ideas?


The input and output data (batch size 4) are as follows:

input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [[0.], [1.], [1.], [0.]]  # XOR output

n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

Hidden layer configuration:

# hidden layer bias neuron
b_hidden = tf.Variable(0.1, name="hidden_bias")

# hidden layer weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")

# calc hidden layer activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

Output layer configuration:

W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights")  # output layer weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output))  # calc output layer activation

My training setup looks like this:

loss = tf.reduce_mean(cross_entropy)  # mean the cross_entropy
optimizer = tf.train.GradientDescentOptimizer(0.01)  # take a gradient descent for optimizing
train = optimizer.minimize(loss)  # let the optimizer train

I tried two different setups for the cross-entropy:

 cross_entropy = -tf.reduce_sum(n_output * tf.log(output)) 

and

 cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output) 

where n_output is the desired output, as given in output_data, and output is the predicted/calculated value of my network.
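For reference, a minimal sketch of how either variant plugs into the loss. I am not sure whether the built-in op is supposed to receive the raw, pre-sigmoid logits rather than output, so the commented lines are only a guess on my side:

# variant 1: hand-written cross-entropy on the sigmoid output
cross_entropy = -tf.reduce_sum(n_output * tf.log(output))

# variant 2: built-in op; possibly it expects the raw logits,
# i.e. tf.matmul(hidden, W_output) before the sigmoid, instead of `output`
# cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output)

loss = tf.reduce_mean(cross_entropy)  # mean over the batch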


Training inside the loop (for n epochs) looks like this:

cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                   feed_dict={n_input: input_data, n_output: output_data})

I save the results to cvalues for debug printing of loss, W_hidden, etc.
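The entries of cvalues simply follow the order of the fetch list; a small sketch of the debug printing (the unpacked variable names are just for illustration):

# the fetch list was [train, loss, W_hidden, b_hidden, W_output]
_, loss_val, W_hidden_val, b_hidden_val, W_output_val = cvalues
print("loss: {}".format(loss_val))
print("b_hidden: {}".format(b_hidden_val))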


No matter what I tried, when I test my network to verify the result, it always produces something like this:

(...)

step: 2000
loss: 0.0137040186673
b_hidden: 1.3272010088
W_hidden: [[ 0.23195425  0.53248233 -0.21644847 -0.54775208  0.52298909]
 [ 0.73933059  0.51440752 -0.08397482 -0.62724304 -0.53347367]]
W_output: [[ 1.65939867]
 [ 0.78912479]
 [ 1.4831928 ]
 [ 1.28612828]
 [ 1.12486529]]

(--- finished with 2000 epochs ---)

(Test input for validation:)

input: [0.0, 0.0] | output: [[ 0.99339396]]
input: [0.0, 1.0] | output: [[ 0.99289012]]
input: [1.0, 0.0] | output: [[ 0.99346077]]
input: [1.0, 1.0] | output: [[ 0.99261558]]

So it does not learn correctly; it always converges to almost 1.0 no matter which input it is fed.

2 answers

In the meantime, with the help of a colleague, I was able to fix my solution and wanted to post it for completeness. My solution works with cross-entropy and without changing the training data. In addition, it keeps the desired input shape of (1, 2) and a scalar output of shape (1, 1).
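One concrete difference to my original code, besides the optimizer discussed below, is the hidden bias: it is now a vector with one entry per hidden node instead of a single shared scalar. A sketch of just that change:

# before: one scalar bias shared by all hidden nodes
# b_hidden = tf.Variable(0.1, name="hidden_bias")
# now: one randomly initialized bias per hidden node
b_hidden = tf.Variable(tf.random_normal([hidden_nodes]), name="hidden_bias")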

It uses AdamOptimizer, which drives the error down much faster than GradientDescentOptimizer. See this post for more information (and questions ^^) about that optimizer.
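Concretely, the optimizer swap relative to the question's code looks like this (the full listing below additionally changes the initialization and the loss):

# optimizer = tf.train.GradientDescentOptimizer(0.01)  # converges very slowly on XOR
optimizer = tf.train.AdamOptimizer(0.01)  # same step size, far fewer steps needed here
train = optimizer.minimize(loss)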

In fact, my network already produces reasonably good results after only 400-800 training steps.

After 2000 training steps, the result is almost "perfect":

step: 2000
loss: 0.00103311243281

input: [0.0, 0.0] | output: [[ 0.00019799]]
input: [0.0, 1.0] | output: [[ 0.99979786]]
input: [1.0, 0.0] | output: [[ 0.99996307]]
input: [1.0, 1.0] | output: [[ 0.00033751]]

import tensorflow as tf

#####################
# preparation stuff #
#####################

# define input and output data
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]  # XOR input
output_data = [[0.], [1.], [1.], [0.]]  # XOR output

# create a placeholder for the input
# None indicates a variable batch size for the input
# one input dimension is [1, 2] and output [1, 1]
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")

# number of neurons in the hidden layer
hidden_nodes = 5

################
# hidden layer #
################

# hidden layer bias, one per hidden node
b_hidden = tf.Variable(tf.random_normal([hidden_nodes]), name="hidden_bias")
# hidden layer weight matrix initialized with a normal distribution
W_hidden = tf.Variable(tf.random_normal([2, hidden_nodes]), name="hidden_weights")
# calc hidden layer activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

################
# output layer #
################

W_output = tf.Variable(tf.random_normal([hidden_nodes, 1]), name="output_weights")  # output layer weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output))  # calc output layer activation

############
# learning #
############

cross_entropy = -(n_output * tf.log(output) + (1 - n_output) * tf.log(1 - output))
# cross_entropy = tf.square(n_output - output)  # simpler, but also works

loss = tf.reduce_mean(cross_entropy)  # mean the cross_entropy
optimizer = tf.train.AdamOptimizer(0.01)  # Adam optimizer with a "stepsize" of 0.01
train = optimizer.minimize(loss)  # let the optimizer train

####################
# initialize graph #
####################

init = tf.initialize_all_variables()

sess = tf.Session()  # create the session and therefore the graph
sess.run(init)  # initialize all variables

#####################
# train the network #
#####################

for epoch in xrange(0, 2001):
    # run the training operation
    cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                       feed_dict={n_input: input_data, n_output: output_data})

    # print some debug stuff
    if epoch % 200 == 0:
        print("")
        print("step: {:>3}".format(epoch))
        print("loss: {}".format(cvalues[1]))
        # print("b_hidden: {}".format(cvalues[3]))
        # print("W_hidden: {}".format(cvalues[2]))
        # print("W_output: {}".format(cvalues[4]))

        print("")
        print("input: {} | output: {}".format(input_data[0], sess.run(output, feed_dict={n_input: [input_data[0]]})))
        print("input: {} | output: {}".format(input_data[1], sess.run(output, feed_dict={n_input: [input_data[1]]})))
        print("input: {} | output: {}".format(input_data[2], sess.run(output, feed_dict={n_input: [input_data[2]]})))
        print("input: {} | output: {}".format(input_data[3], sess.run(output, feed_dict={n_input: [input_data[3]]})))

I cannot comment because I do not have enough reputation, but I have some questions about mrry's answer. The $L_2$ loss makes sense because it is basically the MSE function, but why shouldn't cross-entropy work? It certainly works for other NN libraries. Secondly, why in the world would you map the input space from $[0, 1]$ to $[-1, 1]$, especially since you added bias vectors?
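For clarity, the two losses I am comparing here, written for a single target $y \in \{0, 1\}$ and prediction $\hat{y} \in (0, 1)$, are

$$ L_2(y, \hat{y}) = (y - \hat{y})^2, \qquad H(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr], $$

and both are minimized at $\hat{y} = y$, so cross-entropy should be perfectly usable for this problem as well.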

EDIT: This is a solution using cross-entropy, put together from multiple sources. EDIT^2: Changed the code to use cross-entropy without any extra encoding or shifting to strange target values.

import math
import tensorflow as tf
import numpy as np

HIDDEN_NODES = 10

x = tf.placeholder(tf.float32, [None, 2])
W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES]))
b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))
hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 1]))
b_logits = tf.Variable(tf.zeros([1]))
logits = tf.add(tf.matmul(hidden, W_logits), b_logits)

y = tf.nn.sigmoid(logits)

y_input = tf.placeholder(tf.float32, [None, 1])

loss = -(y_input * tf.log(y) + (1 - y_input) * tf.log(1 - y))

train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

init_op = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init_op)

xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
yTrain = np.array([[0], [1], [1], [0]])

for i in xrange(2000):
    _, loss_val, logitsval = sess.run([train_op, loss, logits],
                                      feed_dict={x: xTrain, y_input: yTrain})

    if i % 10 == 0:
        print "Step:", i, "Current loss:", loss_val, "logits", logitsval

print "---------"
print sess.run(y, feed_dict={x: xTrain})

Source: https://habr.com/ru/post/1237099/

