TensorFlow XOR code works fine with a two-dimensional target, but not with a one-dimensional one?

I'm trying to implement a very simple XOR FFNN in TensorFlow. I may just not understand the code, but can anyone see an obvious reason why this won't work? It blows up to NaNs and reports a loss of 0. Toggling the commented-out DOESN'T WORK / WORKS blocks switches between the broken and working versions. Thanks!

    import math
    import tensorflow as tf
    import numpy as np

    HIDDEN_NODES = 10

    x = tf.placeholder(tf.float32, [None, 2])
    W_hidden = tf.Variable(tf.truncated_normal([2, HIDDEN_NODES]))
    b_hidden = tf.Variable(tf.zeros([HIDDEN_NODES]))
    hidden = tf.nn.relu(tf.matmul(x, W_hidden) + b_hidden)

    #-----------------
    # DOESN'T WORK
    W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 1]))
    b_logits = tf.Variable(tf.zeros([1]))
    logits = tf.add(tf.matmul(hidden, W_logits), b_logits)

    # WORKS
    # W_logits = tf.Variable(tf.truncated_normal([HIDDEN_NODES, 2]))
    # b_logits = tf.Variable(tf.zeros([2]))
    # logits = tf.add(tf.matmul(hidden, W_logits), b_logits)
    #-----------------

    y = tf.nn.softmax(logits)

    #-----------------
    # DOESN'T WORK
    y_input = tf.placeholder(tf.float32, [None, 1])

    # WORKS
    # y_input = tf.placeholder(tf.float32, [None, 2])
    #-----------------

    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, y_input)
    loss = tf.reduce_mean(cross_entropy)
    loss = cross_entropy  # note: this line overwrites the reduce_mean above
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    init_op = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init_op)

    xTrain = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

    #-----------------
    # DOESN'T WORK
    yTrain = np.array([[0], [1], [1], [0]])

    # WORKS
    # yTrain = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
    #-----------------

    for i in xrange(500):
        _, loss_val, logitsval = sess.run([train_op, loss, logits],
                                          feed_dict={x: xTrain, y_input: yTrain})
        if i % 10 == 0:
            print "Step:", i, "Current loss:", loss_val, "logits", logitsval

    print sess.run(y, feed_dict={x: xTrain})
1 Answer

TL;DR: You should use

 loss = tf.nn.l2_loss(logits - y_input) 

...instead of tf.nn.softmax_cross_entropy_with_logits.
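
Wired into the question's code, the one-output fix looks roughly like this (a minimal sketch against the TF 0.x API used in the question; only the label placeholder and the loss change):

    # One target value per example; no one-hot encoding needed.
    y_input = tf.placeholder(tf.float32, [None, 1])
    # Squared-error loss on the single output unit.
    loss = tf.nn.l2_loss(logits - y_input)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

With a single output unit the softmax line should also go: softmax over one class is always 1.0, so read the prediction from logits directly (or squash it with tf.sigmoid).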

tf.nn.softmax_cross_entropy_with_logits expects its logits and labels inputs to be matrices of size batch_size × num_classes. Each row of logits is an unscaled probability distribution over the classes, and each row of labels is a one-hot encoding of the true class for that example in the batch. If the inputs don't match these assumptions, the training process may diverge.
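
As a concrete illustration (a NumPy sketch, not part of the original answer), here is what one-hot labels look like for XOR with num_classes = 2; note that the result is exactly the WORKS yTrain from the question:

    import numpy as np

    targets = np.array([0, 1, 1, 0])   # XOR class indices for the four inputs
    one_hot = np.eye(2)[targets]       # shape (4, 2): one row per example
    print(one_hot)
    # [[1. 0.]
    #  [0. 1.]
    #  [0. 1.]
    #  [1. 0.]]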

In the DOESN'T WORK version of this code, num_classes is 1. That means there is only one possible class, so the softmax outputs a prediction of class 0 for every example, and the labels [[0], [1], [1], [0]] are not one-hot. If you look at the implementation of the op, the backprop value for tf.nn.softmax_cross_entropy_with_logits is:

    // backprop: prob - labels, where
    //   prob = exp(logits - max_logits) / sum(exp(logits - max_logits))

This will be [[1], [1], [1], [1]] - [[0], [1], [1], [0]] in every step, which clearly does not converge.
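
You can check this with plain NumPy (a small sketch assuming the same prob formula as in the snippet above, not code from the original answer): with a single class, prob is always 1.0 regardless of the logit values, so the backprop term is identical at every step.

    import numpy as np

    logits = np.random.randn(4, 1)               # any one-column logits matrix
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    prob = exp / exp.sum(axis=1, keepdims=True)  # softmax over a single class
    labels = np.array([[0.], [1.], [1.], [0.]])  # the DOESN'T WORK yTrain
    print(prob)                                  # always [[1.], [1.], [1.], [1.]]
    print(prob - labels)                         # constant [[1.], [0.], [0.], [1.]]

This also matches the question's observation of a loss of 0: the cross-entropy -log(prob of the true class) is -log(1.0) = 0 for every example, even as the constant gradient pushes the weights off toward infinity.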


Source: https://habr.com/ru/post/1237100/

