TL;DR: For this, you should use
loss = tf.nn.l2_loss(logits - y_input)
...instead of tf.nn.softmax_cross_entropy_with_logits.
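For context, here is a minimal sketch of how that loss might be wired into training, assuming logits is the network's single output tensor and y_input is the target placeholder (both names come from the snippet above; the optimizer and learning rate are only illustrative):

loss = tf.nn.l2_loss(logits - y_input)
# tf.nn.l2_loss computes sum(t ** 2) / 2, so this minimizes half the squared error.
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)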
tf.nn.softmax_cross_entropy_with_logits expects its logits and labels inputs each to be a batch_size by num_classes matrix. Each row of logits is an unscaled probability distribution across the classes, and each row of labels is a one-hot encoding of the true class for the corresponding example in the batch. If the inputs do not match these assumptions, the training process is likely to diverge.
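As a rough illustration of those shape assumptions (batch_size, num_classes, and class_ids are made-up names for this sketch, not taken from the question's code):

import tensorflow as tf

batch_size, num_classes = 4, 2
logits = tf.placeholder(tf.float32, [batch_size, num_classes])  # one row of unscaled scores per example
class_ids = tf.placeholder(tf.int32, [batch_size])              # integer class label per example
labels = tf.one_hot(class_ids, num_classes)                     # one-hot rows, same shape as logits
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))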
In this code, logits and labels are batch_size by 1, which means there is only a single class, so softmax predicts class 0 for every example; the labels are also not one-hot. If you look at the implementation of the op, the backprop value for tf.nn.softmax_cross_entropy_with_logits is:
// backprop: prob - labels, where
//   prob = exp(logits - max_logits) / sum(exp(logits - max_logits))
This will be [[1], [1], [1], [1]] - [[0], [1], [1], [0]] at every step, which clearly does not converge.
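A small TF1-style sketch that reproduces this, using the four labels quoted above (the logit values themselves are arbitrary):

import tensorflow as tf

logits = tf.constant([[-3.0], [0.5], [2.0], [7.0]])  # shape [4, 1]: a single "class"
labels = tf.constant([[0.0], [1.0], [1.0], [0.0]])
prob = tf.nn.softmax(logits)   # each row normalizes to 1.0, whatever the logit value is
grad = prob - labels           # the backprop value quoted above
with tf.Session() as sess:
    print(sess.run(grad))      # [[1.], [0.], [0.], [1.]] -- constant, independent of the logits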