I was looking through the Caffe code and documentation for the SigmoidCrossEntropyLoss layer, and I'm a bit confused. The documentation lists the loss function as log loss (I won't reproduce it here; without LaTeX the formula would be hard to read, so please check the documentation link at the very top).
However, the code itself (Forward_cpu(...)) shows a different formula:
Dtype loss = 0;
for (int i = 0; i < count; ++i) {
  loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
      log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
}
top[0]->mutable_cpu_data()[0] = loss / num;
Is it because the sigmoid function has already been applied to the input?
However, even in that case, the fragments (input_data[i] >= 0) confuse me. They seem to stand in for p_hat from the documented loss formula, which is supposed to be a prediction squashed by the sigmoid function. So why do they just take a binary threshold? This is even more confusing given that this loss predicts outputs in [0, 1], so (input_data[i] >= 0) will be 1 even when the prediction is not 100% certain.
Can someone explain this to me?