I have seen dropout described as being applied in different parts of a neural network:
dropout on the weight matrix,
dropout in a hidden layer after the matrix multiplication and before the ReLU,
dropout in a hidden layer after the ReLU,
and dropout on the output, just before the softmax function.
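For concreteness, here is a minimal sketch of the placements I mean, assuming PyTorch (the layer sizes and names are just for illustration):

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, d_in=784, d_hidden=256, d_out=10, p=0.5):
        super().__init__()
        # Placement (1) would mask entries of fc1.weight itself,
        # which is a different mechanism from the ones below.
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)
        self.drop = nn.Dropout(p)

    def forward(self, x):
        z = self.fc1(x)
        # z = self.drop(z)           # placement (2): after matmul, before ReLU
        h = F.relu(z)
        h = self.drop(h)             # placement (3): after ReLU
        logits = self.fc2(h)
        # logits = self.drop(logits) # placement (4): on the input to softmax
        return logits  # note: nn.Dropout is only active in model.train() mode
```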
I'm a little confused about where I should apply dropout. Can anyone help me figure this out? Thanks!