What is the difference between sampled_softmax_loss and nce_loss in TensorFlow?

I notice that TensorFlow provides two candidate-sampling loss functions, sampled_softmax_loss and nce_loss. The parameters of these two functions are similar, but I really want to know: what is the difference between them?

2 answers

Sampled softmax is about selecting a sample of a given size from the full set of classes and computing a softmax loss over just that sample. The main goal is to make the result over the sampled classes approximate the true softmax over all classes, so the algorithm mostly concentrates on how those samples are selected from the given distribution. NCE loss, on the other hand, selects noise samples and tries to mimic the true softmax by turning the problem into binary classification: it takes only the one true class and the sampled noise classes.
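To make the difference concrete, here is a minimal sketch of the two TensorFlow calls side by side (the sizes and variable names are made up for illustration). Both functions take the same parameters, but sampled_softmax_loss computes a softmax over the true class plus the sampled classes, while nce_loss computes an independent logistic decision per class:

```python
import tensorflow as tf

# Illustrative sizes (not from the original question).
num_classes = 10000   # e.g. vocabulary size
embed_dim = 128       # hidden-layer / embedding dimension
batch_size = 32
num_sampled = 64      # number of sampled (noise) classes

# Output-layer parameters shared by both losses.
weights = tf.Variable(tf.random.normal([num_classes, embed_dim]))
biases = tf.Variable(tf.zeros([num_classes]))

# A batch of hidden activations and integer labels of shape [batch_size, 1].
inputs = tf.random.normal([batch_size, embed_dim])
labels = tf.random.uniform([batch_size, 1], maxval=num_classes, dtype=tf.int64)

# Sampled softmax: a multi-class softmax over the true class plus the
# sampled classes, approximating the full softmax.
softmax_loss = tf.nn.sampled_softmax_loss(
    weights=weights, biases=biases, labels=labels, inputs=inputs,
    num_sampled=num_sampled, num_classes=num_classes)

# NCE: an independent binary (logistic) decision, true class vs. each
# noise class, rather than one normalized distribution.
nce_loss = tf.nn.nce_loss(
    weights=weights, biases=biases, labels=labels, inputs=inputs,
    num_sampled=num_sampled, num_classes=num_classes)
```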


Sampled softmax tries to normalize over all the samples in your output, while NCE works with an unnormalized distribution (a logistic loss over your labels), which is not the optimal loss function for every task. Note that although the two functions take the same parameters, the way you use them differs. Take a look at the documentation here: https://github.com/calebchoo/Tensorflow/blob/master/tensorflow/g3doc/api_docs/python/functions_and_classes/shard4/tf.nn.nce_loss.md and read this line:

By default this uses a log-uniform (Zipfian) distribution for sampling, so your labels must be sorted in order of decreasing frequency to achieve good results. For more details, see log_uniform_candidate_sampler.
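For illustration, here is a small sketch (with made-up numbers) of that default sampler called explicitly. tf.random.log_uniform_candidate_sampler draws negatives with a Zipfian bias toward small class ids, which is why frequency-sorted labels matter; its output can be passed to either loss via the sampled_values parameter:

```python
import tensorflow as tf

# With the Zipfian sampler, class id 0 is assumed to be the most frequent
# label, id 1 the next most frequent, and so on.
labels = tf.constant([[0], [3], [42]], dtype=tf.int64)  # [batch, num_true]

# The same log-uniform sampler both losses use by default, called explicitly.
sampled_values = tf.random.log_uniform_candidate_sampler(
    true_classes=labels,
    num_true=1,
    num_sampled=64,    # how many noise classes to draw
    unique=True,       # draw each negative at most once per batch
    range_max=10000)   # total number of classes

# sampled_values is a tuple (sampled_candidates, true_expected_count,
# sampled_expected_count); pass it as sampled_values=... to tf.nn.nce_loss
# or tf.nn.sampled_softmax_loss to control the sampling distribution.
```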

Take a look at this paper, where they explain why NCE is used for learning word embeddings: http://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf

Hope this helps!


Source: https://habr.com/ru/post/1264835/

