Sampled softmax works by drawing a fixed number of sample classes and computing the softmax loss over just those samples. The main goal is to make the result of the sampled softmax match the true (full) softmax, so the algorithm is mostly concerned with how those samples are selected from the sampling distribution. NCE loss, on the other hand, is more about selecting noise samples and trying to imitate the true softmax by telling the real class apart from the noise. It takes only one true class and K noise classes.
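The contrast can be sketched in NumPy for a single training example. This is only an illustrative sketch, not how any particular library implements these losses: the function names `sampled_softmax_loss` and `nce_loss` are hypothetical, and negatives are assumed to be drawn uniformly without replacement, which is a simplification (real implementations usually use a non-uniform sampler and correct the logits for it).

```python
import numpy as np


def _draw_negatives(num_classes, true_class, num_sampled, rng):
    # Assumption: uniform sampling over every class except the true one.
    candidates = np.delete(np.arange(num_classes), true_class)
    return rng.choice(candidates, size=num_sampled, replace=False)


def sampled_softmax_loss(logits, true_class, num_sampled, rng):
    """Softmax cross-entropy over the true class plus sampled classes only."""
    negatives = _draw_negatives(logits.shape[0], true_class, num_sampled, rng)
    subset = np.concatenate(([true_class], negatives))
    z = logits[subset]
    # With a uniform sampler the log Q(class) correction is the same constant
    # for every class, so it cancels inside the softmax and is omitted here.
    z = z - z.max()  # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[0]  # the true class sits at position 0


def nce_loss(logits, true_class, num_sampled, rng):
    """Binary logistic loss: one true class (label 1) vs. K noise classes (label 0)."""
    num_classes = logits.shape[0]
    negatives = _draw_negatives(num_classes, true_class, num_sampled, rng)
    # log(K * Q(class)) with the assumed uniform noise distribution Q = 1 / num_classes.
    log_kq = np.log(num_sampled / num_classes)
    pos = logits[true_class] - log_kq
    neg = logits[negatives] - log_kq
    # -log(sigmoid(pos)) - sum(log(1 - sigmoid(neg))), written with logaddexp.
    return np.logaddexp(0.0, -pos) + np.logaddexp(0.0, neg).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=1000)  # made-up scores for 1000 classes
    print("sampled softmax:", sampled_softmax_loss(logits, true_class=3, num_sampled=20, rng=rng))
    print("NCE:", nce_loss(logits, true_class=3, num_sampled=20, rng=rng))
```

In this sketch, sampled softmax still normalizes across the chosen subset, so it reduces to the full softmax loss when every class is sampled. NCE never normalizes across classes; it turns the problem into K + 1 independent "real vs. noise" decisions.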