Understanding TensorFlow sequence_loss parameters

The sequence_loss module in the source code takes three required parameters: logits, targets, and weights.

Logits and targets are self-explanatory, but I would like to better understand what the weights parameter is.

Another thing that confuses me: it is stated that the targets should be the same length as the logits. What exactly is meant by the length of a tensor, especially if it is a three-dimensional tensor?

+5
2 answers

We used this in class, and our professor said we could just pass in ones of the correct shape (the docstring says: "List of 1D batch-sized float-Tensors of the same length as logits"). That does not help with what they mean, but maybe it will help you get your code running. It worked for me.

This code should do the trick: [tf.ones(batch_size, tf.float32) for _ in logits].
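
To make the shapes concrete, here is a minimal sketch of what the three lists could look like. The sizes batch_size, num_steps and num_symbols are made-up illustration values, and the exact module path of sequence_loss depends on your TF version, so the call itself is only hinted at in a comment.

    import tensorflow as tf

    # Illustrative sizes (assumptions for this sketch, not from the original post).
    batch_size, num_steps, num_symbols = 32, 10, 1000

    # logits: one [batch_size, num_symbols] tensor per time step.
    logits = [tf.zeros([batch_size, num_symbols]) for _ in range(num_steps)]

    # targets: one 1D batch-sized int32 tensor of class ids per time step.
    targets = [tf.zeros([batch_size], dtype=tf.int32) for _ in range(num_steps)]

    # weights: one 1D batch-sized float tensor per logit, all ones here.
    weights = [tf.ones(batch_size, tf.float32) for _ in logits]

    # loss = sequence_loss(logits, targets, weights)
    # (e.g. from tf.nn.seq2seq or tf.contrib.legacy_seq2seq, depending on version)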

Edit: from TF code:

    for logit, target, weight in zip(logits, targets, weights):
      if softmax_loss_function is None:
        # TODO(irving,ebrevdo): This reshape is needed because
        # sequence_loss_by_example is called with scalars sometimes, which
        # violates our general scalar strictness policy.
        target = array_ops.reshape(target, [-1])
        crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
            logit, target)
      else:
        crossent = softmax_loss_function(logit, target)
      log_perp_list.append(crossent * weight)

The weights that are passed in are multiplied by the loss for that particular logit. So I assume that if you want a specific prediction to be taken especially seriously, you can increase its weight above 1.
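
A hypothetical illustration of that last point (the time step index and the factor 3 are made up, and the names reuse the sketch above): since each element of weights multiplies the cross-entropy of the corresponding logit, scaling one of them increases that step's contribution to the total loss.

    # Hypothetical: make the prediction at time step 2 count three times as much.
    weights = [tf.ones(batch_size, tf.float32) for _ in logits]
    weights[2] = 3.0 * weights[2]
    # In the loop above, crossent * weight for that step is now scaled by 3.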

0

Think of the weights as a mask applied to the input tensor. In some NLP applications, each sentence has a different length. To parallelize/batch several instances into a mini-batch for feeding into the neural network, people use a mask matrix to indicate which elements of the input tensor are actually valid input. For example, the weights could be np.ones([batch, max_length]), which means that all input elements are valid.

We can also use a matrix of the same shape as the labels, such as np.asarray([[1,1,1,0],[1,1,0,0],[1,1,1,1]]) (assuming the shape of the labels is 3x4); then the cross-entropy of the last column of the first row will be masked out as 0.
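
As a sketch of how such a mask could be derived from the per-sentence lengths (assuming lengths [3, 2, 4] and a padded length of 4, to match the 3x4 example above):

    import numpy as np

    lengths = np.array([3, 2, 4])   # number of real tokens in each sentence (assumed)
    max_length = 4                  # padded length of every sentence in the batch

    # 1.0 where a position holds a real token, 0.0 where it is padding.
    mask = (np.arange(max_length)[None, :] < lengths[:, None]).astype(np.float32)
    # mask == [[1, 1, 1, 0],
    #          [1, 1, 0, 0],
    #          [1, 1, 1, 1]]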

You can also use the weights to compute a weighted accumulation of the cross-entropy.
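
A rough, self-contained sketch of that idea (the cross-entropy values are random placeholders): mask the per-position losses and average them only over the valid positions.

    import numpy as np

    # Placeholder per-position cross-entropy values, shape [batch, max_length].
    crossent = np.random.rand(3, 4).astype(np.float32)
    mask = np.asarray([[1, 1, 1, 0],
                       [1, 1, 0, 0],
                       [1, 1, 1, 1]], dtype=np.float32)

    masked = crossent * mask                             # padded positions contribute 0
    loss_per_sentence = masked.sum(axis=1) / mask.sum(axis=1)
    batch_loss = loss_per_sentence.mean()                # weighted cross-entropy average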

+1

Source: https://habr.com/ru/post/1261239/
