AttentionDecoderRNN without MAX_LENGTH

From the PyTorch Seq2Seq tutorial, http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#attention-decoder

We see that the attention mechanism is heavily dependent on the MAX_LENGTH parameter, which determines the output sizes along attn -> attn_softmax -> attn_weights, i.e.

import torch.nn as nn

MAX_LENGTH = 10  # maximum sentence length used in the tutorial

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        # one attention score per (padded) source position, hence max_length outputs
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)
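For context, here is a sketch of the corresponding forward pass, reconstructed along the lines of the tutorial (the shape comments are mine); it shows exactly where max_length enters. It assumes import torch and import torch.nn.functional as F at module level.

    def forward(self, input, hidden, encoder_outputs):
        # input: (1, 1) token index; hidden: (1, 1, hidden_size)
        # encoder_outputs: (max_length, hidden_size), zero-padded up to max_length
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        # attn maps (1, hidden_size * 2) -> (1, max_length): one score per
        # padded source position -- this is where MAX_LENGTH is baked in
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights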

More specifically:

self.attn = nn.Linear(self.hidden_size * 2, self.max_length)

I understand that the MAX_LENGTH variable is a mechanism to reduce the number of parameters to be trained in AttnDecoderRNN.
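To make that dependence concrete, the number of trainable weights in self.attn grows linearly with max_length. A small check, assuming the values used in the tutorial (hidden_size=256, MAX_LENGTH=10):

import torch.nn as nn

hidden_size, max_length = 256, 10
attn = nn.Linear(hidden_size * 2, max_length)
# weight: (max_length, hidden_size * 2), bias: (max_length,)
n_params = sum(p.numel() for p in attn.parameters())
print(n_params)  # 512 * 10 + 10 = 5130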

If we do not have a predefined MAX_LENGTH, what value should we initialize the attn layer with?

Would it be output_size? If so, that would mean learning attention over the full vocabulary of the target language. Isn't that the real intention of the original attention paper (2015)?


The attention mechanism itself does not require MAX_LENGTH; the dependence comes from this particular tutorial implementation, which computes the attention scores from the embedded input and the previous hidden state with a fixed-size linear layer. If the scores are instead computed against each actual encoder output, as in content-based attention, the number of attention weights equals the real source length and MAX_LENGTH is not needed.
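Here is a minimal sketch of a decoder step without MAX_LENGTH: the current decoder hidden state is scored against each actual encoder output (dot-product scoring for brevity; Bahdanau-style attention uses a small feed-forward scorer instead). The class name and shapes are my own illustration, not code from the tutorial.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoderRNNNoMaxLength(nn.Module):
    """Decoder step with content-based (dot-product) attention: the number of
    attention weights equals the actual source length, so no MAX_LENGTH."""
    def __init__(self, hidden_size, output_size, dropout_p=0.1):
        super().__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.dropout = nn.Dropout(dropout_p)
        self.attn_combine = nn.Linear(hidden_size * 2, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, encoder_outputs):
        # input: (1, 1) token index; hidden: (1, 1, hidden_size)
        # encoder_outputs: (src_len, hidden_size) -- the real length, no padding
        embedded = self.dropout(self.embedding(input).view(1, 1, -1))

        # one score per actual encoder position: (1, src_len)
        scores = torch.matmul(hidden[0], encoder_outputs.t())
        attn_weights = F.softmax(scores, dim=1)

        # weighted sum of encoder outputs: (1, hidden_size)
        attn_applied = torch.matmul(attn_weights, encoder_outputs)

        output = torch.cat((embedded[0], attn_applied), 1)
        output = F.relu(self.attn_combine(output)).unsqueeze(0)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

Because no layer's size depends on the source length, the same module works for sentences of any length, at the cost of recomputing the scores against however many encoder outputs there are.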


Source: https://habr.com/ru/post/1693391/

