I would like to use TensorFlow to tag a sequence, namely for part-of-speech tagging. I tried to use the model described here: http://tensorflow.org/tutorials/seq2seq/index.md (which describes a model for translating English into French).
Since in tagging the input and output sequences have the same length, I configured the buckets so that input and output sequences are equal in length, and tried to learn a POS tagger with this model on CoNLL 2000.
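For reference, this is roughly how I set up the model (a sketch; the bucket sizes and hyperparameters are my own choices, and seq2seq_model is the module from the tutorial's translate example):

    from tensorflow.models.rnn.translate import seq2seq_model

    # Equal-length buckets, since each input token gets exactly one tag
    # (unlike translation, where the target may be longer than the source).
    _buckets = [(10, 10), (20, 20), (40, 40)]

    model = seq2seq_model.Seq2SeqModel(
        source_vocab_size=40000,  # word vocabulary
        target_vocab_size=50,     # POS tag vocabulary
        buckets=_buckets,
        size=256,
        num_layers=2,
        max_gradient_norm=5.0,
        batch_size=64,
        learning_rate=0.5,
        learning_rate_decay_factor=0.99)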
However, the decoder sometimes outputs a tagged sequence that is shorter than the input sequence (the EOS tag appears prematurely).
For example:
He believes that the current account deficit will be reduced to 1.8 billion in September.
The above sentence has 18 tokens, which are padded up to 20 (due to bucketing).
When decoding the sentence above, the decoder produces the following:
PRP VBD DT JJ JJ NN MD VB TO VB DT NN IN NN . _EOS . CD CD
So here it ends the sequence (EOS) after 15 tokens, not 18.
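As far as I can tell, the truncation comes from the greedy decode step in the tutorial's translate.py, which cuts the output at the first EOS, roughly like this:

    # This is a greedy decoder - outputs are just argmaxes of output_logits.
    outputs = [int(np.argmax(logit, axis=1)) for logit in output_logits]
    # If there is an EOS symbol in outputs, cut them at that point.
    if data_utils.EOS_ID in outputs:
        outputs = outputs[:outputs.index(data_utils.EOS_ID)]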
How can I make the model aware that the decoded sequence should be the same length as the encoded one?
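One workaround I can think of is to skip the EOS cut entirely and take the argmax of exactly as many step logits as there are input tokens (a sketch; token_ids and output_logits are the names used in the tutorial's decode path):

    # Force the output to the input length: decode exactly num_tokens steps
    # and never cut at EOS. (One could also mask the EOS logit at each step.)
    num_tokens = len(token_ids)  # encoder input length, before padding
    outputs = [int(np.argmax(logit, axis=1))
               for logit in output_logits[:num_tokens]]

But that feels like a post-processing hack; ideally the model itself would learn to emit exactly one tag per input token.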