I am following this PyTorch tutorial http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html and trying to generalize the approach. Suppose the input (encoder) sequence is about 1000 words long and the target (decoder) sequence is about 200 words. How can I apply seq2seq to this? I know it would be very expensive, and almost infeasible, to run the entire 1000-word sequence through the network at once. Splitting the sequence into, say, 20 chunks and running them in parallel might be the answer, but I'm not sure how to implement this. I also want to include attention.