Existing works use an automatic encoder to create models at the proposal level. Basically, after training a model using Autoencode, you can get a vector for a sentence. Since any document consists of sentences, you can get a set of vectors for the document and classify the documents. In my experience with a different vector representation (for example, generated from autoencodes) this may give worse answers than classification with a bag of words.
source
share