When you set term_vector=with_positions_offsets for a specific field, Elasticsearch stores term vectors for that field, per document, including the position and the start and end offsets of every term.
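For illustration, here is a minimal sketch of an index-creation body that enables this option, built as a Python dict. The field name "content" is hypothetical, and the shape assumes a recent Elasticsearch version with typeless mappings where full-text fields use the "text" type (older versions used "string" and nested the properties under a mapping type).

```python
import json

# Hypothetical mapping with one highlighted field, "content".
# "term_vector": "with_positions_offsets" asks Elasticsearch to store, per
# document, each term of the field together with its positions and offsets.
# Assumption: a recent version with typeless mappings and the "text" type.
mapping = {
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "term_vector": "with_positions_offsets",
            }
        }
    }
}

# JSON body you would send when creating the index.
body = json.dumps(mapping, indent=2)
print(body)
```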
When it comes to highlighting, term vectors let you use the Lucene fast vector highlighter, which is faster than the standard (plain) highlighter. The reason is that the standard highlighter has no quick way to highlight, because the index does not contain enough information (positions and offsets). It can only re-analyze the contents of the field, capture the offsets and positions along the way, and highlight based on that information. This can take quite some time, especially with long text fields.
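To make the cost concrete, the toy function below mimics what a highlighter without stored offsets has to do: walk the whole stored text again and recompute each token's offsets before it can wrap matches. The regex tokenizer is a deliberately simplified stand-in for a real Lucene analyzer, and the function name is hypothetical; this is a sketch of the idea, not Lucene's implementation.

```python
import re


def reanalyze_and_highlight(text: str, query_terms: set[str]) -> str:
    """Toy 'plain' highlighting: re-tokenize the stored text to recover offsets.

    The regex below (lowercased word tokens) is a simplified stand-in for an
    analyzer. Every call scans the entire field value, which is why this
    approach gets slow on long text fields.
    """
    pieces = []
    last = 0
    for match in re.finditer(r"\w+", text):
        # Offsets come from re-analysis here, not from anything in the index.
        start, end = match.span()
        if match.group().lower() in query_terms:
            pieces.append(text[last:start])
            pieces.append("<em>" + text[start:end] + "</em>")
            last = end
    pieces.append(text[last:])
    return "".join(pieces)


result = reanalyze_and_highlight("Quick brown fox, quick!", {"quick"})
print(result)  # <em>Quick</em> brown fox, <em>quick</em>!
```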
With term vectors, you have enough information and do not need to re-analyze the text. The downside is the index size, which will grow noticeably. I should add that since Lucene 4.2, term vectors are better compressed and stored in an optimized way. There is also the new PostingsHighlighter, which relies on the ability to store offsets in the postings list and requires even less space.
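By contrast, when positions and offsets were recorded at index time, highlighting reduces to slicing the original text. The sketch below models a single document's term vector as a hand-written dict of term to (position, start_offset, end_offset) tuples; this data layout and the function name are assumptions for illustration, not Lucene's on-disk format or API.

```python
# Toy term vector for one document: term -> [(position, start, end), ...].
# In Lucene this is computed once at index time; here it is written by hand.
TermVector = dict[str, list[tuple[int, int, int]]]


def highlight_from_term_vector(
    text: str, term_vector: TermVector, query_terms: set[str]
) -> str:
    """Wrap query terms using stored offsets, with no re-analysis of the text.

    Assumes the stored spans do not overlap (true for this simple one-token-
    per-word example).
    """
    spans = sorted(
        (start, end)
        for term in query_terms
        for _position, start, end in term_vector.get(term, [])
    )
    pieces = []
    last = 0
    for start, end in spans:
        pieces.append(text[last:start])
        pieces.append("<em>" + text[start:end] + "</em>")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


text = "Quick brown fox, quick!"
tv: TermVector = {
    "quick": [(0, 0, 5), (3, 17, 22)],
    "brown": [(1, 6, 11)],
    "fox": [(2, 12, 15)],
}
result = highlight_from_term_vector(text, tv, {"quick"})
print(result)  # <em>Quick</em> brown fox, <em>quick</em>!
```

The trade-off from the paragraph above shows up directly: the lookup is cheap, but the term vector has to be stored for every document, which is what makes the index larger.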
Elasticsearch automatically uses the best way to highlight based on the available information. If term vectors are stored, it will use the fast vector highlighter; otherwise, the standard one. After you reindex without term vectors, highlighting will be done by the standard highlighter. It will be slower, but the index will be smaller.
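The selection rule described above can be restated as a tiny function. This is a hypothetical helper, not Elasticsearch source code, and it only covers the two cases this answer mentions. The search body at the end names a highlighter explicitly through the "type" option of a highlighted field, which can be handy for comparing the two on the same index; the field name "content" and the query are assumptions for illustration.

```python
import json


def choose_highlighter(field_mapping: dict) -> str:
    """Hypothetical helper restating the rule described in the answer.

    Not Elasticsearch code: "fast vector highlighter if term vectors with
    positions and offsets are stored, otherwise the plain highlighter".
    """
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "fvh"
    return "plain"


with_vectors = {"type": "text", "term_vector": "with_positions_offsets"}
without_vectors = {"type": "text"}  # e.g. after reindexing without term vectors

print(choose_highlighter(with_vectors))     # fvh
print(choose_highlighter(without_vectors))  # plain

# Search body that requests a specific highlighter for the "content" field.
search_body = {
    "query": {"match": {"content": "quick"}},
    "highlight": {"fields": {"content": {"type": "plain"}}},
}
print(json.dumps(search_body))
```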
Regarding ngram fields, the behavior you describe is strange, since the fast vector highlighter should have better support for ngram fields, so I would expect exactly the opposite result.