The effect of index size on search speed (to store or not to store)

Right now, we are using Solr as a full-text index in which all document fields are indexed but not stored. There are several million documents, and the index size is 50 GB. The average query time is about 100 ms.

To use features such as highlighting, we are considering storing the text in the index as well. But this could double the size of the index files.

I know that there is no linear relationship between index size and query time. Increasing the number of documents by a factor of 10 has practically no effect on query time.

But in general, the system (Solr / Lucene / Linux / ...) has to handle more data: the index files, for example, span many more inodes, and so on.

So I am sure that index size does have some effect on query time. (But is it noticeable?)

First: do you think I'm right? Do you have any experience with index size and search speed with and without stored text? Is it reasonable to blow up the index by storing the documents in it?

Second: do you know how Solr / Lucene handles stored text? Maybe in separate files? (So that searches are unaffected in simple cases where the stored text is not needed?)

Thanks.

1 answer

Yes, it is true that the index grows if you make large fields stored, but if you want to retrieve them, you have no other option. I don't think search speed will drop significantly. Queries may be slightly slower simply because more data has to be loaded when fetching the results, but this effect is minor.
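To make the trade-off concrete, here is a minimal sketch of a highlighting request, which generally requires the highlighted field to be stored. The base URL, the core name `docs`, and the field name `body` are assumptions for illustration only; `q`, `fl`, `rows`, `hl`, `hl.fl`, and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, text, field, rows=10):
    """Build a Solr select URL that asks for highlighted snippets from `field`."""
    params = {
        "q": f"{field}:({text})",
        "fl": "id",        # return only the id, so no other stored fields are loaded
        "rows": rows,
        "hl": "true",      # switch highlighting on
        "hl.fl": field,    # field to highlight; it generally has to be stored
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core, and field names.
url = build_highlight_query("http://localhost:8983/solr/docs", "lucene index", "body")
print(url)
```

Restricting `fl` to the fields you actually need keeps the amount of stored data loaded per hit small, which is where the extra cost mentioned above would come from.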

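Regarding the Lucene index format and the different files in the index, see the Lucene index file format documentation: stored fields are kept in dedicated files of their own (`.fdt` for the data and `.fdx` for its index), separate from the files used for searching.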
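If you want to check how much of an existing index is taken up by stored fields, a rough sketch like the following sums file sizes by extension. It assumes the stored-field files use the `.fdt` and `.fdx` extensions (this can differ between Lucene versions) and that the index does not use compound files (`.cfs`), which bundle everything together. The function name `index_size_breakdown` is made up for this example.

```python
import os
from collections import defaultdict

# Assumption: stored fields live in .fdt (data) and .fdx (index) files.
STORED_FIELD_EXTENSIONS = {".fdt", ".fdx"}


def index_size_breakdown(index_dir):
    """Return (stored_bytes, total_bytes) for the files in a Lucene index directory."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1]
            sizes[ext] += os.path.getsize(path)
    stored = sum(size for ext, size in sizes.items() if ext in STORED_FIELD_EXTENSIONS)
    total = sum(sizes.values())
    return stored, total
```

Calling `index_size_breakdown` on the `data/index` directory of a core before and after enabling stored text would show how much of the growth lands in the stored-field files rather than in the files used for searching.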


Source: https://habr.com/ru/post/1390443/

