How to remove logically deleted documents from the Solr index?

I use Solr for free text search for a project where searchable entries will need to be added and deleted on a large scale every day.

Because of the scale, I need to make sure the index does not grow larger than necessary.

In my Solr test setup, I index a set of 10 documents. Then I make changes to one of the documents and want to replace the document with the same identifier in the index. This works correctly and behaves as expected during the search.

I use this code to update a document:

getSolrServer().deleteById(document.getIndexId());
getSolrServer().add(document.getSolrInputDocument());
getSolrServer().commit();

I noticed that when I look at the statistics page for the Solr server, the numbers do not match what I expect.

After the initial indexing, numDocs and maxDoc are both 10, as expected. However, after I update the document, numDocs is still 10 (expected), but maxDoc is 11 (unexpected).

When reading the documentation, I see that

maxDoc may be larger because the maxDoc count includes logically deleted documents that have not yet been removed from the index.
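
To check that I read this sentence correctly, here is a toy model of the counting it describes. It is only a sketch under my own assumptions: ToyIndex and its methods are hypothetical names, and this is not how Solr or Lucene is actually implemented. It only shows that if a delete marks a slot instead of freeing it, then a delete followed by an add leaves numDocs unchanged and raises maxDoc by one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, NOT Solr's or Lucene's real implementation.
class ToyIndex {
    // One entry per slot ever written; true = live, false = logically deleted.
    private final List<Boolean> live = new ArrayList<>();
    // Maps a document id to the slot that holds its current version.
    private final Map<String, Integer> slotById = new HashMap<>();

    // Marks the document as deleted; the slot itself stays allocated.
    void deleteById(String id) {
        Integer slot = slotById.remove(id);
        if (slot != null) {
            live.set(slot, false);
        }
    }

    // Appends a new slot; an existing document with the same id is marked deleted first.
    void add(String id) {
        deleteById(id);
        slotById.put(id, live.size());
        live.add(true);
    }

    int numDocs() { return slotById.size(); }          // live documents only
    int maxDoc() { return live.size(); }               // live + logically deleted
    int deletedDocs() { return maxDoc() - numDocs(); } // still occupying space

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < 10; i++) {
            index.add("doc" + i);
        }
        System.out.println("initial: numDocs=" + index.numDocs() + " maxDoc=" + index.maxDoc());

        // Same sequence as my update code: delete by id, then add again.
        index.deleteById("doc3");
        index.add("doc3");
        System.out.println("updated: numDocs=" + index.numDocs() + " maxDoc=" + index.maxDoc()
                + " deleted=" + index.deletedDocs());
    }
}
```

If this model is roughly right, the difference maxDoc - numDocs is the number of logically deleted documents still occupying space, which is 1 in my test.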

So my question is: how do I remove logically deleted documents from the index?

If these documents remain in the index, do I risk a performance penalty once this happens with a very large volume of documents?

Thanks :)


Source: https://habr.com/ru/post/1750295/

