How to search in Lucene.NET without specifying a "top n" limit?

Lucene has several overloads of the IndexSearcher.Search method. Some of them require a "top n hits" argument; others do not, but those are deprecated and will be removed in Lucene.NET 3.0.

The overloads that require the "top n" argument actually preallocate memory for that entire possible range of results. So when you cannot even roughly estimate how many results a query will return, the only option is to pass an arbitrarily large number to make sure all query results are returned. This causes severe memory pressure and leaks due to LOH (Large Object Heap) fragmentation.

Is there an official, non-deprecated way to search without passing the "top n" argument?

Thanks guys.

1 answer

I am using Lucene.NET 2.9.2 as the reference point for this answer.

You can create a custom collector and pass it to one of the Search overloads.

using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

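public class AwesomeCollector : Collector {
    private readonly List<Int32> _docIds = new List<Int32>();
    private readonly Single _lowerInclusiveScore;
    private Scorer _scorer;
    private Int32 _docBase;

    // Hits scoring below lowerInclusiveScore are skipped; pass Single.MinValue to keep every hit.
    public AwesomeCollector(Single lowerInclusiveScore) {
        _lowerInclusiveScore = lowerInclusiveScore;
    }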

    public IEnumerable<Int32> DocumentIds {
        get { return _docIds; }
    }

    public override void SetScorer(Scorer scorer) {
        _scorer = scorer;
    }

    public override void Collect(Int32 doc) {
        // doc is relative to the current segment reader; adding _docBase makes it index-wide.
        var score = _scorer.Score();
        if (_lowerInclusiveScore <= score)
            _docIds.Add(_docBase + doc);
    }

    public override void SetNextReader(IndexReader reader, Int32 docBase) {
        _docBase = docBase;
    }

    public override bool AcceptsDocsOutOfOrder() {
        return true;
    }
}
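
A minimal usage sketch, assuming the collector above: the Search(Query, Collector) overload takes no "top n" count, so the list of ids simply grows with the number of hits. The CollectorUsage class and FetchAll method are hypothetical names used only for illustration, and the collector is passed in already constructed.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Search;

public static class CollectorUsage {
    // Hypothetical helper: runs the query through the collector
    // and loads every document the collector recorded.
    public static List<Document> FetchAll(IndexSearcher searcher, Query query, AwesomeCollector collector) {
        // This overload takes a Collector instead of a "top n" count.
        searcher.Search(query, collector);

        var documents = new List<Document>();
        foreach (var docId in collector.DocumentIds) {
            // Ids were offset by the segment's docBase, so they are valid for the top-level searcher.
            documents.Add(searcher.Doc(docId));
        }
        return documents;
    }
}
```

Because AcceptsDocsOutOfOrder returns true, the ids may not arrive in document order, and they are not sorted by score either; sort them yourself if ordering matters.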

Source: https://habr.com/ru/post/1786877/

