Lucene - Effective Text Search

I have an index generated with the Apache PDFBox class LucenePDFDocument. Since the index contains only text content, I want to search it efficiently.

I search the “content” field for a search string, and the results should be ordered from most relevant to least relevant. With the code below, the results included files containing the individual words of the search text, for example “What is your nationality”, but not the file that contains this complete sentence.

Which analyzer and which type of query should I use in this scenario?

    Query query = new MultiFieldQueryParser(Version.LUCENE_30, fields,
            new StandardAnalyzer(Version.LUCENE_30))
            .parse(searchString);

    TopScoreDocCollector collector = TopScoreDocCollector.create(5, false);
    searcher.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    System.out.println("count " + hits.length);
    for (ScoreDoc scoreDoc : hits) {
        int docId = scoreDoc.doc;
        Document d = searcher.doc(docId);
        System.out.println(d.getField("path"));
    }
Answer

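This is how query parsing works in Lucene: the words of the query are searched for as separate terms, and with the default OR operator a document matches if it contains any of them. If you need the whole sentence, search for it as a phrase, that is, instead of

What is your nationality

search for

"What is your nationality"

Without the quotes, the query is split into the separate terms "what", "is", "your" and "nationality" (StandardAnalyzer also drops English stop words such as "is"), and every document containing at least one of them is a hit. Also note that you ask TopScoreDocCollector for only 5 hits, so the document containing the complete sentence may simply not make it into the top 5. The hits Lucene returns are already sorted by relevance score, from most to least relevant.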
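To make the difference between the two query forms concrete, here is a toy model in plain Java. It is not Lucene's implementation: the tokenizer simply lowercases and splits on non-alphanumeric characters and keeps stop words, and the class and method names (PhraseVsTermsDemo, matchesAnyTerm, matchesPhrase) are hypothetical. matchesAnyTerm mimics an unquoted query with the default OR operator, and matchesPhrase mimics a phrase query by requiring the terms at consecutive positions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class PhraseVsTermsDemo {

    // Toy analyzer (hypothetical): lowercase, split on anything that is not a letter or digit.
    // Unlike Lucene's StandardAnalyzer it does NOT remove stop words such as "is".
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Unquoted query with the default OR operator: a document matches
    // if it contains at least one of the query terms, in any order.
    static boolean matchesAnyTerm(String document, String query) {
        Set<String> documentTerms = new HashSet<>(tokenize(document));
        for (String term : tokenize(query)) {
            if (documentTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Quoted (phrase) query: all terms must occur at consecutive positions, in order.
    static boolean matchesPhrase(String document, String phrase) {
        List<String> documentTokens = tokenize(document);
        List<String> phraseTokens = tokenize(phrase);
        if (phraseTokens.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phraseTokens.size() <= documentTokens.size(); start++) {
            if (documentTokens.subList(start, start + phraseTokens.size()).equals(phraseTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "What is your nationality";
        String[] documents = {
            "What is your name? Your nationality is listed below.",
            "Form field: What is your nationality?",
            "Nothing relevant here."
        };
        for (String document : documents) {
            System.out.printf("%-55s terms=%b phrase=%b%n",
                    document, matchesAnyTerm(document, query), matchesPhrase(document, query));
        }
    }
}
```

The first sample sentence contains every word of the query, so the term-based check accepts it, but the words are not adjacent, so the phrase check rejects it; the second sentence passes both. Real Lucene phrase queries perform the same kind of check, but against the term positions stored in the index rather than by rescanning the text.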

Also, if you only search the "content" field, you do not need MultiFieldQueryParser; use QueryParser with "content" as its default field instead.


Source: https://habr.com/ru/post/1785893/

