I have two questions regarding the highlighter in Apache Lucene:
See this function.
Could you please explain the use of the TokenStream parameter?
I have several large Lucene documents, each containing many fields, and each field has several lines. I have found the most relevant document for a specific request. This document was found because several words in the query matched words in the document, and I want to know which query words those were. For this I plan to use the Lucene hit Highlighter. Example: if the request is “skin doctor delhi”, and the document called “dermatologist” contains the words “skin” and “doctor”, then after highlighting I should be able to pick out “skin” and “doctor” from the request. I have been trying to write code for this for several weeks without getting what I want. Could you help me?
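To make the desired result concrete, here is a minimal plain-Java sketch of the output I am after. It does not use Lucene at all: the class `MatchedTerms`, its methods, and the sample document text are hypothetical and made up for illustration, and it relies on naive lowercase tokenization with exact matching instead of an analyzer or fuzzy matching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: it only illustrates the desired output.
class MatchedTerms {

    // Lowercases the text and splits it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the query words that also occur in the document, in query order.
    static List<String> matchedQueryWords(String query, String documentText) {
        Set<String> docWords = new HashSet<>(tokenize(documentText));
        List<String> matched = new ArrayList<>();
        for (String word : tokenize(query)) {
            if (docWords.contains(word) && !matched.contains(word)) {
                matched.add(word);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String query = "skin doctor delhi";
        // Made-up document text for the "dermatologist" example.
        String document = "Dermatologist: a doctor who treats skin, hair and nail conditions.";
        System.out.println(matchedQueryWords(query, document)); // prints [skin, doctor]
    }
}
```

What I need from the Highlighter is the same kind of result, but based on how Lucene actually matched the document (analyzer, fuzzy matching) rather than on exact string comparison.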
Thanks in advance.
Update:
Current Approach: I am creating a query containing all the words in a document.
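// Concatenate every "description" field value of the matched document.
Field[] fields = doc.getFields("description");
StringBuilder desc = new StringBuilder();
for (Field f : fields) {
    desc.append(f.stringValue()).append(' ');
}
// Parse the whole document text as a query; every word becomes a query clause.
Query q = qp.parse(desc.toString());
QueryScorer scorer = new QueryScorer(q, reader, "description");
Highlighter highlighter = new Highlighter(scorer);
// Highlight the parts of text that match the query built from the document.
String fragment = highlighter.getBestFragment(analyzer, "description", text);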
It works for small documents, but fails for large documents with the following stack trace:
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:152)
at org.apache.lucene.queryParser.QueryParser.getBooleanQuery(QueryParser.java:891)
at org.apache.lucene.queryParser.QueryParser.getBooleanQuery(QueryParser.java:866)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1213)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1167)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:182)
Obviously, this approach does not scale to large documents. What needs to be done to fix this?
BTW I am using FuzzyQuery matching.