Best Practices for Implementing Lucene Search in Java

Every document in my Lucene index is like a post on Stack Overflow, and I'm trying to search the index (which contains millions of documents). Each user should be able to search only messages from users of their own company. I have no control over how the data is indexed; I only need to build a simple search (which works) on top of it.

Here is my first attempt:

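// Imports for the Lucene 2.x-era API used below
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.FSDirectory;

String q = "mysql";
String companyId = "1001";

// Fields searched for the user's text
String[] fields = { "body", "subject", "number", "category", "tags" };

Float float10 = Float.valueOf(10f);
Float float5 = Float.valueOf(5f);

// Per-field boosts: body and subject weigh twice as much as the rest
Map<String, Float> boost = new HashMap<String, Float>();
boost.put("body", float10);
boost.put("subject", float10);
boost.put("number", float5);
boost.put("category", float5);
boost.put("tags", float5);

MultiFieldQueryParser mfqp = new MultiFieldQueryParser(fields, new StandardAnalyzer(), boost);
mfqp.setAllowLeadingWildcard(true);
Query userQuery = mfqp.parse(q);

// Restrict results to the user's company
TermQuery companyQuery = new TermQuery(new Term("company_id", companyId));

// setMaxClauseCount is static, so it applies to every BooleanQuery in the JVM
BooleanQuery.setMaxClauseCount(50000);
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.add(userQuery, BooleanClause.Occur.MUST);
booleanQuery.add(companyQuery, BooleanClause.Occur.MUST);

FSDirectory directory = FSDirectory.getDirectory(new File("/tmp/index"));
// SearcherManager is my own helper that returns the shared IndexSearcher
IndexSearcher searcher = SearcherManager.getIndexSearcherInstance(directory);
Hits hits = searcher.search(booleanQuery);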

Functionally it works, but I see memory problems. I get an out-of-memory error every 4 or 5 days. I took a heap dump and saw that Lucene Term and TermInfo objects top the list. I am using a single instance of IndexSearcher, and I see only one instance of it on the heap.

[Answer text garbled in the source. Surviving references: VisualVM; Memory Analyzer (MAT) for Eclipse; "Eclipse Memory Analyzer, 10 /" (title truncated), Markus Kohler.]

[Answer text garbled in the source. Surviving terms: OOME; Lucene; OR; "body: *".]

MultiFieldQueryParser mfqp = new MultiFieldQueryParser(fields, new StandardAnalyzer(), boost);
mfqp.setAllowLeadingWildcard(true); 
Query userQuery = mfqp.parse(q);

Also, do you run the query code alongside the indexing process?
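
One reason these lines are worth a look: as far as I know, older Lucene versions rewrite a wildcard query into one OR clause per matching term, and BooleanQuery throws TooManyClauses once the max clause count is exceeded. With leading wildcards allowed and the limit raised to 50000, a single user query could therefore touch a very large part of the term dictionary. Below is a toy model of that expansion in plain Java. It is not Lucene's implementation, and the class and method names (WildcardExpansionSketch, toRegex, expand) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Toy model of wildcard expansion over a term dictionary; not Lucene code. */
final class WildcardExpansionSketch {

    /** Translate a '*' / '?' wildcard pattern into a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /**
     * Collect every dictionary term that matches the pattern, one "clause" per term.
     * Throws as soon as adding a clause would exceed maxClauses.
     */
    static List<String> expand(SortedSet<String> termDictionary, String wildcard, int maxClauses) {
        Pattern pattern = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        // This toy scans every term; a leading wildcard forces that kind of
        // full scan because no prefix is available to narrow the range.
        for (String term : termDictionary) {
            if (pattern.matcher(term).matches()) {
                if (clauses.size() == maxClauses) {
                    throw new IllegalStateException("too many clauses: limit is " + maxClauses);
                }
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 100_000; i++) {
            terms.add("term" + i);
        }
        terms.add("mysql");
        terms.add("postgresql");

        // A selective pattern yields few clauses...
        System.out.println(expand(terms, "*sql", 50_000));
        // ...while a broad one yields one clause per matching term.
        System.out.println(expand(terms, "term*", 200_000).size());
    }
}
```

If this is what happens in your index, the number of Term objects created per query grows with the number of matching terms, so it may be worth checking what queries users actually send before raising the clause limit further.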


Source: https://habr.com/ru/post/1725326/