Best Practices for Implementing Lucene Search in Java

Question

Each document in my Lucene index is similar to a post on Stack Overflow, and I'm trying to search across the index (which contains millions of documents). Each user should only be able to search posts from users in their own company. I have no control over how the data is indexed; I only need to build a simple search on top of it (which works).

Here is my first attempt:

String q = "mysql";
String companyId = "1001";

String[] fields = { "body", "subject", "number", "category", "tags" };

// Per-field boosts: body and subject are boosted twice as much as the other fields.
Float float10 = Float.valueOf(10f);
Float float5 = Float.valueOf(5f);

Map<String, Float> boost = new HashMap<String, Float>();
boost.put("body", float10);
boost.put("subject", float10);
boost.put("number", float5);
boost.put("category", float5);
boost.put("tags", float5);

MultiFieldQueryParser mfqp = new MultiFieldQueryParser(fields, new StandardAnalyzer(), boost);
mfqp.setAllowLeadingWildcard(true);
Query userQuery = mfqp.parse(q);

// Restrict results to documents belonging to the user's company.
TermQuery companyQuery = new TermQuery(new Term("company_id", companyId));

// Both the user's query and the company restriction must match.
BooleanQuery booleanQuery = new BooleanQuery();
BooleanQuery.setMaxClauseCount(50000);
booleanQuery.add(userQuery, BooleanClause.Occur.MUST);
booleanQuery.add(companyQuery, BooleanClause.Occur.MUST);

FSDirectory directory = FSDirectory.getDirectory(new File("/tmp/index"));
IndexSearcher searcher = SearcherManager.getIndexSearcherInstance(directory);
Hits hits = searcher.search(booleanQuery);

Functionally it works, but I'm seeing memory problems. I get an OutOfMemoryError every 4 to 5 days; I took a heap dump and saw that Lucene Term and TermInfo objects top the list. I am using a single IndexSearcher instance, and I see only one instance of it on the heap.

+3

java full-text-search lucene

Langali 10 . '09 20:45

3

, OOME, . Lucene OR , . , . , "body: *", .

, , . , , , .
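
The question's code calls setAllowLeadingWildcard(true) and raises the clause limit to 50000. As a rough illustration of why that combination can be memory-hungry: older Lucene versions rewrite a wildcard query into one OR clause per matching term, and a leading wildcard cannot use the sorted term dictionary to skip ahead. The sketch below is a toy model in plain Java, not Lucene code; the class WildcardExpansionSketch, its TooManyClauses exception, and the single-'*' pattern syntax are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of wildcard expansion over a sorted term dictionary. NOT Lucene code. */
final class WildcardExpansionSketch {

    /** Hypothetical stand-in for the error raised when the clause limit is exceeded. */
    static final class TooManyClauses extends RuntimeException {
        TooManyClauses(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    /**
     * Expands a pattern containing at most one '*' into the matching terms,
     * i.e. the clauses that would be OR-ed together.
     */
    static List<String> expand(SortedSet<String> dictionary, String pattern, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        int star = pattern.indexOf('*');
        if (star < 0) {
            // No wildcard: at most one clause.
            if (dictionary.contains(pattern)) {
                clauses.add(pattern);
            }
            return clauses;
        }
        String prefix = pattern.substring(0, star);
        String suffix = pattern.substring(star + 1);
        // With a prefix we can jump into the sorted dictionary; a leading
        // wildcard (empty prefix) forces a scan from the very first term.
        SortedSet<String> candidates = prefix.isEmpty() ? dictionary : dictionary.tailSet(prefix);
        for (String term : candidates) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            boolean longEnough = term.length() >= prefix.length() + suffix.length();
            if (longEnough && term.endsWith(suffix)) {
                clauses.add(term);
                if (clauses.size() > maxClauseCount) {
                    throw new TooManyClauses(maxClauseCount);
                }
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(
                Arrays.asList("mysql", "mysqldump", "postgresql", "sql", "sqlite"));
        System.out.println("sql*  -> " + expand(dictionary, "sql*", 1024));
        System.out.println("*sql  -> " + expand(dictionary, "*sql", 1024));
    }
}
```

In this model a high limit does not cause the expansion; it only removes the guard that would otherwise stop a query such as "*" from materializing a clause for every term in a field.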

+1

bajafresh4life 15 . '09 13:26

MultiFieldQueryParser mfqp = new MultiFieldQueryParser(fields, new StandardAnalyzer(), boost);
mfqp.setAllowLeadingWildcard(true); 
Query userQuery = mfqp.parse(q);

Also, do you use the query code in conjunction with the indexing process?

0

Joyce Dec 14 '09 at 16:02

akuhn · Accepted Answer · 2009-12-11T10:24:49+0000

( , ). , visualvm. Memory Analyzer (MAT) eclipse ( , ). .

On MAT, see "Eclipse Memory Analyzer, 10 useful tips/articles" by Markus Kohler.
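
The question mentions a heap dump taken after the error. A dump can also be captured on demand from inside the running JVM and then opened in MAT or visualvm. This is a minimal sketch, assuming a HotSpot-based JVM (the com.sun.management API is not part of the Java SE standard); the class name HeapDumpSketch and the output path are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: write a heap dump of the current JVM to a file (HotSpot only). */
final class HeapDumpSketch {

    /**
     * Dumps only live (reachable) objects to the given file and returns its path.
     * Recent JDKs require the file name to end in ".hprof".
     */
    static Path dumpHeap(Path target) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file, so clear it first.
        Files.deleteIfExists(target);
        bean.dumpHeap(target.toString(), true);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; pick a disk with enough free space for the heap.
        Path out = dumpHeap(Paths.get(System.getProperty("java.io.tmpdir"), "searcher.hprof"));
        System.out.println("Heap dump written to " + out);
    }
}
```

For a failure that only shows up every few days, starting the JVM with -XX:+HeapDumpOnOutOfMemoryError is usually the simpler route, since the dump is then written at the moment the error occurs.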
