I want to find the top 1000 documents in a Lucene.NET index that match a given type and tag, sorted by number of views. My index contains 17 million documents. I am searching for type 'entity' and tag 'business'. Almost every document in the index currently has the 'entity' type and the 'business' tag. Both are string fields that are indexed but not analyzed, and neither has a term vector. Right now the query takes about 15-20 seconds to return results.
Here is my code:
    string subType = "entity";
    string tag = "business";

    BooleanQuery filterQuery = new BooleanQuery();
    filterQuery.Add(new BooleanClause(new TermQuery(new Term("SubType", subType)), BooleanClause.Occur.MUST));
    filterQuery.Add(new BooleanClause(new TermQuery(new Term("Tag", tag)), BooleanClause.Occur.MUST));

    Sort sort = new Sort(new SortField("Views", global::Lucene.Net.Search.SortField.INT, true));
    Filter queryFilter = new QueryWrapperFilter(filterQuery);

    TopDocs docs = searcher.Search(new MatchAllDocsQuery(), queryFilter, 1000, sort);
Any suggestions for improving performance are welcome. I have spent about 8 hours tweaking and experimenting with things. Right now I cache the results for 15 minutes so that subsequent searches can simply return the cached results, but the initial search is painfully slow.
It seems that the bit sets for the terms "entity" and "business" should compress down to a few bytes, assuming that Lucene does any kind of run-length encoding ...
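To make the intuition above concrete, here is a minimal standalone sketch of run-length encoding applied to a bit set in which almost every bit is set. It is only an illustration: the `RunLengthSketch` class, the `Encode` helper, and the sample data are hypothetical, and nothing here is taken from Lucene's internals or confirms that Lucene stores term matches this way.

```csharp
using System;
using System.Collections.Generic;

static class RunLengthSketch
{
    // Encodes a bit sequence as (bit value, run length) pairs.
    // Hypothetical helper for illustration; not a Lucene API.
    public static List<KeyValuePair<bool, int>> Encode(bool[] bits)
    {
        var runs = new List<KeyValuePair<bool, int>>();
        int i = 0;
        while (i < bits.Length)
        {
            bool value = bits[i];
            int length = 0;
            // Consume the whole run of identical bits.
            while (i < bits.Length && bits[i] == value)
            {
                length++;
                i++;
            }
            runs.Add(new KeyValuePair<bool, int>(value, length));
        }
        return runs;
    }

    static void Main()
    {
        // Made-up data: 17 million documents, all but one matching the term.
        var bits = new bool[17000000];
        for (int i = 0; i < bits.Length; i++)
        {
            bits[i] = true;
        }
        bits[12345] = false;

        List<KeyValuePair<bool, int>> runs = Encode(bits);

        // Three runs: true x 12345, false x 1, true x 16987654.
        Console.WriteLine(runs.Count);
    }
}
```

With data shaped like this, 17 million bits collapse into three (value, length) pairs, which is the kind of compression the question is hoping for. Whether the index actually benefits from anything similar depends on how Lucene represents postings and filters internally.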