Over the past few weeks, I have been working on upgrading an application from Lucene 3.x to Lucene 4.x in the hope of improving performance. Unfortunately, after going through the complete migration process and trying every tweak I could find online and in the documentation, Lucene 4 runs significantly slower than Lucene 3 (~50%). I'm pretty much out of ideas at this point, and I was wondering if anyone has suggestions on how to bring it up to speed. I'm not even looking for a big improvement over 3.x anymore; I would be happy just to match it and stay on a current release of Lucene.
<Edit>
To confirm that none of the standard migration changes had a negative impact on performance, I ported my Lucene 4.x version back to Lucene 3.6.2 and kept the newer API, rather than using the custom ParallelMultiSearcher and other deprecated methods/classes.
Performance on 3.6.2 is even faster than before:
- Old application (Lucene 3.6.0) - ~5700 requests/min
- Updated application with the new API and some minor optimizations (Lucene 4.4.0) - ~2900 requests/min
- New version of the application ported back, but keeping the optimizations and the newer IndexSearcher/etc. API (Lucene 3.6.2) - ~6200 requests/min
Since the optimizations and the newer Lucene API actually improved performance on 3.6.2, it doesn't make sense for this to be a problem with anything other than Lucene. I just don't know what else I can change in my program to fix it.
</Edit>
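For reference, the relative differences between the three throughput figures above can be worked out with a few lines of Java. ThroughputDelta and percentChange are made-up names for this sketch; the only inputs are the requests/min numbers quoted above.

```java
class ThroughputDelta {
    /** Percentage change of `measured` relative to `baseline`. */
    static double percentChange(double baseline, double measured) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Requests/min as reported in the post.
        double lucene360 = 5700; // old application
        double lucene440 = 2900; // migrated application
        double lucene362 = 6200; // new code ported back to 3.6.2

        System.out.printf("4.4.0 vs 3.6.0: %.1f%%%n", percentChange(lucene360, lucene440));
        System.out.printf("3.6.2 vs 3.6.0: %.1f%%%n", percentChange(lucene360, lucene362));
        System.out.printf("4.4.0 vs 3.6.2: %.1f%%%n", percentChange(lucene362, lucene440));
    }
}
```

By this calculation, 4.4.0 is roughly 49% below the old 3.6.0 application, while the back-ported build on 3.6.2 is roughly 9% above it.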
Application info
We have one index, divided into 20 shards - this provided better performance in both Lucene 3.x and Lucene 4.x
The index currently contains ~150 million documents, all of which are fairly simple and heavily normalized, so there are many duplicate tokens. Only one field (an ID) is stored - the others are not retrievable.
We have a fixed set of relatively simple queries that are populated with user input and executed - they consist of several BooleanQueries, TermQueries and TermRangeQueries. Some of them are nested, but only one level deep for now.
We don't do anything advanced with the results - we just fetch the scores and the stored ID fields
We use MMapDirectories pointing to index files in a tmpfs. We played around with the useUnmap "hack", since we don't open new Directories very often, and got a nice boost from it
We use one IndexSearcher for all queries.
Our test machines have 94 GB of RAM and 64 logical cores.
General processing
1) Request received by the socket listener
2) Up to 4 Query objects are generated and populated with normalized user input (all of the required input for a query must be present, or that query is not executed)
3) Queries are executed in parallel using the Fork/Join framework
- Subqueries for each shard are executed in parallel using the IndexSearcher with an ExecutorService
4) Aggregation and simple post-processing
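The steps above can be sketched roughly as follows. This is a simplified stand-in, not the real application code: FanOutSketch, searchShard, searchAllShards, QueryTask and handleRequest are made-up names, queries are plain strings, searchShard returns a fake hit count instead of calling Lucene, and the pool sizes are arbitrary. It only shows the two-level shape: Fork/Join across the queries of one request, and an ExecutorService across the shards of one query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

class FanOutSketch {
    static final int SHARDS = 20; // shard count from the post

    /** Hypothetical stand-in for searching one shard; returns a fake hit count. */
    static int searchShard(String query, int shard) {
        return query.length() + shard;
    }

    /** Inner level: one query is fanned out to every shard on a shared executor. */
    static int searchAllShards(String query, ExecutorService shardPool)
            throws InterruptedException, ExecutionException {
        List<Future<Integer>> pending = new ArrayList<>();
        for (int s = 0; s < SHARDS; s++) {
            final int shard = s;
            Callable<Integer> job = () -> searchShard(query, shard);
            pending.add(shardPool.submit(job));
        }
        int hits = 0;
        for (Future<Integer> f : pending) {
            hits += f.get(); // blocks until that shard has answered
        }
        return hits;
    }

    /** Outer level: one Fork/Join task per query of the request. */
    static class QueryTask extends RecursiveTask<Integer> {
        private final String query;
        private final ExecutorService shardPool;

        QueryTask(String query, ExecutorService shardPool) {
            this.query = query;
            this.shardPool = shardPool;
        }

        @Override
        protected Integer compute() {
            try {
                return searchAllShards(query, shardPool);
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Runs all queries of a request in parallel, then aggregates (here: a plain sum). */
    static int handleRequest(List<String> queries, ForkJoinPool fj, ExecutorService shardPool) {
        List<QueryTask> tasks = new ArrayList<>();
        for (String q : queries) {
            QueryTask task = new QueryTask(q, shardPool);
            fj.submit(task);
            tasks.add(task);
        }
        int total = 0;
        for (QueryTask task : tasks) {
            total += task.join();
        }
        return total;
    }

    public static void main(String[] args) {
        ForkJoinPool fj = new ForkJoinPool(4);                       // arbitrary size
        ExecutorService shardPool = Executors.newFixedThreadPool(8); // arbitrary size
        try {
            System.out.println(handleRequest(Arrays.asList("ab", "xyz"), fj, shardPool));
        } finally {
            fj.shutdown();
            shardPool.shutdown();
        }
    }
}
```

Note that each Fork/Join worker blocks on the shard futures here. That is safe only because the shard work runs on a separate pool that never waits on the Fork/Join pool.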
Other relevant information
Indexes were recreated for the 4.x system, but the data is the same. We tried the normal Lucene42 codec as well as an extended one that did not use compression (per a suggestion found online).
In 3.x we used a modified version of ParallelMultiSearcher; in 4.x we use the IndexSearcher with an ExecutorService and combine all of our readers in a MultiReader
In 3.x we used a ThreadPoolExecutor instead of Fork/Join (Fork/Join performed better in my tests)
4.x Hotspots
Method | Self time (%) | Self time (ms) | Self time (CPU in ms)
java.util.concurrent.CountDownLatch.await() | 11.29% | 140887.219 | 0.0 <- this is only from the TCP threads waiting for the real work to complete - you can ignore it
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>() | 9.74% | 121594.03 | 121594
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.<init>() | 9.59% | 119680.956 | 119680
org.apache.lucene.codecs.lucene41.ForUtil.readBlock() | 6.91% | 86208.621 | 86208
org.apache.lucene.search.DisjunctionScorer.heapAdjust() | 6.68% | 83332.525 | 83332
java.util.concurrent.ExecutorCompletionService.take() | 5.29% | 66081.499 | 6153
org.apache.lucene.search.DisjunctionSumScorer.afterNext() | 4.93% | 61560.872 | 61560
org.apache.lucene.search.TermScorer.advance() | 4.53% | 56530.752 | 56530
java.nio.DirectByteBuffer.get() | 3.96% | 49470.349 | 49470
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.<init>() | 2.97% | 37051.644 | 37051
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.getFrame() | 2.77% | 34576.54 | 34576
org.apache.lucene.codecs.MultiLevelSkipListReader.skipTo() | 2.47% | 30767.711 | 30767
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.newTermState() | 2.23% | 27782.522 | 27782
java.net.ServerSocket.accept() | 2.19% | 27380.696 | 0.0
org.apache.lucene.search.DisjunctionSumScorer.advance() | 1.82% | 22775.325 | 22775
org.apache.lucene.search.HitQueue.getSentinelObject() | 1.59% | 19869.871 | 19869
org.apache.lucene.store.ByteBufferIndexInput.buildSlice() | 1.43% | 17861.148 | 17861
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.getArc() | 1.35% | 16813.927 | 16813
org.apache.lucene.search.DisjunctionSumScorer.countMatches() | 1.25% | 15603.283 | 15603
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.refillDocs() | 1.12% | 13929.646 | 13929
java.util.concurrent.locks.ReentrantLock.lock() | 1.05% | 13145.631 | 8618
org.apache.lucene.util.PriorityQueue.downHeap() | 1.00% | 12513.406 | 12513
java.util.TreeMap.get() | 0.89% | 11070.192 | 11070
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs() | 0.80% | 10026.117 | 10026
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.decodeMetaData() | 0.62% | 7746.05 | 7746
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader.iterator() | 0.60% | 7482.395 | 7482
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact() | 0.55% | 6863.069 | 6863
org.apache.lucene.store.DataInput.clone() | 0.54% | 6721.357 | 6721
java.nio.DirectByteBufferR.duplicate() | 0.48% | 5930.226 | 5930
org.apache.lucene.util.fst.ByteSequenceOutputs.read() | 0.46% | 5708.354 | 5708
org.apache.lucene.util.fst.FST.findTargetArc() | 0.45% | 5601.63 | 5601
org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.readTermsBlock() | 0.45% | 5567.914 | 5567
org.apache.lucene.store.ByteBufferIndexInput.toString() | 0.39% | 4889.302 | 4889
org.apache.lucene.codecs.lucene41.Lucene41SkipReader.<init>() | 0.33% | 4147.285 | 4147
org.apache.lucene.search.TermQuery$TermWeight.scorer() | 0.32% | 4045.912 | 4045
org.apache.lucene.codecs.MultiLevelSkipListReader.<init>() | 0.31% | 3890.399 | 3890
org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock() | 0.31% | 3886.194 | 3886
If there is any other information you could use that might help, please let me know.