I am using Solr 3.4 with spatial filtering; the schema uses LatLonType (subType = tdouble). The index holds about 20 million places. My main problem is that if I run a bbox filter with cache=true, performance is good enough (~40-50 QPS, latency 100-150 ms), but the big downside is extremely fast growth of the old-generation heap, which ultimately leads to major collections every 30-40 minutes (on a very large heap, 25 GB). At that point performance becomes completely unacceptable. On the other hand, I can turn off caching for bbox filters, but then both latency and QPS get worse (latency goes from about 100 ms to about 500 ms). The NumericRangeQuery javadoc talks about the great performance you can get (under 100 ms), but now I wonder whether that was measured with filterCache enabled, with no one looking at the heap growth that results. This feels like a catch-22, since neither option is really acceptable.
I am open to any ideas. My last idea (untested) is to use geohash (and pray that it either performs better with cache=false or has more manageable heap growth with cache=true).
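For reference, the two variants being compared can be written as filter queries roughly like this. This is only a sketch: the field name `location`, the point, and the distance are made-up values, and the helper just builds the `fq` string using the `{!bbox}` parser with the `cache` local parameter.

```java
import java.util.Locale;

public class BboxFilterSketch {
    // Builds an fq value for Solr's {!bbox} parser with an explicit cache flag.
    // "field" must be the LatLonType field; here it is a hypothetical name.
    static String bboxFilter(String field, double lat, double lon, double km, boolean cache) {
        return String.format(Locale.ROOT,
                "{!bbox sfield=%s pt=%s,%s d=%s cache=%b}",
                field, lat, lon, km, cache);
    }

    public static void main(String[] args) {
        // Cached variant: fast, but every distinct box adds a filterCache entry.
        System.out.println("fq=" + bboxFilter("location", 37.7749, -122.4194, 5.0, true));
        // Uncached variant: no filterCache entry, but the range queries run every time.
        System.out.println("fq=" + bboxFilter("location", 37.7749, -122.4194, 5.0, false));
    }
}
```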
EDIT:
Precision step: default (8 for double, I think)
System memory: 32 GB (EC2 M2 2XL)
JVM: 24GB
Index Size: 11 GB
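A rough back-of-the-envelope estimate of what cached filters can cost on an index this size, assuming each cached filter is held as a bitset with one bit per document (small result sets may be stored more compactly, so this is an upper bound). The cache size of 512 entries is an assumed value for illustration, not my actual setting.

```java
public class FilterCacheEstimate {
    // Bytes for one bitset covering maxDoc documents, rounded up to whole 64-bit words.
    static long bitsetBytes(long maxDoc) {
        return ((maxDoc + 63) / 64) * 8;
    }

    // Upper bound for a cache holding "entries" such bitsets.
    static long cacheBytes(long maxDoc, int entries) {
        return bitsetBytes(maxDoc) * entries;
    }

    public static void main(String[] args) {
        long maxDoc = 20_000_000L; // about 20 million places
        int entries = 512;         // hypothetical filterCache size
        System.out.println("bytes per cached filter: " + bitsetBytes(maxDoc));
        System.out.println("bytes for a full cache:  " + cacheBytes(maxDoc, entries));
    }
}
```

Since almost every bbox query uses a different box, entries like these are rarely reused and are only freed on eviction, which would be consistent with the old-generation growth I am seeing.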
EDIT2:
A tdouble with a precisionStep of 8 means that your doubles will be split into sequences of 8 bits. If all your latitudes and longitudes differ only in the last 8-bit sequence, then tdouble will behave the same as a plain double in a range query. This is why I suggested testing a precisionStep of 4.
Question: what does this mean for a double value?
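To make the quoted explanation concrete, here is a small sketch of how I understand it. It assumes a double is first mapped to a 64-bit sortable long (similar in spirit to Lucene's `NumericUtils.doubleToSortableLong`, re-implemented here rather than called), and that one term is indexed per precision level, each level dropping another `precisionStep` low-order bits. The class and method names are my own.

```java
public class TrieDoubleSketch {
    // Map a double to a long whose signed ordering matches numeric ordering.
    static long sortableBits(double v) {
        long bits = Double.doubleToLongBits(v);
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL; // flip non-sign bits of negatives
        }
        return bits;
    }

    // Number of terms indexed per value: one per shift 0, step, 2*step, ... below 64.
    static int termsPerValue(int precisionStep) {
        return (64 + precisionStep - 1) / precisionStep;
    }

    // Count the precision levels on which two values would share the same term.
    static int sharedLevels(double a, double b, int precisionStep) {
        long x = sortableBits(a);
        long y = sortableBits(b);
        int shared = 0;
        for (int shift = 0; shift < 64; shift += precisionStep) {
            if ((x >>> shift) == (y >>> shift)) {
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        double a = 40.7128, b = 40.7129; // two nearby latitudes (made-up values)
        for (int step : new int[] {8, 4}) {
            System.out.println("precisionStep=" + step
                    + " terms per value=" + termsPerValue(step)
                    + " shared levels=" + sharedLevels(a, b, step));
        }
    }
}
```

With precisionStep 8 each value gets 8 terms and with precisionStep 4 it gets 16, so the smaller step offers finer-grained levels for a range query to choose from, at the cost of a larger index.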