Poor spatial performance in Solr

I am using Solr 3.4 with spatial filtering, with a schema using LatLonType (subType = tdouble). I have an index of about 20 million places. My main problem is that if I do a bbox filter with cache = true, performance is good enough (~40-50 QPS, latency 100-150 ms), but the big downside is crazily fast growth of the old-generation heap, ultimately leading to major collections every 30-40 minutes (on a very large heap, 25 GB). And at that point, performance goes beyond unacceptable. On the other hand, I can turn off caching for bbox filters, but then my latency and QPS suffer (latency grows from 100 ms to 500 ms). The NumericRangeQuery javadoc talks about the great performance you can get (~100 ms), but now I wonder whether that was measured with filterCache turned on, and nobody bothered to look at the heap growth that results. It feels like a catch-22, since neither option is really acceptable.
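For reference, the two filter variants described above might look like this (field names and the point/distance values are illustrative, not from the question; I believe the `cache` local param is available as of Solr 3.4):

```text
# bbox filter with caching (fills the filterCache; fast, but each cached
# filter is a bitset over ~20M docs, which is what grows the old gen):
...&fq={!bbox sfield=location pt=45.15,-93.85 d=5}

# Same filter with per-request caching disabled:
...&fq={!bbox sfield=location pt=45.15,-93.85 d=5 cache=false}
```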

I am open to any ideas. My last idea (untried) is to use geohash (and pray that it either works better with cache = false, or has more manageable heap growth with cache = true).

EDIT:

precisionStep: default (8 for tdouble, I think)

System memory: 32 GB (EC2 M2 2XL)

JVM heap: 24 GB

Index Size: 11 GB

EDIT2:

A tdouble with a precisionStep of 8 means your doubles are split into sequences of 8 bits. If all your latitudes and longitudes differ only in the last sequence of 8 bits, then tdouble behaves the same as a plain double in a range query. This is why I suggested testing precisionStep 4.
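To make the trade-off concrete (a back-of-envelope sketch, not a measurement): a 64-bit trie-encoded double is indexed once per precision level, so

```text
precisionStep=8  →  64/8  =  8 indexed terms per value
precisionStep=4  →  64/4  = 16 indexed terms per value
```

More terms mean a larger index, but the extra, finer-grained levels let a NumericRangeQuery cover a range with fewer matching terms. Conversely, if all values differ only in the lowest bits, the coarse levels never discriminate and the trie encoding buys nothing.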

Question: what does this mean for a double value?

1 answer

Profiling Solr while it responds to your spatial queries would be very helpful for understanding what is slow; see hprof, for example.

In the meantime, here are a few ideas that could (possibly) improve latency.

First, you could check what happens when you reduce precisionStep (for example, try 4). If latitudes and longitudes are too close to each other and precisionStep is too high, Lucene cannot take advantage of the multiple indexed precision levels.
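A hypothetical schema change for this experiment (type and field names are illustrative; it mirrors the stock tdouble type with a lower precisionStep):

```xml
<!-- Sub-type used by the LatLonType field; precisionStep lowered from 8 to 4 -->
<fieldType name="tdouble4" class="solr.TrieDoubleField" precisionStep="4"
           omitNorms="true" positionIncrementGap="0"/>

<fieldType name="location" class="solr.LatLonType" subFieldType="tdouble4"/>

<field name="store" type="location" indexed="true" stored="true"/>
```

Note that changing precisionStep alters how values are indexed, so a full reindex is required before comparing latencies.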

You could also try giving the JVM a little less memory, so the OS page cache has a better chance of caching commonly used index files.

Then, if it is still not fast enough, you could try replacing TrieDoubleField as the sub-field with a custom field type whose getRangeQuery method uses an frange (function range) query. This would reduce the amount of disk access when computing the range, at the cost of higher memory use. (I have never tested it; it could give terrible performance.)
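As a sketch of the frange idea (my assumption of how it would be wired up, not something the answer tested): frange evaluates a function over in-memory FieldCache values instead of walking indexed range terms, which is where the disk-for-memory trade comes from. LatLonType typically stores its coordinates in dynamic `*_coordinate` sub-fields, so the hand-written equivalent might look like:

```text
# Hypothetical uncached frange filters over the LatLonType sub-fields
# (sub-field names depend on your dynamicField setup):
...&fq={!frange l=44.0 u=46.0 cache=false}store_0_coordinate
   &fq={!frange l=-94.0 u=-92.0 cache=false}store_1_coordinate
```

The first request pays the cost of loading the FieldCache arrays (two doubles per document, so on the order of 320 MB for 20M docs), which is the "higher memory" part of the trade-off.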


Source: https://habr.com/ru/post/1400324/
