Lucene spatial accuracy

I am following the example in Lucene in Action, pp. 308-315, which describes Lucene Spatial. I am using Lucene 2.9.4. I used the http://geocoder.us/service/distance endpoint to calculate the distances between some locations, and then wrote unit tests to make sure the index can find locations within a given radius.

I am wondering how accurate I can expect Lucene to be. For example, if I give a radius of 10.0 and the distance between my lat/lon points is 9.99 miles, will it find this location in all cases?

What prompts this question is that I found the search to be very accurate for small radius values (e.g. 10.0 or less) and inaccurate for larger values (e.g. r = 25.0).

Could I be doing something wrong? Is it possible that the search will choose a tier that does not contain all the lat/lngs for a given radius? My understanding was that it chooses the smallest tier that is guaranteed to contain all the points within the radius, i.e., the tier selection is just an optimization.

EDIT: I also found this: https://issues.apache.org/jira/browse/LUCENE-2519 and the apparently fixed code here: http://code.google.com/p/spatial-search-lucene/source/browse/trunk/src/main/java/org/apache/lucene/spatial/tier/projection/SinusoidalProjector.java?r=38, but when I changed my code to use the fixed SinusoidalProjector, my index returned zero results in all cases.

And that does not give me much confidence:

http://www.lucidimagination.com/blog/2010/07/20/update-spatial-search-in-apache-lucene-and-solr/

http://www.lucidimagination.com/search/document/c32e81783642df47/spatial_rethinking_cartesian_tiers_implementation#c32e81783642df47

It seems that there are hacks throughout the code, and that fixing the SinusoidalProjector alone is not enough.
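
For reference, the textbook sinusoidal projection (central meridian at 0°) maps a point to x = lng · cos(lat), y = lat, so east-west extents shrink toward the poles. The sketch below is only that textbook formula in plain Java; it is not the code from SinusoidalProjector.java or the LUCENE-2519 patch, which I have not compared it against, and the class and method names are hypothetical.

```java
// Textbook sinusoidal projection with the central meridian at 0 degrees.
// Hypothetical helper for illustration; not Lucene's SinusoidalProjector.
class SinusoidalSketch {

    /** Returns {x, y} in degree units: x = lng * cos(lat), y = lat. */
    static double[] project(double latDeg, double lngDeg) {
        double x = lngDeg * Math.cos(Math.toRadians(latDeg));
        double y = latDeg;
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // On the equator x equals the longitude; at latitude 60 it is halved.
        double[] eq = project(0.0, -118.0);
        double[] north = project(60.0, -118.0);
        System.out.printf("equator: x=%.4f y=%.4f%n", eq[0], eq[1]);
        System.out.printf("lat 60:  x=%.4f y=%.4f%n", north[0], north[1]);
    }
}
```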

2 answers

I spent some time in the source code, and I think I understand what is going wrong. First, I had wrongly assumed that the distances calculated by geocoder.us would be the same as the distances Lucene calculates internally between points. The values are close but not exact. So I switched to calculating the distances between lat/lon pairs by calling Lucene:

    double distance = DistanceUtils.getInstance().getDistanceMi(lat1, lon1, lat2, lon2);
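
To see how two services can disagree slightly, here is a plain-Java haversine great-circle distance evaluated with two commonly quoted Earth radii (mean about 3958.8 mi, equatorial about 3963.19 mi). This is a sketch under those assumptions; I have not checked which formula or radius geocoder.us or DistanceUtils.getDistanceMi actually uses, and the class and method names are hypothetical.

```java
// Haversine great-circle distance; the two radius constants are assumptions.
class HaversineSketch {
    static final double MEAN_RADIUS_MI = 3958.8;        // assumed mean Earth radius
    static final double EQUATORIAL_RADIUS_MI = 3963.19; // assumed equatorial radius

    /** Great-circle distance in miles between two lat/lng points given in degrees. */
    static double haversineMi(double lat1, double lng1, double lat2, double lng2, double radiusMi) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * radiusMi * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        double lat1 = 34.0, lng1 = -118.0, lat2 = 34.3, lng2 = -118.2;
        double dMean = haversineMi(lat1, lng1, lat2, lng2, MEAN_RADIUS_MI);
        double dEq = haversineMi(lat1, lng1, lat2, lng2, EQUATORIAL_RADIUS_MI);
        System.out.printf("mean radius:       %.4f mi%n", dMean);
        System.out.printf("equatorial radius: %.4f mi%n", dEq);
        System.out.printf("difference:        %.4f mi%n", dEq - dMean);
    }
}
```

If two implementations differed only in the radius, the results would scale by 3963.19 / 3958.8, roughly 0.011 mi at a 10-mile distance, which is the same order as the 9.99 vs 10.0 gap in the question.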

Then I dug into the DistanceQueryBuilder class (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-spatial/2.9.4/org/apache/lucene/spatial/tier/DistanceQueryBuilder.java?av=f), which I think has a bug.

It computes the bounding box used to select the Cartesian tiers as follows:

    CartesianPolyFilterBuilder cpf = new CartesianPolyFilterBuilder(tierFieldPrefix);
    Filter cartesianFilter = cpf.getBoundingArea(lat, lng, miles);

Looking at LLRect.createBox (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-spatial/2.9.4/org/apache/lucene/spatial/geometry/shape/LLRect.java#LLRect.createBox%28org.apache.lucene.spatial.geometry.LatLng%2Cdouble%2Cdouble%29), it is pretty clear that the third parameter of getBoundingArea is treated as the full width/height of the box. Thus, passing the radius value results in a bounding box that is too small.

The fix was to provide an alternative version of DistanceQueryBuilder that does this:

    Filter cartesianFilter = cpf.getBoundingArea(lat, lng, miles * 2);
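
The width-versus-radius mix-up can be reproduced without Lucene. The sketch below assumes, as described above for LLRect.createBox, that the size argument is the full width/height of the box, so the box extends only half of it on each side of the center. It works in flat mile coordinates to keep the arithmetic obvious; createBox and contains here are hypothetical stand-ins, not Lucene's API.

```java
// Flat-plane model of a box whose size argument is its full width/height.
// Hypothetical helpers; they only mimic the semantics described for LLRect.createBox.
class BoxSketch {

    /** Returns {minX, minY, maxX, maxY} for a box centered on (cx, cy) with full width = size. */
    static double[] createBox(double cx, double cy, double size) {
        double half = size / 2.0; // size is the full extent, not a radius
        return new double[] {cx - half, cy - half, cx + half, cy + half};
    }

    static boolean contains(double[] box, double x, double y) {
        return x >= box[0] && x <= box[2] && y >= box[1] && y <= box[3];
    }

    public static void main(String[] args) {
        double radius = 25.0;
        double px = 20.0, py = 0.0; // a point 20 miles east, inside the 25-mile radius

        double[] buggy = createBox(0.0, 0.0, radius);     // like getBoundingArea(lat, lng, miles)
        double[] fixed = createBox(0.0, 0.0, radius * 2); // like getBoundingArea(lat, lng, miles*2)

        System.out.println("radius as size:   " + contains(buggy, px, py)); // false
        System.out.println("diameter as size: " + contains(fixed, px, py)); // true
    }
}
```

With the radius passed as the size, the box reaches only 12.5 miles from the center along each axis, so anything between 12.5 and 25 miles out in those directions falls outside it before any distance check runs.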

This seems to work. I am still not convinced that DistanceApproximation (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-spatial/2.9.4/org/apache/lucene/spatial/geometry/shape/DistanceApproximation.java#DistanceApproximation.getMilesPerLngDeg%28double%29) is correct, because it seems that the following operations should be reversible, but they are not:

    // Similar to the implementation of DistanceUtils.getBoundary():
    double milesPerLng = DistanceApproximation.getMilesPerLngDeg(lat);
    double milesPerLat = DistanceApproximation.getMilesperLatDeg();
    double lngDelta = radius / milesPerLng;
    double latDelta = radius / milesPerLat;
    // Now it seems like this should be roughly true:
    assertEquals(radius, DistanceUtils.getInstance().getDistanceMi(lat, lng, lat, lng + lngDelta));
    assertEquals(radius, DistanceUtils.getInstance().getDistanceMi(lat, lng, lat + latDelta, lng));

But that is not the case. For example, if I run the code above with lat = 34, lng = -118 and radius = 25 (printing the results instead of asserting them), I get:

    Lng delta: 0.36142327178505024, dist: 20.725929003138496
    Lat delta: 0.4359569489852007, dist: 30.155567734407825
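
Dividing by the printed deltas recovers the miles-per-degree factors involved: the longitude delta was built with roughly 69.17 mi per degree and the latitude delta with roughly 57.35, while the distance function behaves as if a degree of longitude at 34° is about 57.35 mi and a degree of latitude about 69.17 mi (57.35 / 69.17 ≈ 0.829 ≈ cos 34°). One reading, which I have not confirmed in the Lucene source, is that the two factors end up swapped. The plain-Java arithmetic below uses only the numbers from the output above; the class and method names are hypothetical.

```java
// Arithmetic on the printed output only; no Lucene classes involved.
class ImpliedFactorsSketch {
    static final double RADIUS = 25.0;
    // Values copied from the printed output (lat = 34, lng = -118, radius = 25).
    static final double LNG_DELTA = 0.36142327178505024, LNG_DIST = 20.725929003138496;
    static final double LAT_DELTA = 0.4359569489852007, LAT_DIST = 30.155567734407825;

    /** Miles per degree implied by covering `miles` with `deltaDeg` degrees. */
    static double milesPerDeg(double miles, double deltaDeg) {
        return miles / deltaDeg;
    }

    public static void main(String[] args) {
        // Factors that were used to build the deltas (radius / delta).
        double usedLng = milesPerDeg(RADIUS, LNG_DELTA);
        double usedLat = milesPerDeg(RADIUS, LAT_DELTA);
        // Factors implied by the distance function (distance / delta).
        double measuredLng = milesPerDeg(LNG_DIST, LNG_DELTA);
        double measuredLat = milesPerDeg(LAT_DIST, LAT_DELTA);

        System.out.printf("per lng degree: used %.3f, measured %.3f%n", usedLng, measuredLng);
        System.out.printf("per lat degree: used %.3f, measured %.3f%n", usedLat, measuredLat);
        System.out.printf("measuredLng / measuredLat = %.5f, cos(34 deg) = %.5f%n",
                measuredLng / measuredLat, Math.cos(Math.toRadians(34.0)));

        // If the two "used" factors are swapped, both deltas map back to about 25 miles.
        double lngDist = (RADIUS / usedLat) * measuredLng;
        double latDist = (RADIUS / usedLng) * measuredLat;
        System.out.printf("swapped: lng dist %.3f, lat dist %.3f%n", lngDist, latDist);
    }
}
```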

I suspect the code only works because the Cartesian tiers selected from the bounding box cover an area somewhat larger than the bounding box itself. But I do not think that is guaranteed.
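
The "works by luck" theory can be illustrated with a simplified grid: if the filter keeps every cell that overlaps the bounding box, the kept cells always cover the box and usually some area beyond it, so a point outside a too-small box may or may not survive depending on where the cell boundaries fall. This is a hypothetical 1-D sketch on a flat axis with a made-up cell width; it is not Lucene's tier plotting logic.

```java
// 1-D model: cells of fixed width aligned at 0; keep every cell overlapping the box.
// Hypothetical illustration only; not Lucene's Cartesian tier code.
class TierCoverageSketch {

    /** True if x lies in a cell (width cellWidth) overlapping [center - halfWidth, center + halfWidth]. */
    static boolean coveredByCells(double center, double halfWidth, double cellWidth, double x) {
        long firstCell = (long) Math.floor((center - halfWidth) / cellWidth);
        long lastCell = (long) Math.floor((center + halfWidth) / cellWidth);
        double start = firstCell * cellWidth;
        double end = (lastCell + 1) * cellWidth;
        return x >= start && x < end;
    }

    public static void main(String[] args) {
        double radius = 25.0, cellWidth = 16.0; // made-up cell width in miles
        double halfWidth = radius / 2.0;        // box built with the radius as its full width
        double offset = 20.0;                   // target is 20 miles out, inside the radius

        for (double center : new double[] {0.0, 5.0}) {
            boolean found = coveredByCells(center, halfWidth, cellWidth, center + offset);
            System.out.printf("center %.1f: point at +%.1f mi %s%n",
                    center, offset, found ? "kept" : "filtered out");
        }
    }
}
```

Here the same 20-mile offset is kept or dropped purely because of where the cell boundaries fall, which fits the suspicion that the original code worked only because the tiers overshoot the box, without any guarantee.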

I hope someone with more knowledge of this can comment, because these are just observations after digging through the code for a day. I noticed that what looks like the latest code for Lucene spatial is on Google Code at http://code.google.com/p/spatial-search-lucene/, and the implementation seems to have changed significantly, but I did not look too deeply into the details.


They fixed this in Lucene 3.5.0. Long distances now work as well as short ones.


Source: https://habr.com/ru/post/892594/

