Efficiently sort MongoDB geospatial query results

I have a very large collection of documents like:

{ loc: [10.32, 24.34], relevance: 0.434 } 

and want to efficiently run queries such as:

  { "loc": {"$geoWithin":{"$box":[[-103,10.1],[-80.43,30.232]]}} } 

with arbitrary boxes.

Adding a 2d index on loc makes this very fast and efficient. However, I now also want to get only the most relevant documents:

 .sort({ relevance: -1 }) 

This makes everything grind to a crawl (any particular box can contain a huge number of results, and I only need the top 10 or so).

Any recommendations or help would be much appreciated!

+4
4 answers

Have you tried using the aggregation framework?

A two-stage pipeline can work:

  • a $match that uses your existing $geoWithin query.
  • a $sort that sorts by relevance: -1.

Here is an example of how it might look:

 db.foo.aggregate(
   {$match: { "loc": {"$geoWithin":{"$box":[[-103,10.1],[-80.43,30.232]]}} }},
   {$sort: {relevance: -1}}
 );

I'm not sure how well this will perform. However, even if it is slow with MongoDB 2.4, it may behave very differently in 2.5 / 2.6, since 2.6 will include improved aggregation sort performance.
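Since only the top few documents are needed, a $limit stage can follow the $sort. Below is a minimal sketch in plain JavaScript. The helper name topRelevantPipeline and its parameters are hypothetical, and the collection name foo is taken from the example above. As far as I know, the MongoDB manual describes an optimization where a $sort directly followed by a $limit only keeps the top documents in memory, but every document in the box is still read.

```javascript
// Hypothetical helper: builds a pipeline that matches a box, sorts by
// relevance (descending) and keeps only the first `limit` documents.
function topRelevantPipeline(bottomLeft, topRight, limit) {
  return [
    { $match: { loc: { $geoWithin: { $box: [bottomLeft, topRight] } } } },
    { $sort: { relevance: -1 } },
    { $limit: limit }
  ];
}

// Usage in the mongo shell (assumes a collection named foo):
//   db.foo.aggregate(topRelevantPipeline([-103, 10.1], [-80.43, 30.232], 10));
```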

+6

When a huge number of results match a given box, the sort operation is really expensive, so you definitely want to avoid it. Try creating a separate index on the relevance field and using it (without the 2d index at all). The query can then run much more efficiently: documents are scanned one by one, already in relevance order, and each is checked against the geo box condition. Once the top 10 are found, you are done.

This may not be so fast if the geo box matches only a small subset of the collection. In the worst case, it will have to scan the entire collection.
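The trade-off can be illustrated without a database. The sketch below is plain JavaScript, not MongoDB code: it walks an array that is already sorted by relevance (standing in for the relevance index), tests each document against the box, and stops once enough matches are found. The function name and the sample data are made up for illustration.

```javascript
// Simulates walking a relevance index from the highest value down.
// `docs` must already be sorted by relevance, descending.
function scanByRelevance(docs, box, limit) {
  const [[minX, minY], [maxX, maxY]] = box;
  const found = [];
  let scanned = 0;
  for (const doc of docs) {
    if (found.length >= limit) break; // enough matches, stop early
    scanned++;
    const [x, y] = doc.loc;
    if (x >= minX && x <= maxX && y >= minY && y <= maxY) found.push(doc);
  }
  return { found, scanned };
}

// Made-up data: 1000 documents on a diagonal, relevance decreasing with i.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ loc: [i, i], relevance: 1 - i / 1000 });
}

const large = scanByRelevance(docs, [[0, 0], [999, 999]], 10);
const small = scanByRelevance(docs, [[990, 990], [999, 999]], 10);
console.log(large.scanned, small.scanned); // prints: 10 1000
```

With the large box the scan stops after 10 documents. With the small box, whose matches happen to be the least relevant documents, all 1000 documents are visited.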

I suggest you create two indexes (one on loc, one on relevance) and run tests with queries that are typical for your application, using hint() to force the use of the index you want to test.

Depending on the results of your tests, you could even add some application logic: if you know the box is huge, run the query with the relevance index; otherwise use the 2d index on loc. Just a thought.
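That application logic could look roughly like the sketch below. The helper chooseIndex and the area threshold are hypothetical; the threshold has to come from your own measurements. The index names assume the default names MongoDB generates for { relevance: -1 } and { loc: "2d" }.

```javascript
// Hypothetical helper: picks an index name to pass to hint() based on box area.
// The threshold is an assumption and must be tuned with real measurements.
function chooseIndex(box, areaThreshold) {
  const [[minX, minY], [maxX, maxY]] = box;
  const area = (maxX - minX) * (maxY - minY);
  // "relevance_-1" and "loc_2d" are the default names for indexes created with
  // { relevance: -1 } and { loc: "2d" }; adjust if yours are named differently.
  return area >= areaThreshold ? "relevance_-1" : "loc_2d";
}

// Usage in the mongo shell (collection name foo is assumed):
//   var box = [[-103, 10.1], [-80.43, 30.232]];
//   db.foo.find({ loc: { $geoWithin: { $box: box } } })
//         .sort({ relevance: -1 }).limit(10)
//         .hint(chooseIndex(box, 100));
```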

+2

You cannot get scanAndOrder: false when you sort on the trailing part of a compound key. Unfortunately, there is currently no solution for your problem, and this has nothing to do with whether you are using a 2d index or not.

When you run explain() on your query, the value of "scanAndOrder" indicates whether a sort phase was needed after the results were collected. If it is true, the results had to be sorted after the query; if it is false, no sort was needed.
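A small helper makes this check explicit. It is a sketch in plain JavaScript that assumes the explain() output format used in this answer, which has a top-level scanAndOrder field. Later server versions restructured the explain output, so the field may not exist there. The function name is hypothetical.

```javascript
// Returns true when an explain() document (in the format used in this answer)
// reports that a sort phase was needed after the results were collected.
function needsSortPhase(explainDoc) {
  return explainDoc.scanAndOrder === true;
}

// Usage in the mongo shell, with the query from the question
// (collection name foo is assumed):
//   var e = db.foo.find({ loc: { $geoWithin: { $box: [[-103, 10.1], [-80.43, 30.232]] } } })
//                 .sort({ relevance: -1 }).explain();
//   print(needsSortPhase(e));
```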

To test this, I created a collection named t2 in a sample database as follows:

 db.createCollection('t2')
 db.t2.ensureIndex({a:1})
 db.t2.ensureIndex({b:1})
 db.t2.ensureIndex({a:1,b:1})
 db.t2.ensureIndex({b:1,a:1})
 for(var i=0;i++<200;){db.t2.insert({a:i,b:i+2})}

Since only one index can be used to support the query, I ran the following tests, one per index; the results are included:

 mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("b_1").explain()
 {
     "cursor" : "BtreeCursor b_1",
     "isMultiKey" : false,
     "n" : 150,
     "nscannedObjects" : 200,
     "nscanned" : 200,
     "nscannedObjectsAllPlans" : 200,
     "nscannedAllPlans" : 200,
     "scanAndOrder" : false,
     "indexOnly" : false,
     "nYields" : 0,
     "nChunkSkips" : 0,
     "millis" : 0,
     "indexBounds" : { "b" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ] },
     "server" : "localhost:27418",
     "millis" : 0
 }
 mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("a_1_b_1").explain()
 {
     "cursor" : "BtreeCursor a_1_b_1",
     "isMultiKey" : false,
     "n" : 150,
     "nscannedObjects" : 150,
     "nscanned" : 150,
     "nscannedObjectsAllPlans" : 150,
     "nscannedAllPlans" : 150,
     "scanAndOrder" : true,
     "indexOnly" : false,
     "nYields" : 0,
     "nChunkSkips" : 0,
     "millis" : 1,
     "indexBounds" : {
         "a" : [ [ 50, 1.7976931348623157e+308 ] ],
         "b" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ]
     },
     "server" : "localhost:27418",
     "millis" : 1
 }
 mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("a_1").explain()
 {
     "cursor" : "BtreeCursor a_1",
     "isMultiKey" : false,
     "n" : 150,
     "nscannedObjects" : 150,
     "nscanned" : 150,
     "nscannedObjectsAllPlans" : 150,
     "nscannedAllPlans" : 150,
     "scanAndOrder" : true,
     "indexOnly" : false,
     "nYields" : 0,
     "nChunkSkips" : 0,
     "millis" : 1,
     "indexBounds" : { "a" : [ [ 50, 1.7976931348623157e+308 ] ] },
     "server" : "localhost:27418",
     "millis" : 1
 }
 mongos> db.t2.find({a:{$gt:50}}).sort({b:1}).hint("b_1_a_1").explain()
 {
     "cursor" : "BtreeCursor b_1_a_1",
     "isMultiKey" : false,
     "n" : 150,
     "nscannedObjects" : 150,
     "nscanned" : 198,
     "nscannedObjectsAllPlans" : 150,
     "nscannedAllPlans" : 198,
     "scanAndOrder" : false,
     "indexOnly" : false,
     "nYields" : 0,
     "nChunkSkips" : 0,
     "millis" : 0,
     "indexBounds" : {
         "b" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ],
         "a" : [ [ 50, 1.7976931348623157e+308 ] ]
     },
     "server" : "localhost:27418",
     "millis" : 0
 }

Indexes on the individual fields do not help much, so a_1 (which does not support the sort) and b_1 (which does not support the query) are out. The index a_1_b_1 is not a good choice either: it will perform worse than the single-field a_1, because the MongoDB engine does not take advantage of the fact that, for each value of "a", the "b" values are stored in order.

What is worth trying is the compound index b_1_a_1, which in your case would be relevance_1_loc_1. It returns the results in sorted order, so scanAndOrder will be false. I have not tested it with a 2d index, but I assume it can exclude some documents from scanning based on the index value alone (which is why nscanned is higher than nscannedObjects in that test). The index will unfortunately be huge, but still smaller than the documents.

+2

This solution is valid if you need to search inside a box (rectangle).

The problem with the geospatial index is that it can only be placed first in a compound index (at least this is the case with mongo 3.2).

So I thought, why not create my own "geospatial" index? All I need to do is create a compound index on Lat, Lng (X, Y) and put the sort field first. Then I need to implement the box-bounds search logic myself and explicitly instruct mongo to use that index (with a hint).

Applied to your problem:

 db.collection.createIndex({ "relevance": 1, "loc_x": 1, "loc_y": 1 }, { "background": true } ) 

Query logic:

 db.collection.find({ "loc_x": { "$gt": -103, "$lt": -80.43 }, "loc_y": { "$gt": 10.1, "$lt": 30.232 } }).hint("relevance_1_loc_x_1_loc_y_1") // or whatever name you gave it 

Use $gte and $lte if you need inclusive results.

You do not need to call .sort(), as the results already come back in index order; you can also sort by relevance in reverse if you need to.
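The range filter can be built from a box with a small helper. This is a sketch in plain JavaScript (ES2015 computed property names). The function name boxFilter is hypothetical, and the loc_x / loc_y field names follow the index above.

```javascript
// Hypothetical helper: builds the range filter for a box on loc_x / loc_y.
// `box` is [[minX, minY], [maxX, maxY]], the same shape as a $box argument.
function boxFilter(box, inclusive) {
  const [[minX, minY], [maxX, maxY]] = box;
  const lo = inclusive ? "$gte" : "$gt";
  const hi = inclusive ? "$lte" : "$lt";
  return {
    loc_x: { [lo]: minX, [hi]: maxX },
    loc_y: { [lo]: minY, [hi]: maxY }
  };
}

// Usage in the mongo shell, for the 10 most relevant documents in the box:
//   db.collection.find(boxFilter([[-103, 10.1], [-80.43, 30.232]], false))
//     .sort({ relevance: -1 }).limit(10)
//     .hint("relevance_1_loc_x_1_loc_y_1");
```

The usage note assumes that the server can walk the hinted index backwards to satisfy the descending sort; checking the explain() output confirms whether an extra sort phase is involved.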

The only problem I ran into is with small boxes: searching small areas took longer than searching large ones. That is why I kept the geospatial index for searching small areas.

+1

Source: https://habr.com/ru/post/1499445/

