MongoDB index optimization when using text search in aggregation structure

Question

MongoDB index optimization when using text search in aggregation structure

We are building a simplified version of the search engine on top of MongoDB.

Data set example

{ "_id" : 1, "dept" : "tech", "updDate": ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" } { "_id" : 2, "dept" : "tech", "updDate": ISODate("2014-07-27T09:45:35Z"), "description" : "wireless red mouse" } { "_id" : 3, "dept" : "kitchen", "updDate": ISODate("2014-04-27T09:45:35Z"), "description" : "green placemat" } { "_id" : 4, "dept" : "kitchen", "updDate": ISODate("2014-05-27T09:45:35Z"), "description" : "red peeler" } { "_id" : 5, "dept" : "food", "updDate": ISODate("2014-04-27T09:45:35Z"), "description" : "green apple" } { "_id" : 6, "dept" : "food", "updDate": ISODate("2014-01-27T09:45:35Z"), "description" : "red potato" } { "_id" : 7, "dept" : "food", "updDate": ISODate("2014-08-28T09:45:35Z"), "description" : "lime green computer" } { "_id" : 8, "dept" : "food", "updDate": ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" } { "_id" : 9, "dept" : "food", "updDate": ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer" }

We want to avoid using the "offset-limit" to split the results, to do this, we mainly use the "search method" by modifying the "where / match" sentence of the query to be able to use the index instead of repeating through the collection to get the desired results. For more information on the “search method,” I highly recommend reading http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way

Search engines usually order results by date and date of update in descendant order. To do this, we use the text search function in the aggregation pipeline as follows.

 db.inventory.createIndex({description:"text", dept: -1, updDate: -1, id:-1})

First page

 db.inventory.aggregate( [ { $match: { dept : {$in : ["food","kitchen"]},"$text" : { "$language" : "en", "$search" : "green"} } },{ $project: {score: { $meta: "textScore" }, description : 1, updDate : 1, _id: 1 } }, { $sort: { "score" : -1, "updDate" : -1, _id: -1 } }, {$limit: 2 }] ) { "_id" : 5, "updDate" : ISODate("2014-04-27T09:45:35Z"), "description" : "green apple", "score" : 0.75 } { "_id" : 3, "updDate" : ISODate("2014-04-27T09:45:35Z"), "description" : "green placemat", "score" : 0.75 }

Second page

 db.inventory.aggregate( [ { $match: { dept : {$in : ["food","kitchen"]},"$text" : { "$language" : "en", "$search" : "green"} } },{ $project: {score: { $meta: "textScore" }, description : 1, updDate : 1, _id: 1 } }, { $sort: { "score" : -1, "updDate" : -1, _id: -1 } }, { "$match" : { "$or" : [ { "score" : { "$lt" : 0.75}} , { "$and" : [ { "score" : { "$eq" : 0.75}} , { "$or" : [ { "updDate" : { "$lt" : ISODate("2014-04-27T09:45:35Z")}},{ "$and" : [ { "updDate": { "$eq" : ISODate("2014-04-27T09:45:35Z")}} , { "_id" : { "$lt" : 3}}]}]}]}]}},{$limit: 2 }] ) { "_id" : 7, "updDate" : ISODate("2014-08-28T09:45:35Z"), "description" : "lime green computer", "score" : 0.6666666666666666 } { "_id" : 9, "updDate" : ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer", "score" : 0.6666666666666666 }

And the last page

 db.inventory.aggregate( [ { $match: { dept : {$in : ["food","kitchen"]} , "$text" : { "$language" : "en", "$search" : "green"} } }, { $project: {score: { $meta: "textScore" }, description : 1, updDate : 1, _id: 1 } }, { $sort: { "score" : -1, "updDate" : -1, _id: -1 } }, { "$match" : { "$or" : [ { "score" : { "$lt" : 0.6666666666666666}} , { "$and" : [ { "score" : { "$eq" : 0.6666666666666666}} , { "$or" : [ { "updDate" : { "$lt" : ISODate("2014-08-27T09:45:35Z")}} , { "$and" : [ { "updDate" : { "$eq" : ISODate("2014-08-27T09:45:35Z")}} , { "_id" : { "$lt" : 9}}]}]}]}]}}, {$limit: 2 }] ) { "_id" : 8, "updDate" : ISODate("2014-08-27T09:45:35Z"), "description" : "lime green computer", "score" : 0.6666666666666666 }

Notice how we order results using score, updDate and id, and in the second phase of the match we try to break them into pages using the value of the document’s rating, date of update, and finally, the identifier.

Creating an index takes into account that text queries cannot be covered in text index prefixes, see the question https://jira.mongodb.org/browse/SERVER-13018 , although I'm not sure if this applies to our case.

Since the "executeStats" and "allPlansExecution" modes do not work as part of the aggregation, see https://jira.mongodb.org/browse/SERVER-19758 . I do not know how MongoDB is trying to resolve the request.

Since index intersection does not work for text search, see https://jira.mongodb.org/browse/SERVER-3071 (allowed in 2.5.5) and http://blog.mongodb.org/post/87790974798/efficient- indexing-in-mongodb-26 , where the author says that

 As of version 2.6.0, you cannot intersect with geo or text indices and you can intersect at most 2 separate indices with each other. These limitations are likely to change in a future release.

After reading sections 3.4 (Text Search Tutorials) and 3.5 (Indexing Strategies) https://docs.mongodb.org/manual/MongoDB-indexes-guide-master.pdf several times, without receiving a clear conclusion.

So what is the best indexing strategy for indexing this collection in terms of text search?

One index for the first phase of the match, and the second for the second phase of the match (pagination)?

 db.inventory.createIndex({description:"text", dept: -1}) db.inventory.createIndex({updDate: -1, id:-})

Compound index taking into account fields from both phases of correspondence?

 db.inventory.createIndex({description:"text", dept: -1, updDate: -1, id:-1})

None of the above?

thanks

+3

mongodb mongodb-indexes aggregation-framework

Jordi Oct 30 '15 at 12:42

source share

No one has answered this question yet.

See similar questions:

4

Paginated chronological queries in multiple collections using MongoDB?

or similar:

600

When to use CouchDB over MongoDB and vice versa

467

When to Radish? When in MongoDB?

459

When to use MongoDB or other document-oriented systems?

63

Mongodb Explain for Aggregation Structure

58

MongoDB OR aggregate compatibility

12

Index optimization for mongodb aggregation structure

2

MongoDB collection - shard by hashed index on user field _id

0

Decreased performance in mongodb aggregation structure

0

MongoDB optimizes indexes for aggregation

0

Mongodb doctrine query builder full-text search does not sort textScore results

MongoDB index optimization when using text search in aggregation structure

More articles: