How effective is tag search?

Situation:

I have 1 million users and I want to search for them by tags.

Here's the diagram:

const Account = new Schema({ tags: { type: [String], default: ['cats'] } }) 

Here is the first request:

 const query = { tags: 'cats'} const fields = { _id: 0 } const options = { skip: 0, limit: 5 } Account.find(query, fields, options) 

After calling find the Mongoose method will start the search and return the first 5 records that it matches. Performance in this case is optimal.

Here's the second request:

 const query = { tags: 'cats'} const fields = { _id: 0 } const options = { skip: 500 000, limit: 5 } Account.find(query, fields, options) 

What happens in this case is of interest to me.

Does Mongoose 500,000 records and then return the next 5 records?

Or does it somehow β€œjump” onto 500,000 elements without the need to compare them in advance?

I am trying to understand how effective this process is, and whether I can somehow improve it. Do you have any tips?

+5
source share
2 answers

Yes, at first it corresponds to 500,000 records.

From the final MongoDB 2nd Edition guide:

$ skip takes the number n and discards the first n documents from the result set. As with a normal query, it is not effective for large skips, as it must find all matches that should be skipped and then discard them.

In detail, you can examine your request in detail from the Mongo shell using explain ():

https://docs.mongodb.com/manual/reference/method/cursor.explain/

Example:

 // Create a dummy collection var tagTypes = ['cats', 'dogs', 'chickens'] for (i=0; i < 1000; i++) { let tag = tagTypes[Math.floor(Math.random()*tagTypes.length)] db.people.insert({"tags" : [tag]}) } 

then explain it

 db.people.find({"tags" : "cats"}).skip(100).limit(10).explain("executionStats") 

totalDocsExamined shows you that it matches all that it skips

 "executionStats" : { ... "totalDocsExamined" : 420 

If you were to create an index for tags:

 db.people.ensureIndex({ "tags" : 1 }) 

Running it, you get:

 "executionStats" : { ... "totalDocsExamined" : 110 
+2
source

Interesting. I understand that Find here is very similar to the function of queries on Select , where it will work inefficiently, first passing 500,000 values ​​before skipping them.

Therefore, I proposed to implement something known as layered. Basically, creating a nested indexing system, which, although it occupies a data space, is very effective, especially for sequential search methods, when you do not want to embed the tree directly. Create a set of external indexes that are more general, then bind them to the internal indexes accordingly before moving on to the saved data blocks.

The implementation of this should not be too complicated and should radically improve the number of actions taken. Hope this helps.

+2
source

Source: https://habr.com/ru/post/1270844/


All Articles