How does the field order of compound indexes matter in MongoDB?

We are supposed to create a compound index with its fields in the same order in which the parameters are queried. Does this order really affect performance?

Imagine that we have a collection of all the people on Earth. It has an index on sex (99.9% of the time "man" or "woman", but stored as a string, not as a binary value) and an index on name.

Suppose we want to select all people of a certain sex with a specific name, for example all "men" named "John". Is it better to have a compound index with sex first, or with name first? Why (or why not)?


Redsandro,

You should consider index cardinality and selectivity.


1. Index cardinality

Cardinality refers to the number of possible values for a field. The sex field has only two possible values, so it has very low cardinality. Other fields, such as names, usernames, phone numbers, and emails, will have a more nearly unique value for each document in the collection. That is considered high cardinality.
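As a rough illustration, cardinality can be measured as the number of distinct values a field takes. The sketch below is plain JavaScript (runnable in Node.js), not MongoDB code. It reuses the six documents from the selectivity example in this answer. The `cardinality` helper is hypothetical.

```javascript
// The six documents from the selectivity example in this answer (_id omitted).
const people = [
  { name: "John", sex: "male" },
  { name: "Rich", sex: "male" },
  { name: "Mose", sex: "male" },
  { name: "Sami", sex: "male" },
  { name: "Cari", sex: "female" },
  { name: "Mary", sex: "female" },
];

// Cardinality of a field = how many distinct values it takes in the collection.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

const sexCardinality = cardinality(people, "sex");   // 2: low cardinality
const nameCardinality = cardinality(people, "name"); // 6: every name is distinct
console.log(`sex: ${sexCardinality}, name: ${nameCardinality}`);
```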

  • High cardinality

    The greater the cardinality of a field, the more useful an index on it will be, because indexes narrow the search space, making it much smaller.

    Suppose you are looking for men named John. If you indexed sex first, you would narrow the result space by only about 50%. Conversely, if you indexed name first, you would immediately narrow the result set to the small fraction of users named John. Then you would only check those documents for gender.

  • Rule of thumb

    Try creating indexes on high-cardinality keys, or place the high-cardinality keys first in the compound index. You can read more about this in the section on compound indexes in the book:

    MongoDB: The Definitive Guide
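To put numbers on the 50% argument, here is a simplified model in plain JavaScript (not MongoDB code). It counts how many documents remain after applying only the first field, which is the intuition behind the rule of thumb. The 1,000-person population and the `candidatesAfter` helper are hypothetical. It models the narrowing argument, not MongoDB's actual index traversal.

```javascript
// Hypothetical population of 1,000 people (the numbers are made up):
// even positions are "male", odd positions "female", and the first 10 are "John".
const population = [];
for (let i = 0; i < 1000; i++) {
  population.push({
    name: i < 10 ? "John" : `Person${i}`,
    sex: i % 2 === 0 ? "male" : "female",
  });
}

// Documents still in play after filtering on a single (leading) field.
function candidatesAfter(docs, field, value) {
  return docs.filter((doc) => doc[field] === value);
}

const afterSex = candidatesAfter(population, "sex", "male");   // half the population
const afterName = candidatesAfter(population, "name", "John"); // a tiny fraction
// Whichever field narrows first, the other condition is checked on what remains.
const maleJohns = afterName.filter((doc) => doc.sex === "male");
console.log(afterSex.length, afterName.length, maleJohns.length); // 500 10 5
```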


2. Selectivity

In addition, you want to use indexes selectively and write queries that limit the number of possible documents through the indexed field. To keep it simple, consider the following collection. If your index is {name:1} and you run the query {name: "John", sex: "male"}, you will have to scan only 1 document, because you allowed MongoDB to be selective.

 {_id: ObjectId(), name: "John", sex: "male"}
 {_id: ObjectId(), name: "Rich", sex: "male"}
 {_id: ObjectId(), name: "Mose", sex: "male"}
 {_id: ObjectId(), name: "Sami", sex: "male"}
 {_id: ObjectId(), name: "Cari", sex: "female"}
 {_id: ObjectId(), name: "Mary", sex: "female"}

Now consider the same collection. If your index is {sex:1} and you run the query {sex: "male", name: "John"}, you will have to scan 4 documents.

 {_id: ObjectId(), name: "John", sex: "male"}
 {_id: ObjectId(), name: "Rich", sex: "male"}
 {_id: ObjectId(), name: "Mose", sex: "male"}
 {_id: ObjectId(), name: "Sami", sex: "male"}
 {_id: ObjectId(), name: "Cari", sex: "female"}
 {_id: ObjectId(), name: "Mary", sex: "female"}

Imagine how large these differences become in a bigger dataset.
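The two scans above can be reproduced with a small plain-JavaScript model. A `Map` from value to documents stands in for a single-field index. Numeric `_id` values replace `ObjectId()`, and the `docsExamined`/`nReturned` names only mirror the `explain()` fields. The helpers are hypothetical, not MongoDB internals.

```javascript
// The six documents from the example above; numeric _id values stand in for ObjectId().
const collection = [
  { _id: 1, name: "John", sex: "male" },
  { _id: 2, name: "Rich", sex: "male" },
  { _id: 3, name: "Mose", sex: "male" },
  { _id: 4, name: "Sami", sex: "male" },
  { _id: 5, name: "Cari", sex: "female" },
  { _id: 6, name: "Mary", sex: "female" },
];

// A single-field "index": maps each value of `field` to the documents holding it.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// Look up the indexed field, fetch those documents, then filter on the whole query.
function runQuery(docs, indexField, query) {
  const fetched = buildIndex(docs, indexField).get(query[indexField]) ?? [];
  const results = fetched.filter((doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
  return { docsExamined: fetched.length, nReturned: results.length };
}

const query = { name: "John", sex: "male" };
console.log(runQuery(collection, "name", query)); // { docsExamined: 1, nReturned: 1 }
console.log(runQuery(collection, "sex", query));  // { docsExamined: 4, nReturned: 1 }
```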


A little explanation of compound indexes

It is easy to make wrong assumptions about compound indexes. According to the MongoDB indexes documentation:

MongoDB supports compound indexes, where a single index structure holds references to multiple fields within a collection's documents. The following diagram illustrates an example of a compound index on two fields:

[Diagram: a compound index on two fields]

When you create a compound index, a single index holds several fields. So if we index the collection using {"sex" : 1, "name" : 1}, the index will look something like this:

 ["male","Rick"] -> 0x0c965148
 ["male","John"] -> 0x0c965149
 ["male","Sean"] -> 0x0cdf7859
 ["male","Bro"] -> 0x0cdf7859
 ...
 ["female","Kate"] -> 0x0c965134
 ["female","Katy"] -> 0x0c965126
 ["female","Naji"] -> 0x0c965183
 ["female","Joan"] -> 0x0c965191
 ["female","Sara"] -> 0x0c965103

If you index the collection using {"name" : 1, "sex" : 1}, the index will look something like this:

 ["John","male"] -> 0x0c965148
 ["John","female"] -> 0x0c965149
 ["John","male"] -> 0x0cdf7859
 ["Rick","male"] -> 0x0cdf7859
 ...
 ["Kate","female"] -> 0x0c965134
 ["Katy","female"] -> 0x0c965126
 ["Naji","female"] -> 0x0c965183
 ["Joan","female"] -> 0x0c965191
 ["Sara","female"] -> 0x0c965103
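The two layouts above can be reproduced with a sorted array standing in for the index's B-tree. In a real index the entries are kept sorted by the full key. So with {sex: 1, name: 1} all "female" entries come before all "male" ones, while with {name: 1, sex: 1} the entries are grouped by name.

This sketch makes a few assumptions:
- It uses the six documents from the selectivity example.
- `buildCompoundIndex` is a hypothetical helper.
- Plain JavaScript string comparison only approximates MongoDB's BSON/collation ordering.

```javascript
// Documents from the selectivity example; numeric _id values stand in for record locations.
const collection = [
  { _id: 1, name: "John", sex: "male" },
  { _id: 2, name: "Rich", sex: "male" },
  { _id: 3, name: "Mose", sex: "male" },
  { _id: 4, name: "Sami", sex: "male" },
  { _id: 5, name: "Cari", sex: "female" },
  { _id: 6, name: "Mary", sex: "female" },
];

// Compare compound keys field by field (lexicographic order on the tuple).
// Plain string comparison is a simplification of MongoDB's BSON/collation ordering.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Build the sorted [key, _id] entries a compound index on `fields` would hold.
function buildCompoundIndex(docs, fields) {
  return docs
    .map((doc) => [fields.map((field) => doc[field]), doc._id])
    .sort((x, y) => compareKeys(x[0], y[0]));
}

const sexFirst = buildCompoundIndex(collection, ["sex", "name"]);
const nameFirst = buildCompoundIndex(collection, ["name", "sex"]);
sexFirst.forEach(([key, id]) => console.log(JSON.stringify(key), "->", id));
nameFirst.forEach(([key, id]) => console.log(JSON.stringify(key), "->", id));
```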

With {name:1} as your prefix, you will be much better off when using compound indexes. There is much more to read on this topic; I hope this provides some clarity.
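The prefix point can be illustrated in plain JavaScript. A compound index can serve a query on any leading prefix of its fields. So {name: 1, sex: 1} can answer {name: "John"} with a bounded range, while {sex: 1, name: 1} cannot. The documents reuse the names from the {name, sex} listing above. The helpers are hypothetical, and a sorted array stands in for the B-tree.

```javascript
// Names and sexes from the {name, sex} listing above.
const docs = [
  { name: "John", sex: "male" }, { name: "John", sex: "female" },
  { name: "John", sex: "male" }, { name: "Rick", sex: "male" },
  { name: "Kate", sex: "female" }, { name: "Katy", sex: "female" },
  { name: "Naji", sex: "female" }, { name: "Joan", sex: "female" },
  { name: "Sara", sex: "female" },
];

// Compare keys field by field, only as far as the shorter key goes,
// so a one-field key like ["John"] behaves as a prefix.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Sorted keys of a compound index on `fields`.
function buildCompoundIndex(fields) {
  return docs.map((doc) => fields.map((field) => doc[field])).sort(compareKeys);
}

// Binary search for the first key that is >= the prefix.
function lowerBound(keys, prefix) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(keys[mid], prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Keys examined for the name-only query {name: value}.
function keysExaminedForName(fields, value) {
  const keys = buildCompoundIndex(fields);
  if (fields[0] !== "name") {
    // "name" is not a prefix: the index cannot bound the scan, so every key
    // would have to be checked (MongoDB would likely pick a collection scan instead).
    return keys.length;
  }
  let i = lowerBound(keys, [value]);
  let examined = 0;
  while (i < keys.length && compareKeys(keys[i], [value]) === 0) {
    examined++;
    i++;
  }
  return examined;
}

console.log(keysExaminedForName(["name", "sex"], "John")); // 3
console.log(keysExaminedForName(["sex", "name"], "John")); // 9
```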


I'll add that I ran an experiment myself. I found that there seems to be no efficiency penalty for using a low-selectivity field as the first key of an index. (I am using MongoDB 3.4 with WiredTiger, which may behave differently from MMAPv1.) I inserted 250 million documents into a new collection called items. Each document looked like this:

 { field1: "bob", field2: i + "", field3: i + "" }

"field1" was always equal to "bob". "field2" was equal to i, so it was completely unique. First, I searched on field2, and it took more than a minute to scan all 250 million documents. Then I created an index like this:

 `db.items.createIndex({field1:1,field2:1})` 

Of course, field1 is "bob" for every single document, so I expected the index to have to examine many entries before finding the right document. However, that is not the result I got.

After the index was built, I ran the search again and got the results listed below. You will see that "totalKeysExamined" is 1 each time. So perhaps with WiredTiger, or something else, they have figured out how to do this better. I have read that WiredTiger compresses index prefixes, so that might have something to do with it.

db.items.find({field1:"bob",field2:"250888000"}).explain("executionStats")

 {
   "executionSuccess" : true,
   "nReturned" : 1,
   "executionTimeMillis" : 4,
   "totalKeysExamined" : 1,
   "totalDocsExamined" : 1,
   "executionStages" : {
     "stage" : "FETCH",
     "nReturned" : 1,
     "executionTimeMillisEstimate" : 0,
     "works" : 2,
     "advanced" : 1,
     ...
     "docsExamined" : 1,
     "inputStage" : {
       "stage" : "IXSCAN",
       "nReturned" : 1,
       "executionTimeMillisEstimate" : 0,
       ...
       "indexName" : "field1_1_field2_1",
       "isMultiKey" : false,
       ...
       "indexBounds" : {
         "field1" : [ "[\"bob\", \"bob\"]" ],
         "field2" : [ "[\"250888000\", \"250888000\"]" ]
       },
       "keysExamined" : 1,
       "seeks" : 1
     }
   }
 }
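Here is one plausible explanation for "totalKeysExamined" : 1 that does not depend on prefix compression. With equality conditions on both fields, the index bounds collapse to a single point (see "indexBounds" above). A B-tree can then seek directly to the composite key, so the constant leading field adds no extra keys to examine.

The JavaScript sketch below models this with a sorted array and binary search standing in for the B-tree. The 100,000-document scale, the "25088" lookup value, and the helper names are hypothetical.

```javascript
// Hypothetical, scaled-down version of the experiment: 100,000 documents,
// field1 is always "bob" and field2 is a unique string, as in the answer.
const N = 100000;
const entries = [];
for (let i = 0; i < N; i++) {
  entries.push({ key: ["bob", String(i)], recordId: i });
}

// Compare keys field by field, only as far as the shorter key goes,
// so a one-field key like ["bob"] acts as a prefix bound.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Index entries are kept sorted by the full {field1, field2} key.
entries.sort((x, y) => compareKeys(x.key, y.key));

// Binary search for the first entry whose key is >= target (the "seek").
function lowerBound(target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid].key, target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Count the keys inside the bounds: seek once, then walk while keys still match.
function keysExamined(target) {
  let i = lowerBound(target);
  let count = 0;
  while (i < entries.length && compareKeys(entries[i].key, target) === 0) {
    count++;
    i++;
  }
  return count;
}

const bothFields = keysExamined(["bob", "25088"]); // bounds on field1 and field2
const field1Only = keysExamined(["bob"]);          // bounds on field1 alone
console.log(bothFields, field1Only); // 1 100000
```

The contrast shows where a low-selectivity leading field would hurt: only when the query bounds that field alone.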

Then I created an index on field3 (which has the same values as field2) and searched:

db.items.find({field3: "250888000"});

It took the same 4 ms as the query using the compound index. I repeated this several times with different values for field2 and field3, and each time there were only slight differences. This suggests that with WiredTiger there is no performance penalty for low selectivity in the first field of an index.
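Why the single-field index on field3 and the compound index perform alike can be sketched in plain JavaScript, with sorted arrays standing in for the two B-trees. Because field1 is constant, the compound keys sort in exactly the same order as the field3 keys. A binary-search seek therefore takes the same path and the same number of comparisons in both.

This model counts comparisons only, not time; real latency also depends on I/O, caching and key size. The scale, lookup value, and helpers are hypothetical.

```javascript
// Hypothetical scale: 100,000 documents instead of 250 million.
const N = 100000;

// Compare keys field by field (lexicographic order on the tuple).
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Build sorted index entries, using makeKey(i) for the document with id i.
function buildIndex(makeKey) {
  const entries = [];
  for (let i = 0; i < N; i++) entries.push({ key: makeKey(i), recordId: i });
  return entries.sort((x, y) => compareKeys(x.key, y.key));
}

// Binary-search seek for an exact key, counting key comparisons on the way.
function seek(entries, target) {
  let lo = 0;
  let hi = entries.length;
  let comparisons = 0;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    comparisons++;
    if (compareKeys(entries[mid].key, target) < 0) lo = mid + 1;
    else hi = mid;
  }
  const found = lo < entries.length && compareKeys(entries[lo].key, target) === 0;
  return { found, recordId: found ? entries[lo].recordId : null, comparisons };
}

const compound = buildIndex((i) => ["bob", String(i)]); // like {field1: 1, field2: 1}
const single = buildIndex((i) => [String(i)]);          // like {field3: 1}

const viaCompound = seek(compound, ["bob", "25088"]);
const viaSingle = seek(single, ["25088"]);
console.log(viaCompound.comparisons === viaSingle.comparisons); // true
```

Each compound comparison does check field1 before field2, so per-comparison work is slightly higher, but the number of steps is the same.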


Source: https://habr.com/ru/post/1235297/

