MongoDB returns uppercase strings first when sorting

When I sort a collection on a string field (here Title), the sort does not work as I expect. See below:

 db.SomeCollection.find().limit(50).sort({ "Title" : -1 }); 

Actual Results Order

  • "Title": "geog.3 students' book"
  • "Title": "geog.2 students' book"
  • "Title": "geog.1 students' book"
  • "Title": "Zoe and Swift"
  • "Title": "Zip at the Theme Park"
  • "Title": "Zip at the Supermarket"

Expected Results

  • "Title": "Zoe and Swift"
  • "Title": "Zip at the Theme Park"
  • "Title": "Zip at the Supermarket"
  • "Title": "geog.3 students' book"
  • "Title": "geog.2 students' book"
  • "Title": "geog.1 students' book"

The same problem arises when I sort by a Date field.

Any suggestions?

+7
5 answers

Update: MongoDB 3.4 added collation support, including case-insensitive indexes.
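
On MongoDB 3.4 or later, a case-insensitive sort along the lines of the update above might look roughly like this in the mongo shell. This is only a sketch: the collection and field names come from the question, the locale "en" is an assumption, and the two function names are made up for illustration. `db` is the shell's database handle, passed in as a parameter.

```javascript
// Collation strength 2 compares base letters and accents but ignores case.
// The locale "en" is an assumption; choose the one that matches your data.
const caseInsensitive = { locale: "en", strength: 2 };

// Hypothetical helper: sort Title in descending order, ignoring case.
function sortTitlesIgnoringCase(db) {
  return db.SomeCollection.find()
    .sort({ Title: -1 })
    .collation(caseInsensitive)
    .limit(50);
}

// Hypothetical helper: create an index that the query above can use.
// A query can use this index only if it specifies the same collation.
function createCaseInsensitiveTitleIndex(db) {
  return db.SomeCollection.createIndex(
    { Title: 1 },
    { collation: caseInsensitive }
  );
}
```

In the shell you would call `sortTitlesIgnoringCase(db)` directly, or just inline the chained calls.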

This is a known issue. MongoDB does not support lexical sorting for strings (JIRA: string lexicographic ordering). You have to sort the results in your application code, or sort on a numeric field. Date fields should sort reliably, though. Can you give an example where sorting by date does not work?

+6

What exactly surprises you?

Strings are sorted by the numeric code of each character. If you look at a character code table such as ASCII (I know MongoDB stores strings in UTF-8, so this is for illustration only), you will see that uppercase letters have lower codes than lowercase letters. So they sort first.

MongoDB cannot sort strings according to locale or case-insensitively.

In your case, g has a higher code than Z, so it comes first (you are sorting in descending order). Then 3 has a higher code than 2 and 1. So the result is, in fact, correct.
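
The same ordering can be reproduced outside the database. The sketch below is plain JavaScript with a few sample titles; JavaScript compares strings by UTF-16 code unit rather than by UTF-8 byte, which gives the same order for plain ASCII text like this. The name `byCodeDescending` is made up for illustration.

```javascript
// Sample titles, mixed case on purpose.
const titles = [
  "Zoe and Swift",
  "geog.1 students' book",
  "Zip at the Theme Park",
  "geog.3 students' book",
  "Zip at the Supermarket",
  "geog.2 students' book"
];

// Hypothetical comparator: descending order by raw character code.
function byCodeDescending(a, b) {
  if (a < b) return 1;
  if (a > b) return -1;
  return 0;
}

// Copy the array before sorting so the input stays untouched.
const sorted = [...titles].sort(byCodeDescending);

// "Z" has a lower code than "g", so the lowercase titles win in descending order.
const codeOfZ = "Z".charCodeAt(0); // 90
const codeOfG = "g".charCodeAt(0); // 103

console.log(codeOfZ, codeOfG);
console.log(sorted);
```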

+3

If you use aggregation, you can get the expected order like this:

     db.collection.aggregate([
         {
             "$project": {
                 "Title": 1,
                 "output": { "$toLower": "$Title" }
             }
         },
         { "$sort": { "output": -1 } },
         { "$project": { "Title": 1, "_id": 0 } }
     ])


It will give you the expected result, as shown below:

     {
         "result": [ 
             {
                 "Title": "Zoe and Swift"
             }, 
             {
                 "Title": "Zip at the Theme Park"
             }, 
             {
                 "Title": "Zip at the Supermarket"
             }, 
             {
                 "Title": "geog.3 students' book"
             }, 
             {
                 "Title": "geog.2 students' book"
             }, 
             {
                 "Title": "geog.1 students' book"
             }
         ],
         "ok": 1
     }

+3

Starting with the dates not sorting correctly...

If you store a date as a string, it has to be in a form that sorts correctly as a string. That is pretty simple:

 2013-11-08 // yyyy-mm-dd (the dashes would be optional) 

As long as each part of the date string is zero-padded correctly, the strings will sort naturally and in the order you expect.
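
A quick way to see why the padding matters is to sort the same three dates with and without it. This is a plain JavaScript sketch with made-up sample dates, not anything taken from the question's data.

```javascript
// Same three dates, once zero-padded and once not (sample values).
const padded = ["2013-11-08", "2013-02-10", "2012-12-31"];
const unpadded = ["2013-11-8", "2013-2-10", "2012-12-31"];

// Default string sort: character by character, like the database does.
const paddedSorted = [...padded].sort();
const unpaddedSorted = [...unpadded].sort();

// Padded strings come out in chronological order.
console.log(paddedSorted);

// Unpadded strings do not: "2013-11-8" sorts before "2013-2-10"
// because the character "1" is lower than "2".
console.log(unpaddedSorted);
```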

A full date and time is usually stored in UTC, in ISO 8601 form:

 2013-11-23T10:46:01.914Z 

But I would also suggest that, instead of storing the date as a string, you consider whether MongoDB's native Date type makes sense. If you look at MongoDB's aggregation framework, you will find many operators that work with native dates, whereas what you can do with a string is very limited.

As for sorting strings, as has already been pointed out, MongoDB sorts the way a computer stores the data, not the way a person would order it. If you think of each string as its stored ASCII/UTF-8 representation, you can see why the sort behaves the way it does:

 Zoe = [90, 111, 101]
 geo = [103, 101, 111]

If you sort these in descending order, as you specified, the byte representation of "geo" is greater than that of "Zoe" (103 sorts above 90 in this case), so "geo" comes first.

Typically, the recommendation with MongoDB is to store the string twice if you need to sort mixed-case strings:

  • The original string ( "Title" )
  • A normalized copy of the string. For example, everything in lowercase, possibly with accented characters converted to a common base character as well. So you end up with a new field called, say, "SortedTitle" , and your code sorts on that field but displays the actual "Title" to users.
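
One possible way to build such a normalized copy in application code is sketched below. The function names `toSortKey` and `withSortedTitle` are hypothetical, and stripping accents via Unicode decomposition is just one reasonable choice, not the only way to normalize.

```javascript
// Hypothetical helper: lowercase the title and strip accents.
function toSortKey(title) {
  return title
    .normalize("NFD")                 // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase();
}

// Hypothetical helper: return a copy of the document with the extra sort field.
function withSortedTitle(doc) {
  return { ...doc, SortedTitle: toSortKey(doc.Title) };
}

// Sort on SortedTitle, display Title.
console.log(withSortedTitle({ Title: "Zoe and Swift" }));
```

The document would be passed through `withSortedTitle` before every insert or update, so the two fields never drift apart.
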
+2

If you are using Ruby on Rails with MongoMapper, do the following:

My model is named Abc, and I sort on its Title field.

 @test_abc_details_array_full = Abc.collection.aggregate([
   { "$project" => { "Title" => 1, "output" => { "$toLower" => "$Title" } } },
   { "$sort" => { "output" => 1 } },
   { "$project" => { Title: 1, _id: 0 } }
 ])
0

Source: https://habr.com/ru/post/957657/

