MongoDB: how to find documents ignoring case and accents, with SQL-style percent (%) wildcard logic

I would like to search a collection in my MongoDB database. The documents in the collection have a "name" field, with values such as:

[i] "Palácio Guanabara", "Palácio da Cidade", "Festa Palácio", etc.

When the user enters a search term, for example "pala", "palá", "Pala" or "PalÁ", all of the names in [i] should be in the result set.

I found that in MongoDB I can use regex in the search, for example:

{ "name": { $regex: new Regex(".*pala.*", "i") } } 

This approach is case insensitive and reproduces SQL's percent logic ("%pala%"). But it does not ignore accents in the values stored in the database.

I found another alternative, the $text index: https://docs.mongodb.org/manual/core/index-text/

This approach can ignore both case and accents. But $text search does not accept regular expressions, so I cannot search for substrings the way "%pala%" does.

To summarize, I want to make the following SQL query in MongoDB:

 select * from collection where remove_accents(upper(name)) like '%PALA%' 

This query should return documents whose names contain "palácio", "palacio", "PaláCiô", etc.
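
For reference, the matching behaviour that this SQL describes can be sketched in plain JavaScript. This only illustrates the intended semantics; it is not a MongoDB query. The helper names removeAccents and matchesLike are hypothetical, and the sketch assumes a runtime that supports String.prototype.normalize (ES2015 or later).

```javascript
// Hypothetical helper: decompose accented letters (NFD), then drop the
// combining marks in the range U+0300-U+036F.
function removeAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper: true when `term` occurs anywhere in `name`,
// ignoring both case and accents (the '%term%' behaviour).
function matchesLike(name, term) {
  const fold = (s) => removeAccents(s).toUpperCase();
  return fold(name).includes(fold(term));
}

const names = ["Palácio Guanabara", "Palácio da Cidade", "Festa Palácio", "Museu"];
const hits = names.filter((n) => matchesLike(n, "PalÁ"));
console.log(hits);
```

The three "Palácio" names pass the filter and "Museu" does not, which is the result set the question is after.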

+5
2 answers

There is no magic bullet in MongoDB. But since you are evidently already transforming the user input to build "%Pala%", why not also replace each "a" with a character class such as "[aáàâãä]" and wrap the result in ".*"? That way you can keep using regular expressions and still match the diacritics.

Below are character classes for several languages, to save you some work when building the substitutions.

French letters: [a-zA-ZàâäôéèëêïîçùûüÿæœÀÂÄÔÉÈËÊÏÎŸÇÙÛÜÆŒ]

German letters: [a-zA-ZäöüßÄÖÜẞ] The dubious capital letter for ß (ẞ), which is now included in Unicode, is missing from many fonts, so it may appear on your screen as a question mark.

Polish letters: [a-pr-uwy-zA-PR-UWY-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ] Note that there are no Q, V, or X in Polish. But if you want to allow all English letters, use [a-zA-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ]

Italian letters: [a-zA-ZàèéìíîòóùúÀÈÉÌÍÎÒÓÙÚ]

Spanish letters: [a-zA-ZáéíñóúüÁÉÍÑÓÚÜ]

From http://www.rexegg.com/regex-interesting-character-classes.html#languages
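
The substitution idea can be sketched in JavaScript roughly as follows. This is a minimal illustration under stated assumptions, not a complete solution: the VARIANTS table is a hypothetical, Portuguese-oriented subset that you would extend for your own data, and accentInsensitivePattern and removeAccents are made-up helper names. The wrapping ".*" is left out because an unanchored regular expression already matches anywhere in the string.

```javascript
// Hypothetical map from a base letter to the accented variants to accept.
// Only a small subset is listed here; extend it as needed.
const VARIANTS = {
  a: "aáàâãä",
  c: "cç",
  e: "eéèêë",
  i: "iíìîï",
  o: "oóòôõö",
  u: "uúùûü",
};

// Strip accents from the user's input so "palá" and "pala" build the same pattern.
function removeAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Turn user input into a regex source in which each listed base letter
// becomes a character class; every other character is matched literally.
function accentInsensitivePattern(input) {
  return Array.from(removeAccents(input).toLowerCase())
    .map((ch) => {
      if (VARIANTS[ch]) return "[" + VARIANTS[ch] + "]";
      // Escape regex metacharacters so they are not interpreted.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
}

const pattern = accentInsensitivePattern("PalÁ");
// Filter document that could be passed to find(); "i" handles upper/lower case.
const filter = { name: { $regex: pattern, $options: "i" } };
console.log(pattern);
```

In JavaScript the "i" flag also folds accented capitals such as "Á". Whether the server's regex engine does the same is an assumption worth checking against your MongoDB version; adding the uppercase variants to the table is a safe fallback.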

+1

What happens if you just use:

 find({name: {$regex: 'pala', $options: "i"}}) 

You used new Regex(), which may not be a valid constructor; the actual JavaScript constructor is new RegExp().
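
To illustrate the point about the constructor, here is a small sketch that runs in Node.js without a database. It assumes the query is written in JavaScript, as in the mongo shell; containsFilter is a hypothetical helper name. It builds the same filter in two forms, with the $regex operator and with a RegExp object, and shows that case is ignored while accents still are not.

```javascript
// `RegExp` is the JavaScript built-in; `Regex` is not defined.
console.log(typeof RegExp); // "function"
console.log(typeof Regex);  // "undefined"

// Hypothetical helper: build a "contains, ignoring case" filter in two forms.
function containsFilter(field, term) {
  return {
    withOperator: { [field]: { $regex: term, $options: "i" } },
    withRegExp: { [field]: new RegExp(term, "i") },
  };
}

const filters = containsFilter("name", "pala");
const re = filters.withRegExp.name;

// No leading or trailing ".*" is needed: the pattern is unanchored.
console.log(re.test("Festa Palacio")); // true
// The "i" flag ignores case only, so the accented "á" still does not match "a".
console.log(re.test("Festa Palácio")); // false
```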

+8

Source: https://habr.com/ru/post/1247212/

