MongoDB: matching multiple regular expressions against a list for free-text search

I am building a MongoDB database to allow (simple) keyword searches using multikeys, as recommended here. A document looks like this:

    { title: { title: "A river runs through", _keywords: ["a","river","runs","through"] }, ... }

I am using Node.js on the server side, so I am working in JavaScript. The following query matches (this was run in the mongo shell):

    > db.torrents_sorted.find({'title._keywords' : {"$all" : ["river","the"]} }).count()
    210

However, these do not:

    > db.torrents_sorted.find({'title._keywords' : {"$all" : ["/river/i","/the/i"]} }).count()
    0
    > db.torrents_sorted.find({'title._keywords' : {"$all" : [{ "$regex" : "river", "$options" : "i" },{ "$regex" : "the", "$options" : "i" }]} }).count()
    0
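One likely reason the first of these returns nothing: a quoted "/river/i" is an ordinary string, so it is compared literally with each keyword instead of being treated as a regular expression. The sketch below illustrates the difference in plain JavaScript, without touching MongoDB. The helper matchesAll is hypothetical and only imitates "every condition must match some element of the array"; it is not how the server is implemented.

```javascript
// Hypothetical helper (not a MongoDB API): every condition must match
// at least one keyword. A string must equal a keyword exactly, while a
// RegExp object is tested against each keyword.
function matchesAll(keywords, conditions) {
  return conditions.every(function (cond) {
    return keywords.some(function (kw) {
      return cond instanceof RegExp ? cond.test(kw) : cond === kw;
    });
  });
}

var keywords = ["a", "river", "runs", "through"];

// Quoted patterns are plain strings: no keyword equals "/river/i".
var quoted = matchesAll(keywords, ["/river/i", "/runs/i"]);

// Unquoted regex literals are RegExp objects and are tested as patterns.
var literal = matchesAll(keywords, [/river/i, /runs/i]);
```

In the mongo shell the unquoted form, /river/i, is the one that is parsed as a regular expression.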

Using a single regular expression (without $and or $all) does match:

    > db.torrents_sorted.find({'title._keywords' : {"$regex" : "river", "$options" : "i"} }).count()
    1461

Interestingly, using Python and pymongo with compiled regular expressions does work:

    >>> db.torrents_sorted.find({'title._keywords': { '$all': [re.compile('river'), re.compile('the')]}}).count()
    236

I'm not necessarily looking for a solution that uses regular expressions, but keywords do need to be matched by shorter strings, so that "riv" matches "river", which seems like an ideal job for regular expressions (or LIKE in SQL).
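If matching from the start of a keyword is enough (so "riv" matches "river" but not "driver"), an anchored pattern such as ^riv expresses that. A minimal sketch follows, assuming the search terms arrive as plain strings. The helpers escapeRegExp, prefixRegex, and buildPrefixQuery are hypothetical names, and the resulting query still relies on the server handling regular expressions inside $all correctly.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-sensitive pattern anchored at the
// start of the keyword, so "riv" matches "river" but not "driver".
function prefixRegex(term) {
  return new RegExp("^" + escapeRegExp(term));
}

// Hypothetical helper: every term must be a prefix of some keyword.
function buildPrefixQuery(terms) {
  return { "title._keywords": { "$all": terms.map(prefixRegex) } };
}

var query = buildPrefixQuery(["riv", "ru"]);
// Possible use in the mongo shell: db.torrents_sorted.find(query).count()
```

MongoDB's documentation notes that a case-sensitive prefix expression can make use of an index on the field, which may matter here since performance is a concern; whether it helps for this collection would need measuring.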

My next idea is to try passing in a JavaScript function that matches the regular expressions against the list, or maybe passing a separate function for each regular expression (this screams "hack" to me :), although I assume it will be slower, and performance is very important.

2 answers

OK, so I have an answer, although it is interesting in a different way. The bug I ran into with regular expressions exists in MongoDB 1.8 and has since been fixed, as shown here.

Unfortunately, the hosting company looking after the db at the moment cannot offer version 2.0, and the $and keyword was only added in version 2.0. Thanks for the debugging help though, Samarth.

So instead, I wrote a JavaScript function that matches the regular expressions:

    function () {
        // Every regular expression must match at least one keyword.
        var rs = [RegExp(".*river.*"), RegExp(".*runs.*")];
        for (var j = 0; j < rs.length; j++) {
            var val = false;
            // Stop scanning the keywords as soon as one of them matches.
            for (var i = 0; !val && i < this.title._keywords.length; i++)
                val = rs[j].test(this.title._keywords[i]);
            if (!val) return false;
        }
        return true;
    }

This runs in O(n^2) time per document (not very cool), but it fails in linear time if the first regular expression does not match any of the keywords (since I'm looking for a conjunction: every expression has to match).

Any input on optimizing this is appreciated, although if this is the best solution I can find for 1.8, I might have to find somewhere else to host my db in the near future ;).
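One possible simplification, offered as an untested idea and not a measured improvement: join the keywords into a single string and test each regular expression once against it, which replaces the inner loop with a single regex scan. The sketch assumes that keywords never contain a newline and that the patterns carry no g flag; matchesAllJoined and the doc argument (standing in for `this`) are hypothetical so that the logic can be run outside the database.

```javascript
// Hypothetical helper: `doc` stands in for `this` inside the
// server-side function, so the logic can be run outside the database.
function matchesAllJoined(doc, patterns) {
  // Assumption: keywords never contain "\n". Since "." does not match a
  // newline by default, a pattern cannot span two neighbouring keywords.
  var haystack = doc.title._keywords.join("\n");
  for (var j = 0; j < patterns.length; j++) {
    // Conjunction: give up as soon as one pattern matches nothing.
    if (!patterns[j].test(haystack)) return false;
  }
  return true;
}

var doc = {
  title: {
    title: "A river runs through",
    _keywords: ["a", "river", "runs", "through"]
  }
};

var both = matchesAllJoined(doc, [/river/, /runs/]);   // both patterns found
var missing = matchesAllJoined(doc, [/river/, /the/]); // no keyword contains "the"
```

The leading and trailing .* from the original patterns are dropped here because test() already looks for a match anywhere in the string.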


You might want to use the $and operator.
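A sketch of what such a query might look like, assuming a server version that supports $and (2.0 or later) and the document layout from the question. The helper buildAndQuery is a hypothetical name, and the terms are used as regular expression patterns as-is, so any metacharacters in them keep their special meaning.

```javascript
// Hypothetical helper: one case-insensitive $regex clause per term,
// combined with $and so that every term has to match some keyword.
function buildAndQuery(terms) {
  return {
    "$and": terms.map(function (term) {
      return { "title._keywords": { "$regex": term, "$options": "i" } };
    })
  };
}

var query = buildAndQuery(["river", "the"]);
// Possible use in the mongo shell: db.torrents_sorted.find(query).count()
```

Each clause has the same shape as the single-expression query that already works in the question, so this avoids putting regular expressions inside $all.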


Source: https://habr.com/ru/post/1400581/

