MongoDB: returning rows sequentially before and after a given row?

In MongoDB, given by the find () operator, which returns a cursor for a set of rows, what is an idiomatic and time-efficient way to return β€œcontextual” rows, that is, rows sequentially before and / or after each row in a set?

For me, the easiest way to explain this concept is to use ack , which supports contextual searching. Given the file:

line 1 line 2 line 3 line 4 line 5 line 6 line 7 line 8 

This is the result from ack:

 C:\temp>ack.pl -C 2 "line 4" test.txt line 2 line 3 line 4 line 5 line 6 

I store log data in a MongoDB collection, one document per line. Each magazine, each of which is tagged with keywords, and these keywords are indexed, which gives me a full-text search.

I follow the swamp standard:

 collection.find({keywords: {'$all': ['key1', 'key2']}}, {}).sort({datetime: -1}); 

and get the cursor. At this point, without adding any additional fields, what is the approach to get the context? I think the stream looks something like this:

  • For each line in the cursor:
    • Get the _id field, save to x.
    • execute: collection.find ({_ id: {'$ gt': x}}). limit (N)
      • Get results from each of these cursors.
    • execute: collection.find ({_ id: {'$ lt': x}}) sort ({_ id: 1}). limit (N)
      • Get results from each of these cursors.

For a result set with R rows, this requires 2R + 1 queries.

However, I think I can exchange space for time. Is there an alternative to updating each line with its _id context in the background? For a given line that currently has fields:

 _id, contents, keywords 

I would add an extra field:

 _id, contents, keywords, context_ids 

and then in a subsequent search, I could somehow use these context_ids, I think? I am not familiar with MongoDB MapReduce yet, but can it also get on the photo?

I think the most direct approach is to store the full text of the actual context lines in each line, but for me it is a little rude. The clear advantage is that a single request can return the context I need.

I appreciate any and all answers that agree on the scope of the question. I understand that I can use Lucene or a true full-text search engine out of range, but I'm trying to feel the edges and possibilities of MongoDB, so I will be grateful to the specific answers of MongoDB. Thanks!

+4
source share
1 answer

I think your approach to storing context_ids , or something like that, might be a better option. If you can save the context_ids all the context lines you need (this assumes that it has a fixed context size - say, 5 lines before and after), you can request a context for all lines using $in :

 # pseudocode for each matching row: context_rows = db.logs.find({_id: {$in: row['context_ids']}}).sort({_id: 1}) row_with_context = [context_rows_before_row] + row + [context_rows_after_row] 

I believe that knowing the set of context lines, in particular the lines after the line in question, may be difficult, since the lines after any given line will not necessarily exist.

An alternative that avoids this problem (but still requires a fixed, predetermined time) context is simply to save the _id of the first line of context before the line in question (i.e., when pasting, you can buffer the previous N lines, where N is the number of context ) - call this first_context_id - and then query like:

 # pseudocode for each matching row: rows_with_context = db.logs.find({_id: {$gte: row['first_context_id']}}).sort({_id: 1}).limit(N * 2 + 1) 

It can also simplify your application logic, since you do not need to collect the context with the corresponding string, this query returns both the matched string and the context strings.

+3
source

Source: https://habr.com/ru/post/1399381/


All Articles