How to deal with many-to-many relationships in MongoDB when Embedding is not the answer?

Here's the deal. Suppose MongoDB has the following data schema:

  • items : a collection with large documents that contain some data (it absolutely doesn't matter what it really is).
  • item_groups : a collection with documents that contain a list of items._id called item_groups.items , plus some additional data.

So these two are in a many-to-many relationship. But there is one catch: for certain reasons I cannot store items inside item groups, so, just as the title says, embedding is not the answer.

The query I am really worried about is the one that finds specific groups that contain specific items (i.e. I have a set of criteria for each collection). It should also report how many items in each found group matched the criteria (no matching items means the group is not found).

The only viable solution I have come up with is the Map/Reduce approach with a dummy reduce function:

    function map() {
        // imagine that item_criteria came from the scope.
        // it's a mongodb query object.
        item_criteria._id = {$in: this.items};
        var group_size = db.items.count(item_criteria);
        // this group holds no relevant items, skip it
        if (group_size == 0) return;

        var key = this._id.str;
        var value = {size: group_size, ...};

        emit(key, value);
    }

    function reduce(key, values) {
        // since the map function emits each group just once,
        // values will always be a list with length=1
        return values[0];
    }

    db.runCommand({
        mapreduce: "item_groups",
        map: map,
        reduce: reduce,
        query: item_groups_criteria,
        scope: {item_criteria: item_criteria},
    });

The problematic line is:

 item_criteria._id = {$in: this.items}; 

What if this.items.length == 5000 or even more? My RDBMS background cries out loud that:

 SELECT ... FROM ... WHERE whatever_id IN (over 9000 comma-separated IDs) 

is definitely not a good way to go.

Thanks so much for your time guys!

I hope the best answer is "stupid, stop thinking in the RDBMS style, use $ its_a_kind_of_magicSphere from the latest version of MongoDB" :)

+6
2 answers

I think you are struggling to separate domain/object modeling from database schema modeling. I struggled with this too when I first tried MongoDb.

For the sake of semantics and clarity, I'm going to replace Groups with the word Categories.

Essentially, your theoretical model is a many-to-many relationship, since each Item can belong to many Categories, and each Category can in turn have many Items.

This is best handled in your domain object modeling, not in the database schema, especially when you are working with a document database (NoSQL). In your MongoDb schema you "fake" a many-to-many relationship by combining top-level document models with embedding.

Embedding is hard to swallow for people coming from SQL back-ends, but it is an essential part of the answer. The trick is deciding whether it is shallow or deep, one-way or two-way, etc.


Top Level Document Models

Since your Category documents contain some data of their own and are referenced by a vast number of Items, I agree with you that fully embedding them inside each Item is unwise.

Instead, treat both Item and Category objects as top-level documents. Make sure your MongoDb schema has a collection for each of them, so that each document gets its own ObjectId.

The next step is to decide where and how much to embed ... there is no right answer, since it all depends on how you use it and what your scaling ambitions are ...

Embedding Decisions

1. Items

At a minimum, your Item objects should have a collection (array) property for their categories. At the very least, this collection should contain the ObjectId of each Category.

My suggestion would be to add to this collection the data you use most often when interacting with the Item ...

For example, say I want to list a bunch of items on my web page in a grid and show the names of the categories they belong to. Obviously, I don't need to know everything about the Category, but if I only have the ObjectId embedded, a second query is required to get any detail about it.

Instead, it makes most sense to embed the Category's Name property in the collection along with the ObjectId, so that a fetched Item can display its category names without another query.

The most important thing to remember is that the key/value objects embedded in your Item that "represent" a Category do not have to match the actual Category document model ... This is not OOP or relational database modeling.
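A minimal sketch of the shallow embedding described above, written with plain JavaScript objects so it runs without a database. The field names (`categories`, `name`, `title`, `description`) and the sample values are assumptions for illustration, and the string ids stand in for real ObjectId values.

```javascript
// Hypothetical Category documents: top-level, with data of their own.
const categories = [
  { _id: "cat1", name: "Books", description: "Long text that Items never need" },
  { _id: "cat2", name: "Gifts", description: "More data that stays in the Category" },
];

// Hypothetical Item documents: each embeds only the _id and name of its categories,
// not the full Category document.
const items = [
  {
    _id: "item1",
    title: "Atlas",
    categories: [{ _id: "cat1", name: "Books" }, { _id: "cat2", name: "Gifts" }],
  },
  { _id: "item2", title: "Mug", categories: [{ _id: "cat2", name: "Gifts" }] },
];

// Rendering a grid row uses only the embedded names; `categories` is never consulted.
function gridRow(item) {
  const names = item.categories.map((c) => c.name).join(", ");
  return `${item.title} [${names}]`;
}

const rows = items.map(gridRow);
console.log(rows);
```

The price of this shape is duplication: if a Category is renamed, every Item that embeds its name has to be updated as well.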

2. Categories

In the reverse direction, you might leave the embedding one-way and keep no Item information in your Category documents ... or you might add a collection of Item data much like the one described above (ObjectId, or ObjectId + Name) ...

In this direction, I would personally lean toward embedding nothing ... more than likely, if I want Item information for my Category, I want a lot of it, more than just a name ... and deep-embedding a top-level document (Item) makes no sense. I would simply accept a database query against the Items collection for every Item whose categories collection contains the ObjectId of my Category.

Phew ... confusing for sure. The point is that you will have some data duplication, and you will have to tune your models to your usage for the best performance. The good news is that MongoDb and other document databases are good at ...
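The lookup in this direction could be sketched as follows. It is an in-memory stand-in, assuming each Item embeds `{_id, name}` entries in a `categories` array (names chosen for illustration). Against a real collection, the corresponding shell query would presumably be `db.items.find({"categories._id": categoryId})`.

```javascript
// Hypothetical Items; string ids stand in for ObjectId values.
const items = [
  {
    _id: "item1",
    title: "Atlas",
    categories: [{ _id: "cat1", name: "Books" }, { _id: "cat2", name: "Gifts" }],
  },
  { _id: "item2", title: "Mug", categories: [{ _id: "cat2", name: "Gifts" }] },
  { _id: "item3", title: "Novel", categories: [{ _id: "cat1", name: "Books" }] },
];

// Return every Item whose embedded categories contain the given Category id.
function itemsInCategory(allItems, categoryId) {
  return allItems.filter((item) =>
    item.categories.some((c) => c._id === categoryId)
  );
}

const giftIds = itemsInCategory(items, "cat2").map((item) => item._id);
console.log(giftIds);
```

With this layout the Category document never has to hold a list of thousands of Item ids; the membership lives on the Item side.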

+4

Why not use the opposite design?

You are storing items and item_groups. If your first idea was to store items inside item_group entries, then maybe the opposite is not a bad idea :-)

Let me explain:

Each item stores the groups it belongs to. (You are in NoSQL, data duplication is fine!) For example, say you store a list called groups in each item document, so your items look like this: {_id: ...., name: ...., groups: [ObjectId(...), ObjectId(...), ObjectId(...)]}

Then the map/reduce idea gains a lot of power:

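    map = function () {
        // `this` is rebound inside the forEach callback, so capture the item first
        var item = this;
        this.groups.forEach(function (groupKey) {
            // wrap the list in an object so map output and reduce output share one shape
            emit(groupKey, {items: [item]});
        });
    };

    reduce = function (key, values) {
        // merge the partial item lists emitted for this group
        var merged = {items: []};
        values.forEach(function (value) {
            merged.items = merged.items.concat(value.items);
        });
        return merged;
    };

    db.runCommand({
        mapreduce: "items",
        map: map,
        reduce: reduce,
        query: {_id: {$in: [..., ...., .....]}} // put your item ids here
    });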

You can add some parameters (finalize, for example, to modify the output of the map/reduce), but this might help you.

Of course, you need another collection in which to store the item_groups details if you need them, but in some cases (if that information about item_groups does not exist, or does not change, or you do not care about having the most up-to-date version of it) you don't need it at all!

Does this give you a hint toward a solution to your problem?
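To see how the inverted design yields the per-group counts the question asks for, here is a small in-memory sketch. The `mapReduce` helper is a hypothetical stand-in written for this example, not MongoDB's implementation, and the `price` field and the `price >= 50` criterion are assumptions that play the role of the item criteria (the `query` option in the command above).

```javascript
// Hypothetical items; each stores the ids of the groups it belongs to.
const items = [
  { _id: 1, price: 5, groups: ["g1", "g2"] },
  { _id: 2, price: 50, groups: ["g1"] },
  { _id: 3, price: 70, groups: ["g2", "g3"] },
  { _id: 4, price: 90, groups: ["g2"] },
];

// Tiny in-memory stand-in for map/reduce: map calls emit(key, value),
// reduce folds all values emitted under one key into a single value.
function mapReduce(docs, map, reduce) {
  const buckets = new Map();
  const emit = (key, value) => {
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));
  const out = {};
  for (const [key, values] of buckets) out[key] = reduce(key, values);
  return out;
}

// Apply the item criteria first, then count the matching items per group.
const matching = items.filter((item) => item.price >= 50);

const sizes = mapReduce(
  matching,
  (item, emit) => item.groups.forEach((groupId) => emit(groupId, { size: 1 })),
  (key, values) => ({ size: values.reduce((sum, v) => sum + v.size, 0) })
);

console.log(sizes);
```

A group with no matching items is never emitted, which corresponds to "the group was not found" in the question. Criteria on the groups themselves would still need a separate step against the item_groups details.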

+1

Source: https://habr.com/ru/post/894595/

