Best Data Model for Mass Relations in MongoDB

We are adopting MongoDB for a new solution and are currently trying to work out the most efficient data model for our needs, specifically the relationships between data elements.

We need a three-way relationship between users, items, and lists. A user can have many items and many lists. A list belongs to one user and holds many items. An item can belong to many users and many lists. That last point is especially important: an item can belong to a potentially huge number of lists, thousands for sure, potentially tens or hundreds of thousands, perhaps even millions in the future. We need to navigate these relationships in both directions: for example, get all the items in a list, or all the lists an item belongs to. We also want a general solution so that we can add many more document types, and relationships between them, later if we need to.

So it seems there are two possible solutions. First, each document in the database embeds its "relationships" as arrays of identifiers. A list document would then contain an array holding the identifiers of all its items and a single identifier for its user. In this model, those arrays become massive when an item belongs to many, many users or many, many lists.
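To make the first model concrete, here is a minimal sketch of what the embedded-array documents might look like. The field names (`user_id`, `item_ids`, `user_ids`, `list_ids`) are illustrative assumptions, not anything prescribed by MongoDB:

```python
# Hypothetical document shapes for the embedded-array model.
# All field names are illustrative.

list_doc = {
    "_id": "list1",
    "user_id": "user1",                       # a list has exactly one user
    "item_ids": ["item1", "item2", "item3"],  # grows with every item added
}

item_doc = {
    "_id": "item1",
    "user_ids": ["user1", "user2"],  # an item can belong to many users
    "list_ids": ["list1"],           # could hold hundreds of thousands of ids
}

# Navigating list -> items is one read of list_doc followed by lookups of
# each id; navigating item -> lists reads item_doc["list_ids"] directly.
print(len(list_doc["item_ids"]))  # 3
```

Both directions are a single document read plus id lookups, which is why this model favors reads; the cost shows up when those arrays have to be rewritten.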

The second model introduces a new document type, a "relationship" document, which stores the identifiers of the two partners and the name of the relationship. This stores more data overall and therefore costs more disk space. It also looks like an "unnatural" way to approach the problem in NoSQL.
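A minimal sketch of the second model, again with purely illustrative field names (`from_id`, `to_id`, `relation`). Finding all lists for an item becomes a query over these edge documents rather than a read of one big array; the in-memory equivalent over a Python list stands in for that query:

```python
# Hypothetical shape of a separate "relationship" document: one document
# per edge. Field names are illustrative assumptions.

relations = [
    {"from_id": "item1", "to_id": "list1", "relation": "item_in_list"},
    {"from_id": "item1", "to_id": "list2", "relation": "item_in_list"},
    {"from_id": "item2", "to_id": "list1", "relation": "item_in_list"},
]

# "All lists this item belongs to" is a filter on from_id + relation
# (in MongoDB, a find() backed by an index on those fields).
lists_of_item1 = [
    r["to_id"]
    for r in relations
    if r["from_id"] == "item1" and r["relation"] == "item_in_list"
]
print(lists_of_item1)  # ['list1', 'list2']
```

Each edge is its own small document, so adding or removing one relationship never rewrites a large array.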

Which is better in terms of efficiency, space, and architecture, and why?

Cheers, Matt

2 answers

It depends on your access patterns.

  • The embedded array of identifiers is better for reads. With one quick read you get the identifiers of all related objects, and then you can go fetch them. But if your update rate is high you will run into problems, because MongoDB has to move the same (already large) document again and again as it outgrows its allocated space on disk.

    This solution is really bad for writes, though. Imagine an item that belongs to several million lists, and you decide to delete it. Now you have to visit all those lists and pull this item's identifier out of each reference array. Fun, isn't it?

  • Storing links as separate documents is good for writes. Adding, editing, and deleting links is fast. But this solution takes up more disk space and, more importantly, precious RAM. Reads are also not as fast, especially if you have a lot of links.

    Given your numbers ("perhaps even millions in the future"), I would go with this solution. You can always throw some hardware at speeding up reads. Write scaling is traditionally the hardest part, and in this solution writes are fast and reliable.
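The write-cost contrast in the two bullets above can be simulated in memory with small numbers (the real MongoDB operations would be an `update_many` with `$pull` versus a `delete_many`; document shapes here are illustrative):

```python
# Deleting "item1" under each model, simulated in memory.

lists = [{"_id": f"list{i}", "item_ids": ["item1", "item2"]} for i in range(5)]
relations = [{"item_id": "item1", "list_id": f"list{i}"} for i in range(5)]
relations.append({"item_id": "item2", "list_id": "list0"})

# Embedded-array model: every list document that references the item must
# be rewritten (in MongoDB, update_many({"item_ids": "item1"},
# {"$pull": {"item_ids": "item1"}}) touching potentially millions of docs).
for doc in lists:
    if "item1" in doc["item_ids"]:
        doc["item_ids"].remove("item1")

# Relation-document model: just drop the matching edge documents
# (in MongoDB, one delete_many on an indexed field).
relations = [r for r in relations if r["item_id"] != "item1"]

print(sum("item1" in d["item_ids"] for d in lists))  # 0 lists still refer to it
print(len(relations))                                # 1 edge left (item2)
```

With millions of referencing lists, the first loop becomes millions of document rewrites, while the second stays a single indexed delete.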


I agree with Sergio that the data access patterns are key here.

I'll also add another possible solution: store a fourth type of document with three properties, a reference to each of the user, the list, and the item. This collection can be indexed for fast access on all three fields, uniquely indexed across all three fields to prevent duplicates, and it allows fast inserts and deletes.
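A sketch of that three-field relationship document, with the unique compound index simulated by a set of `(user, list, item)` keys. Everything here (field names, the helper `add_relationship`) is hypothetical, just to show the access patterns:

```python
# Three-way relationship documents with a simulated unique compound index.

relationships = []
seen = set()  # stands in for a unique index on (user_id, list_id, item_id)

def add_relationship(user_id, list_id, item_id):
    """Insert one (user, list, item) edge; reject duplicates the way a
    unique compound index on all three fields would."""
    key = (user_id, list_id, item_id)
    if key in seen:
        return False
    seen.add(key)
    relationships.append(
        {"user_id": user_id, "list_id": list_id, "item_id": item_id}
    )
    return True

add_relationship("u1", "l1", "i1")
add_relationship("u1", "l1", "i2")
duplicate_ok = add_relationship("u1", "l1", "i1")  # rejected as a duplicate

# Both directions are simple filters (each backed by its own index in MongoDB):
items_in_l1 = [r["item_id"] for r in relationships if r["list_id"] == "l1"]
lists_with_i1 = [r["list_id"] for r in relationships if r["item_id"] == "i1"]
print(items_in_l1, lists_with_i1, duplicate_ok)
```

In MongoDB the `seen` set would be a unique compound index over the three fields, and each filter would be a `find()` on an indexed field, so both directions stay fast without duplicating reference arrays on either side.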

Ultimately, you don’t store much more data this way, because if you need to look up relationships from both sides ("Which items does this user have in which lists?" and "Which users have this item in their lists?") you would have to duplicate the references anyway.

It feels relational, but sometimes it’s the best solution.


Source: https://habr.com/ru/post/1394228/
