How do I handle relations between separately stored objects in a document database such as MongoDB?

This problem is easily solved in ORMs such as Entity Framework or NHibernate, but I do not see a ready-made solution in the C# driver for MongoDB. Say I have a collection of objects of type A that reference objects of type B, which I need to store in a separate collection, so that once a particular B changes, every A that refers to it sees the change. In other words, I need this relation to be normalized. At the same time, I need A to reference B inside the class by type, not by Id, as shown below:

public class A { public B RefB { get; set; } } 

Do I need to handle all of this reference resolution on my own? If so, which approach is best? Should I store both the Id of B and the reference to B in the class, and somehow take care of keeping their values in sync:

    public class A
    {
        // Need to implement reference consistency as well
        public int RefBId { get; set; }

        private B _refB;

        [BsonIgnore]
        public B RefB
        {
            get { return _refB; }
            set { _refB = value; RefBId = _refB.Id; }
        }
    }

I know someone will say that a relational database fits this case best, but I really need to use a document database like MongoDB. It solves a lot of problems, and in most cases I need to store objects that are not normalized for my project; however, sometimes we need a mixed design in a single store.

+6
3 answers

This is mainly an architectural problem, and it probably depends a bit on personal taste. I will try to go through the pros and cons (actually only the cons; this is rather opinionated):

At the database level, MongoDB offers no tools for enforcing referential integrity, so yes, you have to do it yourself. I suggest you use database objects that look like this:

    public class DBObject
    {
        public ObjectId Id { get; set; }
    }

    public class Department : DBObject
    {
        // ...
    }

    public class EmployeeDB : DBObject
    {
        public ObjectId DepartmentId { get; set; }
    }

I suggest using simple DTOs like these at the database level, no matter what. If you need extra sugar, put it in a separate layer, even if that means a bit of copying. Logic in the database objects requires a very good understanding of how the driver hydrates the object, and may end up relying on implementation details.

Whether you want to work with more "intelligent" objects is a matter of preference. Indeed, many people like to use strongly typed accessors that activate related objects automatically, for example:

    public class Employee
    {
        public Department Department
        {
            get { return /* the department object, magically, from the DB */; }
        }
    }

This model has a number of problems:

  • It requires Employee, a model class, to be able to hydrate objects from the database. This is tricky because it needs the database to be injected, or you need a static object for database access, which can be tricky as well.
  • Accessing Department looks completely cheap, but in fact it triggers a database operation. It can be slow and it can fail, and all of this is hidden from the caller.
  • In a 1:n relation, things get a lot more complicated. For example, should Department also expose a list of Employees? If so, is it really a list (i.e., as soon as you start reading the first element, must all employees be deserialized?), or is it a lazy MongoCursor?
  • To make matters worse, it is usually unclear what kind of caching should be used. Say you access myDepartment.Employees[0].Department.Name. Obviously this code is not smart, but imagine a call stack with several specialized methods. They may end up accessing the data in exactly this way, even if it is better hidden. A naive implementation would actually deserialize the referenced Department again, which is ugly. Aggressive caching, on the other hand, is dangerous because you may really want to re-fetch the object.
  • Worst of all: updates. So far the problems were mostly read-only. Now say I set employeeJohn.Department.Name = "PixelPushers" and call employeeJohn.Save(). Does this update the department or not? If so, are the changes to john serialized before or after the changes to the dependent objects? What about versioning and locking?
  • Many semantics are hard to implement: employeeJohn.Department.Employees.Clear() can be tricky.
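To make the hidden-cost and caching points concrete, here is a minimal sketch of such a "magic" accessor. It assumes a hypothetical FakeDb class that stands in for the database and counts reads; nothing here is part of the MongoDB driver, and string ids are used only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

public class Department
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for the database; it counts every read.
public static class FakeDb
{
    public static int Reads;

    private static readonly Dictionary<string, Department> Departments =
        new Dictionary<string, Department>();

    public static void Store(Department department)
    {
        Departments[department.Id] = department;
    }

    public static Department LoadDepartment(string id)
    {
        Reads++;
        Department stored = Departments[id];
        // Return a fresh copy, as deserializing a stored document would.
        return new Department { Id = stored.Id, Name = stored.Name };
    }
}

public class Employee
{
    public string DepartmentId { get; set; }

    // Looks like a cheap property, but every access is a database read.
    public Department Department
    {
        get { return FakeDb.LoadDepartment(DepartmentId); }
    }
}
```

With this naive accessor, reading john.Department twice performs two reads and yields two distinct objects, so an assignment such as john.Department.Name = "PixelPushers" only changes a throwaway copy. Adding a cache would remove the extra reads, but it raises the staleness question from the list above.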

Many ORMs use a set of complex patterns to support these operations, so these problems are not impossible to work around. But ORMs typically range from 100 thousand to more than 1 million lines of code (!), and I doubt that you have that kind of time. In an RDBMS, the need to activate related objects and use something like an ORM is much more pressing, because you cannot embed, e.g., the list of line items in an invoice, so each 1:n or m:n relation must be represented using a join. This is called the object-relational mismatch.

The idea of document databases, as I understand it, is that you do not need to tear your model apart as unnaturally as you have to in an RDBMS. Still, there are "object boundaries". If you think of your data model as a network of connected nodes, the challenge is to know which part of the data you are currently working on.

Personally, I prefer not to put an abstraction layer on top of this, because the abstraction is leaky, it hides what is really going on from the caller, and it tries to solve every problem with the same hammer.

Part of the NoSQL idea is that your query patterns must be carefully matched to the data model, because you cannot simply apply the JOIN hammer to any table in sight.

So, my opinion is: stick to a thin layer and perform most database operations at the service level. Pass DTOs around instead of designing a complex domain model that breaks as soon as you need to add locking, MVCC, cascading updates, etc.

+6

In a document database, when you do something like your first example:

 public class A { public B RefB { get; set; } } 

You are fully embedding B in the RefB property. In other words, your document looks like this:

 [a/1] { AProp: "foo", RefB: { BProp: "bar" } } 

It helps to look at things in terms of Domain-Driven Design (DDD). This embedding pattern usually occurs when B is either a "value object" or a "non-aggregate entity" (using DDD terminology).

It can also occur if you are storing a point-in-time snapshot of some other aggregate. In that case, you do not want the values of B to be updated when they change, or the snapshot would no longer represent that point in time.

The other pattern is to treat A and B as separate aggregates. If one needs to refer to the other, it does so only by referencing its identifier.
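A minimal sketch of the snapshot case, assuming a hypothetical helper named Snapshots.CreateWithSnapshot (not from any library): the current values of B are copied into A when A is created, so later edits to the original B do not reach the embedded copy.

```csharp
using System;

public class B
{
    public string Id { get; set; }
    public string BProp { get; set; }
}

public class A
{
    public string AProp { get; set; }

    // Embedded copy of B as it was when this A was created.
    public B RefB { get; set; }
}

public static class Snapshots
{
    // Hypothetical helper: copies B's values instead of sharing the instance.
    public static A CreateWithSnapshot(string aProp, B current)
    {
        return new A
        {
            AProp = aProp,
            RefB = new B { Id = current.Id, BProp = current.BProp }
        };
    }
}
```

If the original B is changed afterwards, a.RefB.BProp keeps the old value, which is exactly what a point-in-time snapshot is meant to do.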

 public class A { public string BId { get; set; } } 

Then your documents will be stored like this:

 [a/1] { AProp: "foo", BId: "b/2" }
 [b/2] { BProp: "bar" }

Note: I believe in MongoDB you would use the ObjectId type. In RavenDB you usually use string, but int is possible with minor adjustments. Other document databases may allow other types.

The part that does not work well in document databases is what you showed in your second example: A holding a reference to B without storing it as part of the document. This pattern may work in ORMs such as Entity Framework or NHibernate, but there it is usually implemented through virtual properties and proxy classes. In a document database environment, those do not hold up so well.

So, if they are separate documents, instead of loading A and using a.RefB to get to B, you simply load A and B individually. For example, you can load A and then use BId to determine which B to load.

Of course, the question still comes down to whether to embed or to link. This is something you will have to work out, as it can often be done either way. Typically, one approach works better than the other for a specific domain problem. But you usually do not do both.
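The two-step load can be sketched as follows. The Repository class is hypothetical and uses in-memory dictionaries in place of the two collections; with a real driver, LoadA and LoadB would each be a query by id.

```csharp
using System;
using System.Collections.Generic;

public class A
{
    public string Id { get; set; }
    public string AProp { get; set; }
    public string BId { get; set; }   // reference by identifier only
}

public class B
{
    public string Id { get; set; }
    public string BProp { get; set; }
}

// Hypothetical repository; the dictionaries stand in for two collections.
public class Repository
{
    private readonly Dictionary<string, A> _aDocs = new Dictionary<string, A>();
    private readonly Dictionary<string, B> _bDocs = new Dictionary<string, B>();

    public void Save(A a) { _aDocs[a.Id] = a; }
    public void Save(B b) { _bDocs[b.Id] = b; }

    // Both loaders return null when the id is unknown.
    public A LoadA(string id)
    {
        A a;
        return _aDocs.TryGetValue(id, out a) ? a : null;
    }

    public B LoadB(string id)
    {
        B b;
        return _bDocs.TryGetValue(id, out b) ? b : null;
    }
}

public static class Example
{
    // Two explicit loads: first A, then the B that A points at.
    public static string Describe(Repository repo, string aId)
    {
        A a = repo.LoadA(aId);
        if (a == null) return "no such A";

        B b = repo.LoadB(a.BId);
        return a.AProp + " -> " + (b != null ? b.BProp : "(missing B)");
    }
}
```

Because A stores only BId, a change saved to B is seen by every A that points at it on the next load, and a dangling reference shows up as an explicit null check rather than a failure hidden inside a property getter.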

+3

Document databases are based on completely different architectural concepts than relational databases. The basic principle of NoSQL databases is aggregation, not relation. Therefore, you should not expect the kind of normalization you are describing.

You have to handle this manually. There is no such thing as referential integrity in NoSQL.

+1

Source: https://habr.com/ru/post/954701/

