This is mostly an architectural question, and it probably depends a bit on personal taste. I'll try to lay out the pros and cons (actually mostly the cons; admittedly, this is opinionated):
At the database level, MongoDB offers no tools for enforcing referential integrity, so yes, you have to do it yourself. I suggest you use database objects that look like this:
```csharp
public class DBObject
{
    public ObjectId Id { get; set; }
}

public class Department : DBObject
{
    public string Name { get; set; }
}
```
I suggest using simple DTOs like these at the database level, no matter what. If you need extra sugar, put it in a separate layer, even if that means a bit of copying. Putting logic into the database objects requires a very good understanding of how the driver hydrates them, and may force you to rely on implementation details.
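For illustration, here is what those thin DTOs plus a separate sugar layer could look like. This is only a sketch: the `Name` and `DepartmentId` fields and the `EmployeeView` type are my assumptions, not something the driver dictates.

```csharp
using MongoDB.Bson;

// Thin persistence DTO: only data and a manual reference, no behavior.
public class Employee : DBObject
{
    public string Name { get; set; }
    public ObjectId DepartmentId { get; set; } // manual reference to Department.Id
}

// The "extra sugar" lives in a separate layer and is filled by explicit copying.
public class EmployeeView
{
    public string Name { get; set; }
    public string DepartmentName { get; set; }
}
```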
Now, whether you want to work with "smarter" objects on top of that is a matter of preference. Many people like strongly-typed navigation properties that hydrate the referenced object automatically, for example a variant of the Employee above:
```csharp
public class Employee : DBObject
{
    public ObjectId DepartmentId { get; set; }

    // Navigation property: hydrates the Department on every access.
    // (Assumes some static 'Db' accessor exists; see the first problem below.)
    public Department Department
    {
        get { return Db.Departments.FindOneById(DepartmentId); }
    }
}
```
This model has a number of problems:
- For the navigation property to work, the `Employee` class, a model class, has to fetch objects from the database. That is nasty, because it requires a database connection to be passed around, or static access to the database, which can be complex, too.
- Access to `Department` looks completely cheap, but it actually kicks off a database operation: it can be slow, and it can fail. This is completely hidden from the caller.
- With a 1:n relation, things get a lot trickier. For example, should `Department` expose a list of `Employees`? If so, is that a `List` (i.e., are all employees deserialized as soon as you read the first one), or is it a lazy `MongoCursor`?
- To make matters worse, it is usually unclear what caching behavior should apply. Say you access `myDepartment.Employees[0].Department.Name`. Obviously that code is silly, but imagine a call stack several specialized methods deep; they can do the same thing in a much less obvious way. A naive implementation would deserialize the referenced `Department` all over again, which is horrible; aggressive caching, on the other hand, is dangerous because you may genuinely want to re-fetch the object. The sketch after this list makes the hidden costs concrete.
- Worst of all: updates. The problems so far were mostly read-only. Now say you do `employeeJohn.Department.Name = "PixelPushers"` followed by `employeeJohn.Save()`. Does this update the department or not? If it does, are the changes to john serialized before or after the changes to the dependent objects? What about versioning and locking?
- Many semantics are simply hard to implement: what should `employeeJohn.Department.Employees.Clear()` do?
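To make the hidden costs concrete, here is a hypothetical caller of the "smart" Employee above. It assumes the static `Db` accessor from that example and a lazily-loaded `Employees` list on `Department`; three innocent-looking property accesses turn into three round-trips:

```csharp
var dept = Db.Departments.FindOneById(deptId);  // round-trip 1: the department
var john = dept.Employees[0];                   // round-trip 2: its employees
var name = john.Department.Name;                // round-trip 3: the SAME department, again
```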
These problems are not impossible to work around; full-blown ORMs employ a set of complex patterns to perform these operations. But ORMs typically run from 100k to well over 1M lines of code (!), and I doubt you have that kind of time. In an RDBMS, the need to hydrate related objects and use something like an ORM is much more pressing, because you cannot embed, for example, a list of invoice items, so every 1:n or m:n relation must be modeled through a join. This is the object-relational impedance mismatch.
The idea of document databases, as I understand it, is that you don't need to tear your model apart as unnaturally as an RDBMS forces you to. Still, there are "object boundaries": if you think of your data model as a network of connected nodes, the challenge is to decide which part of the data you are currently working on.
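For instance, the invoice from the previous paragraph can simply keep its items inside its own object boundary. A sketch, with made-up field names:

```csharp
using System;
using System.Collections.Generic;

// Embedding keeps the whole aggregate in one document,
// so this 1:n relation needs no join and no second query.
public class InvoiceItem
{
    public string Description { get; set; }
    public decimal Price { get; set; }
}

public class Invoice : DBObject
{
    public DateTime Date { get; set; }
    public List<InvoiceItem> Items { get; set; } // serialized inside the invoice document
}
```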
Personally, I prefer not to put an abstraction layer on top of this, because that abstraction is leaky: it hides what is really going on from the caller, and it tries to solve every problem with the same hammer.
Part of the idea of NoSQL is that your query patterns must be carefully aligned with your data model, because you can't just apply the JOIN hammer to any two tables in sight.
So, my opinion is: stick with a thin layer and do most of the database operations at the service level. Pass DTOs around instead of developing a complex domain model that breaks down as soon as you need locking, MVCC, cascading updates, and the like.
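To sketch what that thin service layer could look like, using the legacy 1.x driver API implied by the examples above (the `EmployeeService` type and the collection names are my assumptions):

```csharp
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

// Every database operation is explicit and visible at the call site;
// the DTOs stay dumb. Names are illustrative, not prescriptive.
public class EmployeeService
{
    private readonly MongoCollection<Employee> employees;
    private readonly MongoCollection<Department> departments;

    public EmployeeService(MongoDatabase db)
    {
        employees = db.GetCollection<Employee>("employees");
        departments = db.GetCollection<Department>("departments");
    }

    public Employee GetEmployee(ObjectId id)
    {
        return employees.FindOneById(id);
    }

    // The extra round-trip for the reference is explicit here, not hidden in a getter.
    public Department GetDepartmentOf(Employee employee)
    {
        return departments.FindOneById(employee.DepartmentId);
    }

    // One targeted update with unambiguous semantics, instead of a magic Save().
    public void RenameDepartment(ObjectId departmentId, string newName)
    {
        departments.Update(
            Query<Department>.EQ(d => d.Id, departmentId),
            Update<Department>.Set(d => d.Name, newName));
    }
}
```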