MongoDB - one collection using indexes

Question

OK, so the more I develop in MongoDB, the more I wonder whether I need several collections or could have one large collection with indexes (since the fields can differ from document to document, unlike tabular data). If I am trying to develop in the most efficient way (meaning less code and more reusable code), can I use one collection for all documents and just index on a field? With all the documents in one indexed collection, I can reuse all my form-processing code and other code, since everything is inserted into the same collection.

Example:

Suppose I am developing a contact manager with two types of contacts: people and businesses. My initial thought was to create one collection called people and a second called business. But that was because I am used to developing in SQL, where that would be appropriate since the columns differ for each table. The more I thought about the flexibility of document databases, the more I asked myself: "Do I really need two collections for this?" If I simply add a field called “contact type” to each document and index it, do I really need two collections? Unlike in SQL, the fields do not have to be the same across documents, so each document can have its own fields as long as it has a document type field with an index on it.

So I took this concept further: if I need only one collection for "individuals" and "businesses", do I even need a separate collection for "Users" or "Contact History" or any other data? Theoretically, could I not build the entire solution in one collection and simply give each document a field that identifies its type, with an index on it, such as “Users”, “Individual contact”, “Business contacts”, “Contact history”, etc.? And if a document is related to another document, I can index its "parent / foreign key" field.
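To make the idea concrete, here is a minimal sketch that uses plain Python dicts in place of a live database so it runs on its own. The field names `type` and `parent_id`, the sample data, and the helper `find` are all hypothetical. With a real driver such as pymongo, the same filter would go to `collection.find(...)`, and the filtered fields would be indexed with `collection.create_index(...)`.

```python
# One "collection" holding several document types, modeled as a list of dicts.
# Field names ("type", "parent_id") are illustrative assumptions.
documents = [
    {"_id": 1, "type": "user", "email": "ann@example.com"},
    {"_id": 2, "type": "individual_contact", "parent_id": 1,
     "first": "Bob", "last": "Stone"},
    {"_id": 3, "type": "business_contact", "parent_id": 1,
     "company": "Acme Ltd", "vat_number": "X123"},
    {"_id": 4, "type": "contact_history", "parent_id": 2,
     "note": "Called about renewal"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion.

    Stand-in for an equality filter. This version scans every document;
    a real database would use an index on the filtered fields instead.
    """
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Documents of different shapes live side by side and are told apart by "type".
business = find(documents, type="business_contact")
# Related documents are reached through the hypothetical "parent_id" field.
history_for_bob = find(documents, type="contact_history", parent_id=2)
```

Note that every query in this layout has to carry the type filter, which is why the type field (usually combined with whatever else is being filtered on) needs an index.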

This would let me code the front end dynamically, since the form-processing code would be the same everywhere (it always inserts into the same collection). That would save a lot of coding, but I want to make sure that, with indexes and secondary indexes, the database will stay fast and will not cause problems as the collection grows. As you can imagine, with everything in one collection, it could hold hundreds of thousands or even millions of documents as the user base grows, but there would be indexes and secondary indexes to optimize performance.

My question is: is this a common approach among MongoDB developers? Why or why not? What are the drawbacks, if any? If it is a widely used approach, please also describe any benefits of using it. Thanks.

collections indexing mongodb

user982853 Mar 04 '12 at 18:02

2 answers

kelloti · Answer 1 · 2012-03-04T18:18:42+0000

This is a very important point in Mongo, and the answer is more art than science. Having one collection full of gigantic documents is certainly an anti-pattern, because it works against many of Mongo's features.

For example, when retrieving documents, you can only retrieve a whole document from a collection (not entirely true, but mostly). So if you have huge documents, you retrieve huge documents every time. Huge documents also make sharding less flexible, since only top-level documents are indexed (and therefore sharded) in each collection. You can index values deep inside a document, but the index entry is associated with the top-level document.

At the same time, going purely relational is also an anti-pattern, because you already gave up much of your referential integrity by moving to Mongo in the first place. In addition, all joins are done in application memory, so each one requires a full round trip (slow).
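The cost of joining in application memory can be sketched as follows. This is an illustration only: `FakeCollection` is a hypothetical in-memory stand-in that counts simulated requests, and the field `business_ids` and the sample data are assumptions, not anything from a real schema.

```python
class FakeCollection:
    """In-memory stand-in for a collection that counts simulated round trips."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.round_trips = 0

    def find_one(self, doc_id):
        self.round_trips += 1  # each lookup stands for one request to the server
        return self._docs.get(doc_id)


people = FakeCollection([{"_id": 1, "name": "Bob", "business_ids": [10, 11]}])
businesses = FakeCollection([
    {"_id": 10, "name": "Acme"},
    {"_id": 11, "name": "Globex"},
])


def person_with_businesses(person_id):
    """Join in application memory: one query for the person, one per business."""
    person = people.find_one(person_id)
    linked = [businesses.find_one(bid) for bid in person["business_ids"]]
    return {**person, "businesses": linked}


joined = person_with_businesses(1)
total_round_trips = people.round_trips + businesses.round_trips
```

Here one person linked to two businesses costs three requests. Batching the second step into a single `$in` query would reduce that to two, but the join still needs at least one extra round trip compared with reading one embedded document.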

So the answer is to settle on something in between. In this case I think you probably want one collection for people and another for businesses. I say this because businesses seem to carry enough metadata that their documents could bulk up a lot. (Also, I would treat the individual-business relationship as many-to-many.) However, a person may have a Name object (with first and last properties). It would be a bad idea to put Name in a separate collection.

Some information from 10gen about schema design: http://www.mongodb.org/display/DOCS/Schema+Design

EDIT

In addition, Mongo has limited transaction support, in the form of atomic aggregates. When you insert an object into Mongo, the whole object is either inserted or not inserted. So if your application domain requires consistency between certain objects, you probably want to store them in the same document / collection.

For example, consider an application that requires a User to always have a Name object (containing FirstName, LastName and MiddleInitial). If a User were somehow inserted without the corresponding Name, the data would be considered corrupt. In an RDBMS, you would wrap a transaction around the operations that insert User and Name. In Mongo, we make sure that Name is in the same document (aggregate) as User to achieve the same effect.
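A minimal sketch of that aggregate, assuming a hypothetical helper `make_user` and a `username` field that are not part of the original example. The point is only that Name travels inside the User document, so a single insert (for example pymongo's `collection.insert_one(user)`) writes both or neither.

```python
def make_user(username, first_name, last_name, middle_initial=""):
    """Build one User document with the Name embedded as a sub-document.

    The check below is an application-level rule (an assumption of this
    sketch); the database itself does not enforce it.
    """
    if not first_name or not last_name:
        raise ValueError("a User must always carry a complete Name")
    return {
        "username": username,
        "name": {
            "FirstName": first_name,
            "LastName": last_name,
            "MiddleInitial": middle_initial,
        },
    }


# The whole aggregate is one dict, so it would be written as one document.
user = make_user("jsmith", "John", "Smith", "Q")
```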

Your example is a little less clear, since I do not know the details of your use case. One thing that comes to mind is that Mongo has excellent support for inheritance. It may make sense to put all users, individuals, and potentially businesses in the same collection (depending on how the application is modeled). If one individual has many contacts, you probably want individuals to hold an array of contact IDs. If your application needs a quick preview of contacts, you could consider duplicating part of each individual and storing an array of contact objects instead.
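The two layouts mentioned above (an array of IDs versus an array of partially duplicated contact objects) might look like this. All field names, the sample data, and the helper `add_contact_previews` are hypothetical and only illustrate the trade-off.

```python
# Full contact documents, keyed by _id (sample data invented for illustration).
contacts_by_id = {
    101: {"_id": 101, "first": "Ann", "last": "Lee", "phone": "555-0101"},
    102: {"_id": 102, "first": "Raj", "last": "Patel", "phone": "555-0102"},
}

# Option 1: the individual holds only an array of contact IDs.
individual = {"_id": 1, "first": "Bob", "last": "Stone", "contact_ids": [101, 102]}


def add_contact_previews(person, contacts, preview_fields=("first", "last")):
    """Return a copy of person with a duplicated, trimmed-down contact array."""
    previews = []
    for contact_id in person["contact_ids"]:
        full = contacts[contact_id]
        preview = {"_id": contact_id}
        preview.update({field: full[field] for field in preview_fields})
        previews.append(preview)
    return {**person, "contact_previews": previews}


# Option 2: the individual also carries enough of each contact for a quick preview.
individual_with_previews = add_contact_previews(individual, contacts_by_id)
```

The second layout answers a preview screen from a single document, at the price of having to update the duplicated fields whenever a contact's name changes.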

If you are used to RDBMS thinking, you probably assume that all your data must be consistent at all times. The truth is that this is probably not entirely so. This concept of applying atomic aggregates to a domain has been promoted heavily by the DDD community recently. When you examine your domain carefully, the way your business users do, the consistency boundaries should become clear.

christophmccann · Answer 2 · 2012-03-04T18:12:22+0000

MongoDB and NoSQL in general are about denormalizing data and reducing joins. This runs counter to normal SQL thinking.

In your case, I see no reason why you would want separate collections, because that introduces unnecessary complexity and overhead. Consider, for example, a screen that displays all contacts in alphabetical order. With one collection for contacts this is very easy, but with two collections it becomes a more complicated proposition.
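A rough sketch of the difference, using in-memory lists in place of collections. The `name` and `type` fields and the sample data are assumptions made for this illustration.

```python
import heapq
from operator import itemgetter

by_name = itemgetter("name")

people = [{"name": "Young, Bob"}, {"name": "Adams, Carol"}]
businesses = [{"name": "Zenith Co"}, {"name": "Acme Ltd"}]

# One collection: every contact sits together, so a single sort on "name"
# (one sorted query against a real database) produces the whole screen.
contacts = (
    [dict(doc, type="person") for doc in people]
    + [dict(doc, type="business") for doc in businesses]
)
single_collection_listing = sorted(contacts, key=by_name)

# Two collections: each is sorted on its own (two separate queries), and the
# application then has to merge the two sorted result sets itself.
two_collection_listing = list(
    heapq.merge(sorted(people, key=by_name), sorted(businesses, key=by_name), key=by_name)
)

names = [doc["name"] for doc in single_collection_listing]
```

Both paths give the same order, but the two-collection version pushes the merge, and any paging on top of it, into application code.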

If I were to use multiple collections, it would be when your application has multiple users who each store contacts. Then I would have one collection per user. This makes it easy to retrieve a given user's contacts.
