In MongoDB, is it practical to store all the comments on a post in a single document?

I read in a description of document-based databases that you can, for example, store all the comments for a post in the same document as the post itself, if you prefer:

 {
   _id: "sdfdsfdfdsf",
   title: "post title",
   body: "post body",
   comments: [ "comment 1 ... end of comment", ..., "comment n" ]
 }

My situation is similar, except that each comment can be up to 8 KB in size and there can be up to 30 of them per post.

Although it is convenient to embed the comments in the same document, I wonder whether large documents hurt performance, especially since the MongoDB server and the HTTP server run on different machines and have to communicate over the local network.

+6
5 answers

I'm writing this answer after some of the others, so I will repeat a few things that have already been mentioned. Accept the first suitable answer, not this one.

That said, there are a few things to consider. Ask yourself these three questions:

  • Will you always need all the comments every time you fetch a post?
  • Will you never need to query comments directly (for example, all comments by a specific user)?
  • Will your system see relatively low traffic?

If you can answer yes to all three, you can embed the comments as an array. In all other scenarios you probably need a separate collection for your comments.

First of all, you actually can update and delete embedded comments in a concurrency-safe way (see updates with the positional operator), but there are some things you cannot do, such as inserting a comment at a specific position in the array.

The main problem with using embedded arrays for anything that keeps growing is document moves on update. MongoDB reserves a certain amount of padding per document (see db.col.stats().paddingFactor) so that it can grow in place. If the document outgrows that padding (which will happen frequently in your case), the ever-growing document has to be moved to a new location on disk. That makes those updates an order of magnitude slower and is therefore a serious problem for high-throughput systems. A related but slightly less important issue is bandwidth: if you have no choice but to fetch the whole post with all its comments, even though you only display the first 10, you waste a lot of traffic, which can be a problem in cloud environments (you can use $slice to avoid some of this).
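For example, if you only display the first 10 comments on a post page, a $slice projection keeps the response small. A minimal mongo-shell sketch, reusing the posts collection and comments array from the examples below; [POST ID] is a placeholder:

 // return the post with only its first 10 comments
 db.posts.find({_id:[POST ID]}, {comments:{$slice:10}})
 // or page through them: skip the first 20 comments, return the next 10
 db.posts.find({_id:[POST ID]}, {comments:{$slice:[20, 10]}})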

If you do want to go with embedded comments, the basic operations look like this:

Add a comment:

 db.posts.update({_id:[POST ID]}, {$push:{comments:{commentId:"remon-923982", author:"Remon", text:"Hi!"}}}) 

Update comment:

  db.posts.update({_id:[POST ID], 'comments.commentId':"remon-923982"}, {$set:{'comments.$.text':"Hello!"}}) 

Delete a comment:

 db.posts.update({_id:[POST ID], 'comments.commentId':"remon-923982"}, {$pull:{comments:{commentId:"remon-923982"}}}) 

All of these operations are concurrency safe, because the update criteria are evaluated under the same write (per-process) lock as the write itself.

With all that said, you probably do want a dedicated collection for your comments, and then you face a second choice: you can either store each comment in its own document, or use comment buckets of, say, 20-30 comments each (described in detail here: http://www.10gen.com/presentations/mongosf2011/schemascale). Each approach has advantages and disadvantages, so it is up to you to decide which fits what you want to do. I would go for buckets if the comments per post can exceed a couple of hundred, because of the O(n) performance of the skip(n) cursor method you would otherwise need for paging. In all other cases just go with a comment-per-document approach, which is the most flexible for querying comments in other use cases.
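As an illustration of the comment-per-document approach, here is a minimal mongo-shell sketch; the comments collection, its field names and the lastSeen variable are placeholders of mine, not something from the answer:

 // one document per comment, in a dedicated collection
 db.comments.insert({postId:[POST ID], author:"Remon", text:"Hi!", created:new Date()})
 // index so a post's comments can be fetched cheaply
 db.comments.ensureIndex({postId:1, created:1})
 // page with a range query on the index instead of the O(n) skip()
 db.comments.find({postId:[POST ID], created:{$gt:lastSeen}}).sort({created:1}).limit(10)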

+5

It depends heavily on the operations you want to allow, but a separate collection is usually better.

For example, if you want to allow users to edit or delete comments, it is a very good idea to keep the comments in a separate collection, because those operations are hard or impossible to express with atomic modifiers alone, and state management becomes painful. The documentation also covers this.

The key problem with embedded comments is that you will have different authors. Typically a blog post can only be edited by the blog's authors, but with embedded comments a reader effectively gets write access to the post object, so to speak.

Code like this would be dangerous:

 post = db.articles.findOne({"_id": 2332});
 post.text = "foo";
 // at this moment, someone else does a $push on the article's comments
 db.articles.update({"_id": 2332}, post);
 // the whole document was overwritten -- that new comment is now gone
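The concurrency-safe alternative is to change only the field you actually edited with an atomic modifier, so a concurrent $push on comments is not overwritten (a sketch using the articles collection from the snippet above):

 // only touches the text field; comments pushed in the meantime survive
 db.articles.update({"_id": 2332}, {$set: {text: "foo"}});
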
+3

For performance reasons, it is best to avoid documents that can grow over time:

Padding factor:

“When you update a document in MongoDB, the update occurs in-place if the document has not grown in size. If the document did grow in size, however, it may need to be relocated on disk to find a new location with enough contiguous space to fit the larger document. This can lead to problems with write performance if the collection has many indexes, since a move requires updating all of the indexes for the document.”

http://www.mongodb.org/display/DOCS/Padding+Factor
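You can check how much padding MongoDB currently reserves for a collection in its statistics, as the first answer mentions (a mongo-shell sketch assuming the same posts collection):

 // 1.0 means no extra padding; values approaching 2.0 mean documents
 // have been moved often and more space is now reserved per document
 db.posts.stats().paddingFactor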

+2

If you always fetch a post together with all its comments, why not?

If you don't, or if you need to query comments outside the context of a post (for example, to show all of a user's comments on their profile page), then probably not, because those queries become much more complicated.
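For instance, with embedded comments a query for one user's comments has to go through the posts and returns whole post documents, whereas a dedicated collection can be queried directly (a sketch with hypothetical field names):

 // embedded: matches posts, so you get entire post documents back
 db.posts.find({"comments.author":"Remon"})
 // separate collection: returns exactly the comments you asked for
 db.comments.find({author:"Remon"})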

+1

Short answer: Yes and no.

Say you are writing a MongoDB-based blog. You would embed the comments in the post.

Why: querying is easy; you only need one request to get all the data you need to display.

But you already know that you will end up with large documents full of subdocuments. Since you need to serve them over your local network, I strongly recommend storing the comments in a separate collection.

Why: sending large documents over your network takes time, and I assume there are situations where you do not need every single comment.
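For example, when comments are embedded but a page does not need them, a projection keeps them off the wire (a minimal sketch reusing the posts/comments names from the earlier answers):

 // fetch posts without dragging all the embedded comments across the network
 db.posts.find({}, {comments:0})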

TL;DR: both options work. I recommend storing your comments in a separate collection.

0

Source: https://habr.com/ru/post/918320/

