MongoDB - Materialized View / OLAP Aggregation and Performance

Question

I have been reading up on MongoDB and am particularly interested in its aggregation capabilities. I am looking at several data sets of at least 10 million rows per month each, and at creating aggregates from this data. The data is time series data.

For example, with Oracle OLAP you can load data at the second/minute level and have it roll up to hours, days, weeks, months, quarters, years, and so on; you just define your measurements and go from there. This works quite well.

So far I have read that MongoDB can handle the above with its map/reduce functionality. A map/reduce job can be implemented so that it updates results incrementally. This makes sense, since I would be loading new data weekly or monthly, and I would expect to process only the newly loaded data.

I have also read that map/reduce in MongoDB can be slow. To overcome this, the idea is to use cheap commodity hardware and spread the load across several machines.

So here are my questions.

How well (or badly) does MongoDB handle map/reduce in terms of performance? Do you really need a lot of machines to get acceptable performance?
In terms of workflow, is it relatively easy to store and merge the incremental results generated by map/reduce?
How much of a performance improvement does the aggregation framework offer?
Does the aggregation framework offer the ability to store results incrementally, in the same way the existing map/reduce functionality does?

I appreciate your answers in advance!

+6

mongodb nosql

Dave Aug 4 '12 at 18:09

2 answers

Couchbase map/reduce is designed to build incremental indexes, which can then be queried dynamically at whatever level of rollup you are looking for (much like the Oracle example you gave in your question).

Here is a write-up of how this is done with Couchbase: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-views-sample-patterns-timestamp.html

+4

J chris a Aug 7 '12 at 21:23

Stennie · Accepted Answer · 2012-08-06T04:06:50+0000

How well (or badly) does MongoDB handle map/reduce in terms of performance? Do you really need a lot of machines to get acceptable performance?

MongoDB's Map/Reduce implementation (as of 2.0.x) is limited by its reliance on the single-threaded SpiderMonkey JavaScript engine. There has been some experimentation with the V8 JavaScript engine, and improved concurrency and performance is a general design goal.

The new Aggregation Framework is written in C++ and has a more scalable implementation, including a pipeline approach. Each pipeline is currently single-threaded, but you can run different pipelines in parallel. The aggregation framework does not currently replace every job that can be done in Map/Reduce, but it does simplify many common use cases.
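To make the pipeline approach concrete, here is a minimal sketch of how a time-series rollup pipeline might be assembled as plain Python data. The field names `ts` and `value`, the function name `rollup_pipeline`, and the output field names are assumptions for illustration, not part of the question's schema. It relies only on the `$match` and `$group` stages and the `$year`, `$month`, `$dayOfMonth` and `$hour` date operators, and it assumes a server that accepts date expressions directly in the `$group` key.

```python
from datetime import datetime

# Date operators keyed by rollup level, coarsest first. Each level also groups
# on every coarser part, so the same hour on different days stays separate.
_DATE_PARTS = [
    ("year", "$year"),
    ("month", "$month"),
    ("day", "$dayOfMonth"),
    ("hour", "$hour"),
]


def rollup_pipeline(level, start, end, ts_field="ts", value_field="value"):
    """Build a pipeline that aggregates [start, end) at the given level.

    ts_field and value_field are hypothetical field names.
    """
    names = [name for name, _ in _DATE_PARTS]
    if level not in names:
        raise ValueError("unknown rollup level: %r" % (level,))
    depth = names.index(level) + 1
    group_id = {name: {op: "$" + ts_field} for name, op in _DATE_PARTS[:depth]}
    return [
        # Restrict the run to one time window, e.g. the newly loaded month.
        {"$match": {ts_field: {"$gte": start, "$lt": end}}},
        # One output document per time bucket.
        {"$group": {
            "_id": group_id,
            "count": {"$sum": 1},
            "total": {"$sum": "$" + value_field},
            "mean": {"$avg": "$" + value_field},
        }},
    ]


daily = rollup_pipeline("day", datetime(2012, 8, 1), datetime(2012, 9, 1))
```

With a driver such as PyMongo, a list like `daily` would be passed to the collection's `aggregate` method. Building one pipeline per time window is also one way to run several pipelines in parallel, as mentioned above.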

A third option is to use MongoDB for storage in conjunction with Hadoop via the MongoDB Hadoop Connector. Hadoop currently has a more scalable Map/Reduce implementation and can read from and write to MongoDB collections through the connector.

In terms of workflow, is it relatively easy to store and merge the incremental results generated by map/reduce?

Map/Reduce has several output options, including merging incremental output into a previous output collection or returning the results inline (in memory).
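The idea behind merging incremental output can be sketched in plain Python. This is a simulation of the concept, not MongoDB's API: only the new documents are mapped and reduced, and each fresh result is then reduced again together with whatever is already stored under the same key. The document fields `ts` and `value` and all function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def map_hourly(doc):
    """Map step: emit (hour bucket, partial aggregate) for one raw reading."""
    key = doc["ts"].replace(minute=0, second=0, microsecond=0)
    return key, {"count": 1, "total": doc["value"]}


def reduce_partials(values):
    """Reduce step: must accept its own output as input, so it can be re-run."""
    return {
        "count": sum(v["count"] for v in values),
        "total": sum(v["total"] for v in values),
    }


def map_reduce_into(output, docs):
    """Process only the new docs, then re-reduce against stored results."""
    grouped = defaultdict(list)
    for doc in docs:
        key, value = map_hourly(doc)
        grouped[key].append(value)
    for key, values in grouped.items():
        fresh = reduce_partials(values)
        if key in output:
            output[key] = reduce_partials([output[key], fresh])
        else:
            output[key] = fresh
    return output


summary = {}  # stands in for the previous output collection
map_reduce_into(summary, [
    {"ts": datetime(2012, 8, 4, 18, 9), "value": 2.0},
    {"ts": datetime(2012, 8, 4, 18, 30), "value": 4.0},
])
# A later load touches the same hour; only the new reading is processed.
map_reduce_into(summary, [{"ts": datetime(2012, 8, 4, 18, 45), "value": 6.0}])
```

The sketch stores counts and totals rather than averages, because an average of averages is not correct once batches are merged; the mean can be derived as `total / count` when the summary is read.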

How much of a performance improvement does the aggregation framework offer?

It really depends on the complexity of your Map/Reduce. In general, the aggregation framework is faster (in some cases significantly so). Your best bet is to benchmark it against your own use case.

MongoDB 2.2 has not been officially released yet, but the 2.2rc0 release candidate has been available since mid-July.

Does the aggregation framework offer the ability to store results incrementally, in the same way the existing map/reduce functionality does?

Currently, the aggregation framework can only return results inline, so you need to process or display the results as they are returned. The result document is also subject to MongoDB's maximum document size (currently 16 MB).
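Since the results come back inline, saving them is left to the client. One possible approach, sketched here under assumptions, is to turn each returned result document into an upsert that increments additive totals in a summary collection. The field names `count` and `total` and the helper `to_upserts` are hypothetical; `$inc` is a standard MongoDB update operator.

```python
def to_upserts(results, additive_fields=("count", "total")):
    """Turn inline aggregation results into (filter, update) pairs.

    Only additive fields are carried over; a derived value such as a mean
    cannot be incremented and is dropped.
    """
    ops = []
    for doc in results:
        increments = {f: doc[f] for f in additive_fields if f in doc}
        if increments:
            ops.append(({"_id": doc["_id"]}, {"$inc": increments}))
    return ops


inline_results = [
    {"_id": {"year": 2012, "month": 8}, "count": 3, "total": 12.0, "mean": 4.0},
]
ops = to_upserts(inline_results)
```

With PyMongo, each pair could be applied as `summary.update_one(filter, update, upsert=True)`, where `summary` is an assumed collection name. Applying the same batch twice would double count, so each load should be aggregated and applied exactly once.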

There is a proposed $out pipeline command (SERVER-3253), which is likely to be added in the future to provide additional output options.

Some further reading that may be of interest:

presentation at MongoDC 2011 on a Time Series Data Warehouse in MongoDB
presentation at MongoSF 2012 on the New MongoDB Aggregation Framework
capped collections, which can be used similarly to RRD
