Why is CouchDB reduce_limit enabled by default? (Is it better to approximate SQL JOINS in MapReduce views or List views?)

Question

I am using CouchDB and I want to use MapReduce better when querying data.

My specific use case:

I have a lot of surveys. Each survey has a meter number, a meter reading and a meter reading date, for example:

{ meterNumber: 1, meterReading: 2050, meterReadingDate: 1480000000000 }

I then use a map function to emit the readings keyed by meterNumber. Many keys are repeated (the same meter is read on different dates), i.e.

 [ [meterNumber, {reading: xxx, readingDate: xxx}], [meterNumber, {reading: xxx, readingDate: xxx}], [meterNumber, {reading: xxx, readingDate: xxx}], etc ]

I then group them by key before they reach the reduce function, and the reduce function effectively has to EXPAND the set of values by collecting them under each key. That is, I want this:

 [
   [meterNumber, [{reading: xxx, readingDate: xxx}, {reading: xxx, readingDate: xxx}, {reading: xxx, readingDate: xxx}]],
   [meterNumber, [{reading: xxx, readingDate: xxx}, {reading: xxx, readingDate: xxx}, {reading: xxx, readingDate: xxx}]],
   [meterNumber, [{reading: xxx, readingDate: xxx}, {reading: xxx, readingDate: xxx}, {reading: xxx, readingDate: xxx}]],
   etc
 ]

To run this MapReduce view on CouchDB, I had to explicitly allow this type of result set by turning off reduce_limit (see "Couchdb - is it possible to deactivate the reduce_overflow_error error").

This suggests that I may run into performance issues with large result sets. Is that true? Why does this kind of result have to be explicitly allowed in CouchDB?

*** EDIT

The accepted answer below states that what I was doing in MapReduce is also possible (and better) using list functions. Here is another good answer on the same topic: The best way to do "one to many" and "JOIN" in CouchDB

*** EDIT

Here is the link from the CouchDB documentation: http://guide.couchdb.org/draft/transforming.html

list couchdb mapreduce

Zach smith Jun 25 '16 at 7:28

1 answer

Aurélien · Accepted Answer · 2016-06-25T14:03:33+0000

A reduce function is designed to reduce the values associated with given keys.

CouchDB's reduce_limit is there to detect poorly designed reduce functions, which is what you get when you concatenate the values... But don't panic: anyone new to CouchDB would have made the same mistake.

The problem with concatenating values in the reduce function is twofold:

It is completely unnecessary (if you need the whole list, a map function alone is enough).
It is very inefficient: your index will take up more and more space on disk, and disk access will take longer and longer.
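To make the size argument concrete, here is a minimal sketch that runs outside CouchDB (plain Node.js). The sample readings and the names countReduce and concatReduce are made up for illustration; only the (keys, values, rereduce) signature follows what CouchDB passes to a reduce function. It compares the serialized size of a result that really reduces with one that just concatenates.

```javascript
// Hypothetical sample: 1000 readings for one meter, shaped like map output values.
const values = [];
for (let i = 0; i < 1000; i++) {
  values.push({ reading: 2000 + i, readingDate: 1480000000000 + i * 86400000 });
}

// A reduce that actually reduces: the result is a single number.
// (keys is unused in this sketch.)
function countReduce(keys, values, rereduce) {
  return rereduce ? values.reduce((a, b) => a + b, 0) : values.length;
}

// A reduce that concatenates: the result is as large as its input.
function concatReduce(keys, values, rereduce) {
  return rereduce ? [].concat(...values) : values;
}

// Compare the serialized size of each result with the serialized input.
const inputSize = JSON.stringify(values).length;
const countSize = JSON.stringify(countReduce(null, values, false)).length;
const concatSize = JSON.stringify(concatReduce(null, values, false)).length;

console.log({ inputSize, countSize, concatSize });
```

The count stays a few bytes long however many readings there are, while the concatenated result is exactly as large as the input. Since reduce results are stored in the view index, that second pattern is what makes the index grow.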

So ... Just write a minimal map function such as:

 function(doc){ emit(doc.meterNumber, null); }

Do not write any reduce function, and query the view with include_docs=true.
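If the grouping is done on the client instead, it could look roughly like the sketch below. The response object is hypothetical sample data shaped like a view result queried with include_docs=true (rows carrying id, key, value and doc), and groupByMeter is an illustrative helper, not part of CouchDB.

```javascript
// Hypothetical view response, shaped like the result of
// GET /{db}/_design/{ddoc}/_view/{view}?include_docs=true
const viewResponse = {
  total_rows: 3,
  offset: 0,
  rows: [
    { id: "a", key: 1, value: null,
      doc: { _id: "a", meterNumber: 1, meterReading: 2050, meterReadingDate: 1480000000000 } },
    { id: "b", key: 1, value: null,
      doc: { _id: "b", meterNumber: 1, meterReading: 2075, meterReadingDate: 1480086400000 } },
    { id: "c", key: 2, value: null,
      doc: { _id: "c", meterNumber: 2, meterReading: 310, meterReadingDate: 1480000000000 } }
  ]
};

// Collect the readings of each meter under its key, preserving row order.
function groupByMeter(response) {
  const groups = new Map();
  for (const row of response.rows) {
    if (!groups.has(row.key)) {
      groups.set(row.key, []);
    }
    groups.get(row.key).push({
      reading: row.doc.meterReading,
      readingDate: row.doc.meterReadingDate
    });
  }
  // Produces [[meterNumber, [reading, ...]], ...]
  return Array.from(groups.entries());
}

const grouped = groupByMeter(viewResponse);
console.log(JSON.stringify(grouped));
```

This yields the [meterNumber, [readings]] shape from the question without storing any concatenated values in the index.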

But maybe you are not happy with the data format? No problem: you have list functions for this. Just remember that the map and reduce functions should be used for pure data processing, and not for formatting purposes.
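As a rough sketch of such a list function: the function below streams view rows and groups consecutive rows that share a key, which works because view rows arrive sorted by key. start(), send() and getRow() are the helpers CouchDB's query server provides to list functions; the name byMeterList, the sample rows and the small runList harness (which stubs those helpers so the sketch can be run in Node.js) are assumptions made for illustration. It also assumes the view is queried with include_docs=true so that each row carries a doc.

```javascript
// List function as it might be stored in a design document (hypothetical name).
function byMeterList(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var row, currentKey, open = false, first = true;
  send("[");
  while ((row = getRow())) {
    if (!open || row.key !== currentKey) {
      if (open) send("]],");           // close the previous meter's group
      currentKey = row.key;
      send("[" + JSON.stringify(currentKey) + ",[");
      open = true;
      first = true;
    }
    if (!first) send(",");
    send(JSON.stringify({
      reading: row.doc.meterReading,
      readingDate: row.doc.meterReadingDate
    }));
    first = false;
  }
  if (open) send("]]");                // close the last group
  send("]");
}

// Illustration-only harness: stubs the query server helpers as globals.
function runList(listFn, rows) {
  const chunks = [];
  const queue = rows.slice();
  globalThis.start = function () {};
  globalThis.send = function (chunk) { chunks.push(chunk); };
  globalThis.getRow = function () { return queue.shift() || null; };
  listFn({}, {});
  return chunks.join("");
}

// Hypothetical rows, already sorted by key as a view would return them.
const sampleRows = [
  { key: 1, doc: { meterReading: 2050, meterReadingDate: 1480000000000 } },
  { key: 1, doc: { meterReading: 2075, meterReadingDate: 1480086400000 } },
  { key: 2, doc: { meterReading: 310, meterReadingDate: 1480000000000 } }
];

const listOutput = runList(byMeterList, sampleRows);
console.log(listOutput);
```

Streaming the groups as the key changes keeps the list function from holding the whole result in memory; the formatting happens at query time, so the index only contains the plain map output.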
