How do you write a SELECT COUNT (DISTINCT) query in CouchDB?

Is there a good way to reproduce the behavior of SELECT COUNT (DISTINCT field) in CouchDB?

Suppose we have the following document that records the time when the user played a specific song:

{ song_id: "happy birthday", user_id: "boris", date_played: [2011, 11, 14, 00, 12, 55], _id: ... } 

I would like to know the number of distinct songs our user "boris" has ever played. If our user listened to "Happy Birthday" 20 times, that song should still contribute only +1 to the total count of songs.

In MySQL I would just run SELECT COUNT(DISTINCT song_id) FROM plays WHERE user_id = "boris", but I am drawing a blank when it comes to writing this in CouchDB.

Work-Around 1: If I changed my schema and instead stored all the plays in a single user document for "boris", I could then write a map function that emits only distinct values. However, if I wanted to build something on the scale of last.fm, I fear that updates would start taking a very long time as the "boris" document keeps growing with the number of plays. (There may also be a maximum document size that I would eventually hit.)

Work-Around 2: I could also write a map function that returns all the individual records and let my Python script deduplicate them by itself; but again, with hundreds of thousands of plays this would also be very slow.
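To make the target concrete, here is a minimal sketch of the client-side deduplication that Work-Around 2 amounts to. It is written in JavaScript to match the other snippets in this thread rather than Python; the sample documents and the countDistinctSongs name are made up for illustration.

```javascript
// Hypothetical sample of play documents (only the fields used here).
const plays = [
  { song_id: "happy birthday", user_id: "boris" },
  { song_id: "happy birthday", user_id: "boris" },
  { song_id: "yesterday", user_id: "boris" },
  { song_id: "yesterday", user_id: "anna" },
];

// Client-side equivalent of
// SELECT COUNT(DISTINCT song_id) FROM plays WHERE user_id = ?
function countDistinctSongs(docs, userId) {
  const seen = new Set();
  for (const doc of docs) {
    if (doc.user_id === userId) seen.add(doc.song_id);
  }
  return seen.size;
}

console.log(countDistinctSongs(plays, "boris")); // 2
```

This is the behavior I am after; the concern is only that every play record has to be transferred to the client before it can be counted.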

What other options am I missing?

4 answers

This answer was provided by Zachary Zolton on the couchdb mailing list:

http://mail-archives.apache.org/mod_mbox/couchdb-user/201111.mbox/%3CCAGnHtbJ-1-YeLWMLivKzWub98HZY7%2BesnPOHU4pEYgWAsxaszA%40mail.gmail.com%3E

Since you already have a view that gives you Boris's 50k unique songs, you can use a _list function to return the number of rows.

Something like this should do the trick:

 function(head, req) {
   var count = 0;
   // Consume every row of the view result and count it.
   while (getRow()) count++;
   return JSON.stringify({count: count});
 }

If you request this list function with the same view, key range and group level, it will simply respond with a small piece of JSON, for example: {"count":50612}
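To see what the list function does without a running server, here is a rough simulation. The getRow stub and the sample rows are assumptions made for this sketch; inside CouchDB, getRow() is supplied by the query server and walks the actual view rows.

```javascript
// The list function from above, given a name so it can be called directly.
function countRows(head, req) {
  var count = 0;
  while (getRow()) count++;
  return JSON.stringify({ count: count });
}

// Hypothetical stand-in for CouchDB's getRow(): returns one sample row
// per call, then null once the rows are exhausted.
var sampleRows = [
  { key: ["boris", "happy birthday"], value: 20 },
  { key: ["boris", "yesterday"], value: 3 },
];
var cursor = 0;
function getRow() {
  return cursor < sampleRows.length ? sampleRows[cursor++] : null;
}

var listOutput = countRows({}, {});
console.log(listOutput); // {"count":2}
```

The rows are still iterated on the server, but only the small JSON result is sent to the client.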

You can read more here:


Assuming I have interpreted your question correctly:

map:

 function(doc) { emit([doc.user_id, doc.song_id], null); } 

reduce:

 _count 

request:

 ?startkey=[<userid>]&endkey=[<userid>,{}]&group=true 

Output Example:

 http://127.0.0.1:5984/foo/_design/a/_view/b?group=true&startkey=[%22foo%22]&endkey=[%22foo%22,{}]

 {"rows":[
   {"key":["foo","bar"],"value":2},
   {"key":["foo","bazbar"],"value":1}
 ]}
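Note that the grouped output has one row per distinct song, so the distinct count is the number of rows and not any single value. A small in-memory emulation may make that clearer. The sample documents and the groupedCount helper are invented for this sketch, and the rows come out in insertion order rather than in CouchDB's key collation order.

```javascript
// Hypothetical play documents.
const docs = [
  { user_id: "foo", song_id: "bar" },
  { user_id: "foo", song_id: "bar" },
  { user_id: "foo", song_id: "bazbar" },
  { user_id: "other", song_id: "bar" },
];

// Emulates the view: the map emits [user_id, song_id], and _count with
// group=true yields one row per distinct key holding how often it was emitted.
function groupedCount(allDocs, userId) {
  const counts = new Map();
  for (const doc of allDocs) {
    if (doc.user_id !== userId) continue; // stands in for startkey/endkey
    const key = JSON.stringify([doc.user_id, doc.song_id]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, value]) => ({ key: JSON.parse(key), value }));
}

const rows = groupedCount(docs, "foo");
console.log(JSON.stringify(rows));
// [{"key":["foo","bar"],"value":2},{"key":["foo","bazbar"],"value":1}]
console.log(rows.length); // 2, the number of distinct songs for "foo"
```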

I struggled with the same problem (see http://mail-archives.apache.org/mod_mbox/couchdb-user/201410.mbox/browser )

It simply doesn't scale to fetch the entire result set when all you need is a scalar value. Although the list function is a workaround for having to fetch the full result stream, the approach seems very odd.

Any alternatives for this?


In recent versions of CouchDB (2.2 and later), you can use the built-in _approx_count_distinct reduce function, which returns an estimate of the number of distinct keys. Your view would be:

map:

 function(doc) { emit([doc.user_id, doc.song_id], 1); } 

reduce:

 _approx_count_distinct 

and the request to get the number of song_ids for the user "boris" will be:

 /db/_design/_myddoc/_view/myview?group_level=1&startkey=["boris"]&endkey=["boris",{}]
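When sending the request from code, the JSON keys need to be URL-encoded. Here is a small sketch of how the query string could be built. The distinctCountUrl helper is hypothetical, and it uses a startkey/endkey range on the assumption that the range has to cover the two-element [user_id, song_id] keys that the map emits.

```javascript
// Builds the URL for a grouped view request for one user.
// The database, design document and view names in the path are placeholders.
function distinctCountUrl(viewPath, userId) {
  const params = new URLSearchParams({
    startkey: JSON.stringify([userId]),   // lowest possible key for this user
    endkey: JSON.stringify([userId, {}]), // {} sorts after any song_id string
    group_level: "1",                     // collapse to one row per user_id
  });
  return viewPath + "?" + params.toString();
}

const url = distinctCountUrl("/db/_design/_myddoc/_view/myview", "boris");

// Decode the query string again to show what the server would receive.
const decoded = new URLSearchParams(url.split("?")[1]);
console.log(decoded.get("startkey")); // ["boris"]
console.log(decoded.get("endkey"));   // ["boris",{}]
```

With group_level=1 the response should contain a single row for "boris" whose value is the estimated number of distinct [user_id, song_id] keys.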

Source: https://habr.com/ru/post/1381252/

