Is there a good way to reproduce the behavior of SELECT COUNT (DISTINCT field) in CouchDB?
Suppose we have the following document that records the time when the user played a specific song:
{ song_id: "happy birthday", user_id: "boris", date_played: [2011, 11, 14, 00, 12, 55], _id: ... }
I would like to know the number of great songs our user "boris" has ever played . If our user listened to "Happy Birthday" 20 times, this song should still contribute only +1 to the total score of the song.
In MySQL, I just performed SELECT COUNT(DISTINCT song_id) FROM plays WHERE user_id = "boris" , but I draw a space when it comes to writing this in CouchDB.
Work-Around 1: If I changed my layout and instead saved all the songs in one user document for "boris", I could then write a map to only emit single values. However, if I wanted to build something on a scale of last.fm, I fear that updates will begin for a very long time, as the size of the "boris" document (number of plays) continues to grow. (There may also be a maximum document size that I would end up hitting).
Work-Around 2: I could also write a map function to return all the individual records that my python script could fail by itself; but again with hundreds of thousands of great songs this will also be very slow.
What other options am I missing?
source share