How to query a multi index in RethinkDB by an array of objects

Question

I am working with a dataset that looks something like this:

 { "bitrates": [ { "format": "mp3", "rate": "128K" }, { "format": "aac", "rate": "192K" } ], "details": [ ... ], "id": 1, "name": "For Those About To Rock We Salute You", "price": 1026, "requires_shipping": false, "sku": "ALBUM-1" }

I want to create a secondary index on bitrates, passing {multi: true}. This was my attempt:

 r.db("music").table("catalog").indexCreate("bitrates", {multi: true})

The index builds just fine, but when I query it, nothing is returned, which seems to contradict every example I have read here:

http://rethinkdb.com/docs/secondary-indexes/javascript/

I wrote this query:

 r.db("music").table("catalog").getAll(["mp3", "128K"], {index : "bitrates"})

There is no error, only 0 results (and I have 300 or so documents with this exact data).

I am using RethinkDB 2.0 RC1.

+6

rethinkdb

user1151 Apr 2 '15 at 14:18

2 answers

Secondary index keys cannot be objects right now:

 > r.table('foo').indexCreate('bitrates', {multi: true})
 > r.table('foo').getAll({format: "mp3", rate: "128K"}, {index: 'bitrates'})
 RqlRuntimeError: Secondary keys must be a number, string, bool, pseudotype, or array

You can track this issue at https://github.com/rethinkdb/rethinkdb/issues/2773 .

For a workaround, you can do this:

 > r.table('foo').indexCreate('bitrates', function(row) {
     return row('bitrates').map(function(bitrate) {
       return bitrate.coerceTo('array');
     })
   }, {multi: true});
 > r.table('foo').getAll(r.expr({format: "mp3", rate: "128K"}).coerceTo('array'), {index: 'bitrates'})
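To see why this workaround matches, it can help to mimic coerceTo('array') in plain JavaScript. The sketch below is not ReQL and needs no server. The helper names coerceToArray, workaroundKeys and sameKey are hypothetical, and sorting the pairs by field name is an assumption about how the server orders object fields.

```javascript
// Plain-JavaScript stand-in for ReQL's coerceTo('array') on an object.
// Hypothetical helper; sorting by field name is an assumption, made so that
// two objects with the same fields always yield the same array.
function coerceToArray(obj) {
  return Object.keys(obj).sort().map(function (key) {
    return [key, obj[key]];
  });
}

// Index keys the workaround's indexing function would produce for one document.
function workaroundKeys(doc) {
  return doc.bitrates.map(coerceToArray);
}

// Compare two keys structurally (arrays of [field, value] pairs).
function sameKey(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

var doc = {
  id: 1,
  bitrates: [
    { format: 'mp3', rate: '128K' },
    { format: 'aac', rate: '192K' }
  ]
};

var keys = workaroundKeys(doc);
var queryKey = coerceToArray({ rate: '128K', format: 'mp3' });

// The query key equals the first index key, so document 1 would match.
console.log(keys.some(function (k) { return sameKey(k, queryKey); })); // true
```

The point of the sketch is that both sides go through the same conversion, so the key stored at index time and the key built at query time are the same array of pairs.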

0

mlucy Apr 2 '15 at 19:31

Nate Kohari · Accepted Answer · 2015-04-02T15:08:20+0000

When you create an index on a field, the values in that field are used literally as the keys of the index. In your case, the keys for your bitrates index would be the objects inside the document's bitrates array.
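A rough model of this in plain JavaScript (not ReQL, no server needed): with {multi: true} on a plain field, each array element is a candidate key. The helper names isUsableKey and multiKeys are hypothetical, and skipping object elements is an assumption based on the error message quoted in the other answer.

```javascript
// Simplified model of key extraction for a {multi: true} index on a field.
// Hypothetical helpers, not part of the RethinkDB driver.
function isUsableKey(value) {
  // Assumption: plain objects are not accepted as keys. Pseudotypes
  // such as dates are left out of this sketch.
  var t = typeof value;
  return t === 'number' || t === 'string' || t === 'boolean' || Array.isArray(value);
}

function multiKeys(doc, field) {
  var value = doc[field];
  var elements = Array.isArray(value) ? value : [value];
  return elements.filter(isUsableKey);
}

var album = {
  id: 1,
  bitrates: [
    { format: 'mp3', rate: '128K' },
    { format: 'aac', rate: '192K' }
  ]
};

// Every element is an object, so no usable key is left for this document,
// and a lookup for ['mp3', '128K'] has nothing to match.
console.log(multiKeys(album, 'bitrates')); // []
```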

It sounds like what you want is an index derived from values inside a field of the document. To do that, define a custom indexing function that reduces the document to just the data you care about. The easiest way to experiment is to start by writing a query and, once you are happy with the results, convert it into an indexCreate() statement.

Here is a statement that fetches your sample document (with id 1), plucks the format and rate terms from every object in its bitrates array, and then merges them into a distinct set of strings:

 r.db('music').table('catalog').get(1).do(function(row) {
   return row('bitrates').map(function(bitrate) {
     return [bitrate('format'), bitrate('rate')];
   }).reduce(function(left, right) {
     return left.setUnion(right);
   })
 })

Running this statement returns the following:

 ["mp3", "128K", "aac", "192K"]
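The same map-then-union step can be mimicked in plain JavaScript to check the shape of the result without a server. This is only a sketch: setUnion and bitrateTerms here are hypothetical stand-ins written for illustration, not the driver's own functions.

```javascript
// Plain-JavaScript stand-in for ReQL's setUnion: collect the items of both
// arrays, keeping each distinct value once. Hypothetical helper.
function setUnion(left, right) {
  var result = [];
  left.concat(right).forEach(function (item) {
    if (result.indexOf(item) === -1) {
      result.push(item);
    }
  });
  return result;
}

// Mirrors the query: map each bitrate to [format, rate], then union the pairs.
function bitrateTerms(doc) {
  return doc.bitrates
    .map(function (bitrate) { return [bitrate.format, bitrate.rate]; })
    .reduce(function (left, right) { return setUnion(left, right); });
}

var sample = {
  id: 1,
  bitrates: [
    { format: 'mp3', rate: '128K' },
    { format: 'aac', rate: '192K' }
  ]
};

console.log(bitrateTerms(sample)); // [ 'mp3', '128K', 'aac', '192K' ]
```

In this sketch an empty bitrates array makes reduce throw, because no initial value is given, so documents without bitrates would need separate handling.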

It looks good, so we can use our function to create an index. In this case, since we expect the indexing function to return a set of elements, we also want to specify {multi: true} so that we can query the elements in the set, not the set itself:

 r.db('music').table('catalog').indexCreate('bitrates', function(row) {
   return row('bitrates').map(function(bitrate) {
     return [bitrate('format'), bitrate('rate')];
   }).reduce(function(left, right) {
     return left.setUnion(right);
   })
 }, {multi: true})

After creating, you can query your index as follows:

 r.db('music').table('catalog').getAll('mp3', {index: 'bitrates'})

You can also pass several query terms to match rows that contain any of them:

 r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'})

However, if a single document matches more than one term in your query, it will be returned more than once. To fix this, add distinct():

 r.db('music').table('catalog').getAll('mp3', '128K', {index: 'bitrates'}).distinct()
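The duplicate behaviour is easier to see in a toy model. The sketch below is plain JavaScript, not ReQL: buildMultiIndex, getAll and distinctById are hypothetical helpers, and the second document is made up purely for illustration.

```javascript
// Toy in-memory multi index: each key maps to the documents that contain it.
// Hypothetical helpers that only illustrate the duplicate behaviour.
function buildMultiIndex(docs, keysFor) {
  var index = new Map();
  docs.forEach(function (doc) {
    keysFor(doc).forEach(function (key) {
      if (!index.has(key)) {
        index.set(key, []);
      }
      index.get(key).push(doc);
    });
  });
  return index;
}

// Like getAll with several keys: concatenate the matches for each key.
function getAll(index, keys) {
  return keys.reduce(function (rows, key) {
    return rows.concat(index.get(key) || []);
  }, []);
}

// Like distinct(): keep the first occurrence of each document, by id.
function distinctById(rows) {
  var seen = new Set();
  return rows.filter(function (row) {
    if (seen.has(row.id)) { return false; }
    seen.add(row.id);
    return true;
  });
}

var docs = [
  { id: 1, terms: ['mp3', '128K', 'aac', '192K'] },
  { id: 2, terms: ['aac', '256K'] } // made-up document for illustration
];
var index = buildMultiIndex(docs, function (doc) { return doc.terms; });

var rows = getAll(index, ['mp3', '128K']);
console.log(rows.map(function (row) { return row.id; }));               // [ 1, 1 ]
console.log(distinctById(rows).map(function (row) { return row.id; })); // [ 1 ]
```

Document 1 is stored under both 'mp3' and '128K', so looking up both keys finds it twice until the duplicates are removed.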

If necessary, you can also use downcase() to normalize the casing of the terms stored in the secondary index.
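As a plain-JavaScript sketch of that idea (not ReQL; normalizeTerms is a hypothetical helper), the terms are lower-cased when the keys are built, and the query term goes through the same step before the lookup:

```javascript
// Lower-case every term, both when building keys and when querying,
// so that 'MP3' and 'mp3' refer to the same key. Hypothetical helper.
function normalizeTerms(terms) {
  return terms.map(function (term) { return term.toLowerCase(); });
}

var indexedKeys = normalizeTerms(['MP3', '128K']);
var queryTerm = normalizeTerms(['Mp3'])[0];

console.log(indexedKeys);                           // [ 'mp3', '128k' ]
console.log(indexedKeys.indexOf(queryTerm) !== -1); // true
```

Note that once the keys are lower-cased, a lookup for '128K' no longer matches; the query has to ask for '128k'.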

You can also completely skip the entire indexing business and use the filter() query:

 r.db('music').table('catalog').filter(function(row) {
   return row('bitrates').map(function(bitrates) {
     return [bitrates('format'), bitrates('rate')];
   }).reduce(function(left, right) {
     return left.setUnion(right);
   }).contains('mp3');
 })

However, if you almost always query your table in the same way, creating a secondary index using a custom function will result in significantly better performance.
