What is the difference between a secondary index and an inverted index in Cassandra?

Question

When I read about these two, they seemed to describe the same approach. I searched Google but found nothing. Is there a difference in implementation? Cassandra builds the secondary index itself, but do I have to implement the inverted index myself?

Also, which one is faster for searches?

search indexing cassandra inverted-index

fereshteh Oct 08 '13 at 13:01

1 answer

Richard · Accepted Answer · 2013-10-08T14:24:40+0000

The main difference is that secondary indexes in Cassandra are not distributed in the same way as a manual inverted index. With the built-in secondary indexes, each node indexes the data it stores locally (using the LocalPartitioner). With manual indexing, the index entries are distributed independently of the nodes that store the values.

This means that with the built-in indexes, every query must go to every node, whereas with a manual inverted index you go to just one node (plus its replicas) to look up the value you are searching for. One advantage of the local index is that it can be updated atomically with the data. (Since Cassandra 1.2, atomic batches can achieve this for a manual index too, though they are slightly slower.)
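To make the routing difference concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not how Cassandra is implemented: the four-node cluster, the crc32-based `owner` partitioner, the `Node` class, and the `user_id`/`city` data are all hypothetical, and replicas are ignored.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def owner(partition_key: str) -> int:
    """Toy partitioner (an assumption): map a partition key to a node number."""
    return zlib.crc32(partition_key.encode()) % NUM_NODES


class Node:
    def __init__(self):
        self.rows = {}                         # user_id -> city
        self.local_index = defaultdict(set)    # city -> user_ids stored on THIS node
        self.inverted_rows = defaultdict(set)  # city -> user_ids, for cities this node owns


nodes = [Node() for _ in range(NUM_NODES)]


def insert(user_id: str, city: str) -> None:
    data_node = nodes[owner(user_id)]
    data_node.rows[user_id] = city
    # Built-in style: the index entry lives on the same node as the row.
    data_node.local_index[city].add(user_id)
    # Manual style: the index entry is partitioned by the indexed value instead.
    nodes[owner(city)].inverted_rows[city].add(user_id)


def query_builtin(city: str):
    """Ask every node for its local matches. Returns (matches, nodes contacted)."""
    matches = set()
    for node in nodes:
        matches |= node.local_index.get(city, set())
    return matches, len(nodes)


def query_manual(city: str):
    """Ask only the node that owns the index row. Returns (matches, nodes contacted)."""
    node = nodes[owner(city)]
    return set(node.inverted_rows.get(city, set())), 1
```

After a few calls to `insert`, both query functions return the same set of user ids, but `query_builtin` always contacts all `NUM_NODES` nodes while `query_manual` contacts one.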

That is why Cassandra's built-in indexes are not recommended for very high-cardinality data. If you query every node but get only one or two results, that is inefficient, and a manual inverted index would be better. If your search returns many results, you would need to contact every node anyway, so the built-in indexes work well.

Another benefit of Cassandra's built-in indexing is that indexes are updated lazily, so you don't need to do a read on every update. (See CASSANDRA-2897.) This can be a significant speed improvement for indexed tables with high write throughput.
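The idea of a lazily maintained index can be sketched in Python as follows. This is a toy illustration only, not Cassandra's actual code: the `LazyIndexedTable` class and its methods are hypothetical names. Writes never look up the old value; stale index entries are detected and dropped when a query runs.

```python
from collections import defaultdict


class LazyIndexedTable:
    """Hypothetical in-memory table with a lazily cleaned value index."""

    def __init__(self):
        self.rows = {}                 # key -> current value
        self.index = defaultdict(set)  # value -> keys (may contain stale entries)

    def write(self, key, value):
        # No read of the previous value: overwrite the row and add an index entry.
        self.rows[key] = value
        self.index[value].add(key)

    def query(self, value):
        live = set()
        for key in list(self.index.get(value, ())):
            if self.rows.get(key) == value:
                live.add(key)
            else:
                # The row has since changed, so this entry is stale: remove it now.
                self.index[value].discard(key)
        return live
```

A manual index that must stay exact would instead have to read the old value on every update in order to delete the old index entry, which is the extra read the lazy approach avoids.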
