In O'Reilly's book, Graph Databases, in Chapter 6, which describes how Neo4j stores a graph database, she says:
To understand why the inline processing of graphs is much more efficient than graphs based on intensive indexing, consider the following. Depending on the implementation, index queries can be O (log n) in algorithmic complexity compared to O (1) for finding direct relationships. To go through a network of m steps, the cost of the indexed approach, on O (m log n), carlizes the cost of O (m) for an implementation that uses no index.
It is then explained that Neo4j achieves this constant search of time by storing all nodes and relationships as fixed-size records:
With fixed record sizes and record identifiers like pointers, crawls are implemented simply by chasing pointers around a data structure that can run at a very high speed. To go through a specific relationship from one node to another, the database performs somewhat cheap ID calculations (these calculations are much cheaper than searching for global indexes, as wed should do if you falsify a graph in a non-graphical database)
This last sentence raises my question: how does Titan using Cassandra or HBase as storage achieve these gains or offset it?
source share