I need to support a large directed graph G, possibly with millions of nodes and edges. It may not fit in memory.
Some of the common operations I need to perform on this graph are listed below. Each node/edge will have properties associated with it, such as access count, weight, etc.
1. For each node (vertex), I will need to run efficient queries based on property values. For example, find the nodes whose X value is greater than v1 but less than v2. This probably requires building an index on certain fields.
2. I will need to find all incoming and outgoing edges of a given node and update the weights of those edges.
3. I will need to do a local (DFS-based) traversal from a given node and return all paths that satisfy a specific user-defined predicate (the predicate can use the property values of the nodes/edges in the path).
4. I will need to add/remove nodes/edges efficiently. This is not performed as often as operations 1, 2, and 3.
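To make the range-query, adjacency, and traversal operations above concrete, here is a minimal in-memory C++17 sketch of the access patterns a disk-backed store would have to support. All names (`PropertyGraph`, `NodeId`, `nodes_with_x_between`, `paths_from`) are hypothetical; the ordered `std::multimap` stands in for whatever on-disk index (e.g. a B-tree) would hold property X, and node/edge removal is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

using NodeId = std::uint32_t;  // hypothetical: dense node ids 0..N-1

struct Edge {
    NodeId from;
    NodeId to;
    double weight;
};

// Hypothetical in-memory model of the access patterns; not a disk-backed store.
class PropertyGraph {
public:
    using Path = std::vector<std::size_t>;  // a path as a sequence of edge indices

    // Adds a node with property X and records it in the secondary index.
    NodeId add_node(double x) {
        NodeId id = static_cast<NodeId>(x_.size());
        x_.push_back(x);
        out_.emplace_back();
        in_.emplace_back();
        x_index_.emplace(x, id);
        return id;
    }

    // Adds a directed edge and links it into both adjacency lists.
    std::size_t add_edge(NodeId from, NodeId to, double weight) {
        assert(from < x_.size() && to < x_.size());
        std::size_t e = edges_.size();
        edges_.push_back({from, to, weight});
        out_[from].push_back(e);
        in_[to].push_back(e);
        return e;
    }

    // Range query: nodes with v1 < X < v2, answered from the ordered index.
    std::vector<NodeId> nodes_with_x_between(double v1, double v2) const {
        std::vector<NodeId> result;
        for (auto it = x_index_.upper_bound(v1); it != x_index_.end() && it->first < v2; ++it)
            result.push_back(it->second);
        return result;
    }

    // Incoming/outgoing edges of a node, and in-place weight updates.
    const std::vector<std::size_t>& out_edges(NodeId n) const { return out_[n]; }
    const std::vector<std::size_t>& in_edges(NodeId n) const { return in_[n]; }
    const Edge& edge(std::size_t e) const { return edges_[e]; }
    void set_weight(std::size_t e, double w) { edges_[e].weight = w; }

    // DFS from `start`: returns every simple path of 1..max_depth edges
    // for which `pred` returns true.
    std::vector<Path> paths_from(NodeId start, std::size_t max_depth,
                                 const std::function<bool(const Path&)>& pred) const {
        std::vector<Path> found;
        if (max_depth == 0) return found;
        Path path;
        std::vector<bool> on_path(x_.size(), false);
        dfs(start, max_depth, pred, path, on_path, found);
        return found;
    }

private:
    void dfs(NodeId n, std::size_t depth_left, const std::function<bool(const Path&)>& pred,
             Path& path, std::vector<bool>& on_path, std::vector<Path>& found) const {
        on_path[n] = true;
        for (std::size_t e : out_[n]) {
            NodeId next = edges_[e].to;
            if (on_path[next]) continue;  // skip edges that would close a cycle
            path.push_back(e);
            if (pred(path)) found.push_back(path);
            // Keep extending even if pred failed: the predicate may not be monotone.
            if (depth_left > 1) dfs(next, depth_left - 1, pred, path, on_path, found);
            path.pop_back();
        }
        on_path[n] = false;
    }

    std::vector<double> x_;                      // property X of each node
    std::vector<std::vector<std::size_t>> out_;  // outgoing edge indices per node
    std::vector<std::vector<std::size_t>> in_;   // incoming edge indices per node
    std::vector<Edge> edges_;                    // edge records (endpoints + weight)
    std::multimap<double, NodeId> x_index_;      // ordered secondary index on X
};
```

The predicate is checked on every path prefix, and paths are extended even when it fails, because a user predicate is not assumed to be monotone. If it is (e.g. a weight budget), skipping the recursion on failure would prune the search considerably.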
The graph may contain hotspots that are accessed much more often than other parts, and I would like to cache them in memory.
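For the hotspot caching, one common policy is least-recently-used (LRU) eviction; that policy is an assumption here, not something the question prescribes. `LruCache` is a hypothetical name; the key could be a node id and the value its decoded adjacency record loaded from disk.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache: keeps the `capacity` most recently used entries.
template <typename Key, typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached value (if any) and marks it most recently used.
    std::optional<Value> get(const Key& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);  // move to front, O(1)
        return it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used one if full.
    void put(const Key& key, Value value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (index_.size() == capacity_) {
            index_.erase(order_.back().first);  // evict the coldest entry
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return index_.size(); }

private:
    using Entry = std::pair<Key, Value>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

On a miss, the caller would read the record from disk and `put` it, so frequently accessed nodes stay resident while cold ones are evicted.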
What is an efficient way to achieve this with minimal implementation effort?
I am looking at some disk-based graph databases such as Neo4j/InfiniteGraph/DEX. Although they support all of the above operations, they seem like overkill, because I don't need many of the features they offer, such as consistency/concurrency control or cluster-based replication. In addition, many of them are Java-based, and I prefer something with a C/C++ interface.
Basically, I just need a disk-based library that efficiently handles persistence, queries on nodes, and local traversal. Can you recommend an existing (open-source) project I could use? If not, what is the best way to implement such a thing?
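As a rough sketch of the do-it-yourself route, adjacency can be stored on disk in compressed sparse row (CSR) form, so one node's outgoing edges are fetched with a few small seeks and reads instead of loading the whole graph. The file layout, `OutEdge`, `write_csr`, and `read_out_edges` are assumptions for illustration. The file uses native byte order, and incoming edges would need a second file built the same way from the reversed edges.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical file layout (native byte order):
//   u64 num_nodes | u64 offsets[num_nodes + 1] | records[num_edges]
// Each record is {u32 target, f64 weight}; offsets[n]..offsets[n+1] are node n's edges.
struct OutEdge {
    std::uint32_t target;
    double weight;
};

constexpr std::uint64_t kRecordSize = sizeof(std::uint32_t) + sizeof(double);

// Writes adjacency lists (adj[n] = outgoing edges of node n) in CSR form.
void write_csr(const std::string& path, const std::vector<std::vector<OutEdge>>& adj) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    std::uint64_t n = adj.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    std::uint64_t offset = 0;
    for (const auto& edges : adj) {
        out.write(reinterpret_cast<const char*>(&offset), sizeof offset);
        offset += edges.size();
    }
    out.write(reinterpret_cast<const char*>(&offset), sizeof offset);  // end sentinel
    for (const auto& edges : adj)
        for (const OutEdge& e : edges) {
            // Write fields separately so struct padding never reaches the file.
            out.write(reinterpret_cast<const char*>(&e.target), sizeof e.target);
            out.write(reinterpret_cast<const char*>(&e.weight), sizeof e.weight);
        }
}

// Reads one node's outgoing edges without loading the whole file.
std::vector<OutEdge> read_out_edges(std::ifstream& in, std::uint32_t node) {
    in.clear();
    std::uint64_t n = 0;
    in.seekg(0);
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    assert(node < n);
    std::uint64_t range[2] = {0, 0};  // offsets[node], offsets[node + 1]
    in.seekg(static_cast<std::streamoff>(sizeof n + node * sizeof(std::uint64_t)));
    in.read(reinterpret_cast<char*>(range), sizeof range);
    const std::uint64_t records_start = sizeof n + (n + 1) * sizeof(std::uint64_t);
    in.seekg(static_cast<std::streamoff>(records_start + range[0] * kRecordSize));
    std::vector<OutEdge> result(range[1] - range[0]);
    for (OutEdge& e : result) {
        in.read(reinterpret_cast<char*>(&e.target), sizeof e.target);
        in.read(reinterpret_cast<char*>(&e.weight), sizeof e.weight);
    }
    return result;
}
```

Because records have a fixed size, an edge weight could be overwritten in place, but adding or removing edges means rewriting the file. That trade-off is tolerable only because such updates are comparatively rare in this workload.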