Neo4j indices and outdated data

Question

I have a legacy dataset (the ENRON data represented as GraphML) that I would like to query. In a comment on a related question, @StefanArmbruster suggested that I use Cypher to query the database. My use case is simple: given a message identifier (a property of the Message node), retrieve the node that has this identifier, and also get the sender and recipient nodes of that message.

It seems that to do this in Cypher I first need to create a node index. Is there a way to do this automatically when the data is loaded from a GraphML file? (I used Gremlin to load the data and create the database.)

I also have an external Lucene index of the data (I need it for other purposes). Does it make sense to have two indexes? I could, for example, store the Neo4j node ids in my external index and then query the graph based on those ids. My concern is whether these ids are persistent. (By analogy, Lucene document ids should not be considered persistent.)

So, should I:

Index the graph internally in Neo4j and look up message identifiers using Cypher? (If so, what is the best way to do this: reload the database with some suitable incantation so that the index gets built? Create an index on an existing db?)
Store Neo4j node ids in my external Lucene index and retrieve nodes through these stored ids?

UPDATE

I have been trying to get automatic indexing to work with Gremlin and the embedded server, with no luck. The documentation says:

The underlying database is auto-indexed, see Section 14.12, “Automatic Indexing”, so the script can return the imported node by index lookup.

But when I look at the graph after loading a new database, the indexes do not seem to exist.

The Neo4j documentation on automatic indexing suggests that more configuration is needed. In addition to setting node_auto_indexing = true, you also have to tell it what to index:

To auto-index something, you have to set which properties should be indexed. You do this by listing the property keys to index on. In the configuration file, use the node_keys_indexable and relationship_keys_indexable configuration keys. When using embedded mode, use the GraphDatabaseSettings.node_keys_indexable and GraphDatabaseSettings.relationship_keys_indexable configuration keys. In all cases, the value must be a comma-separated list of property keys to index on.

So, is Gremlin supposed to set the GraphDatabaseSettings parameters? I tried passing a map to the Neo4jGraph constructor as follows:

    Map<String,String> config = [ 'node_auto_indexing':'true', 'node_keys_indexable': 'emailID' ]
    Neo4jGraph g = new Neo4jGraph(graphDB, config);
    g.loadGraphML("../databases/data.graphml");

but this had no noticeable effect on index creation.

UPDATE 2

Instead of setting up the database via Gremlin, I followed the examples in the Neo4j documentation, so my database creation now looks like this (in Groovy):

    protected Neo4jGraph getGraph(String graphDBName, String databaseName) {
        // Populate only if the database directory does not exist yet
        boolean populateDB = !new File(graphDBName).exists();
        if (populateDB)
            println "creating database";
        else
            println "opening database";

        // Auto-indexing is configured here, before any data is loaded
        GraphDatabaseService graphDB = new GraphDatabaseFactory().
            newEmbeddedDatabaseBuilder( graphDBName ).
            setConfig( GraphDatabaseSettings.node_keys_indexable, "emailID" ).
            setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
            setConfig( GraphDatabaseSettings.dump_configuration, "true").
            newGraphDatabase();

        Neo4jGraph g = new Neo4jGraph(graphDB);
        if (populateDB) {
            println "Populating graph"
            g.loadGraphML(databaseName);
        }
        return g;
    }

and my search was done as follows:

    ReadableIndex<Node> autoNodeIndex = graph.rawGraph.index()
        .getNodeAutoIndexer()
        .getAutoIndex();
    def node = autoNodeIndex.get( "emailID", "< 2614099.1075839927264.JavaMail.evans@thyme >" ).getSingle();

And this seems to work. Note, however, that calling getIndices() on the Neo4jGraph object still returns an empty list. So the upshot is that I can get the Neo4j API to work correctly, but the Gremlin wrapper does not seem to reflect the indexing state. The expression g.idx('node_auto_index') (documented under Gremlin Methods) returns null.

neo4j cypher gremlin

Gene Golovchinsky Oct 31 '12 at 23:13

2 answers

Peter Neubauer · Answer 1 · 2012-11-01T05:11:05+0000

Auto indexes are created lazily. That is, when you turn on auto-indexing, the actual index is only created when the first indexable property is indexed. Make sure you insert data before checking for the index, otherwise it may not show up.
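To make the lazy behaviour concrete, here is a small self-contained toy model in plain Java. It is not Neo4j code and does not use the Neo4j API; the class name LazyAutoIndexModel and its methods are made up for illustration. It only mimics the idea that the index does not exist until the first indexable property is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Neo4j code. It mimics an auto index whose backing
// structure is created lazily, on the first write of an indexable property.
class LazyAutoIndexModel {
    private final Set<String> indexableKeys;
    // Stays null until the first indexable property is written.
    private Map<String, List<Long>> entries;

    LazyAutoIndexModel(String... keys) {
        this.indexableKeys = new HashSet<>(Arrays.asList(keys));
    }

    // True once the backing structure has been created.
    boolean exists() {
        return entries != null;
    }

    // Simulates setting a property on a node.
    void setProperty(long nodeId, String key, String value) {
        if (!indexableKeys.contains(key)) {
            return; // key not configured for auto-indexing: nothing is created
        }
        if (entries == null) {
            entries = new HashMap<>(); // lazy creation happens here
        }
        entries.computeIfAbsent(key + "=" + value, k -> new ArrayList<>()).add(nodeId);
    }

    // Returns the ids of nodes whose property `key` equals `value`.
    List<Long> get(String key, String value) {
        if (entries == null) {
            return Collections.emptyList();
        }
        return entries.getOrDefault(key + "=" + value, Collections.emptyList());
    }

    public static void main(String[] args) {
        LazyAutoIndexModel index = new LazyAutoIndexModel("emailID");
        System.out.println(index.exists()); // configured, but no data yet
        index.setProperty(1L, "subject", "hello");
        System.out.println(index.exists()); // only a non-indexable key was written
        index.setProperty(2L, "emailID", "msg-42");
        System.out.println(index.exists()); // after the first indexable write
        System.out.println(index.get("emailID", "msg-42"));
    }
}
```

In this model, checking exists() right after configuration reports that there is no index, which is the same symptom as looking for the index before any data has been loaded.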

For some auto-indexing code (using programmatic configuration) see, for example, https://github.com/neo4j-contrib/rabbithole/blob/master/src/test/java/org/neo4j/community/console/IndexTest.java (this works with Neo4j 1.8).

/Peter

Eve Freeman · Answer 2 · 2012-11-01T00:21:42+0000

Have you tried the auto index feature? It covers basically the use case you describe. Unfortunately, it must be enabled before the data is imported. (Otherwise, you need to remove and re-add the properties in order to get them indexed.)

http://docs.neo4j.org/chunked/milestone/auto-indexing.html
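Once the auto index is populated, the lookup described in the question could be written in Cypher roughly as sketched below. This is only a sketch: the relationship types SENT and RECEIVED and the class name MessageLookup are assumptions, since the question does not show the schema of the GraphML data. The index name node_auto_index and the property key emailID are the ones used in the question, and the START ... node:index(key = {param}) form is the Cypher syntax of the Neo4j 1.8 era.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: builds a Cypher 1.8-style query and its parameter map.
// The relationship types SENT and RECEIVED are assumptions about the
// GraphML data, not taken from the question.
class MessageLookup {
    // node_auto_index is the node auto index name used in the question;
    // emailID is the auto-indexed property key from the question.
    static final String QUERY =
        "START m=node:node_auto_index(emailID = {id}) "
      + "MATCH sender-[:SENT]->m, m-[:RECEIVED]->recipient "
      + "RETURN m, sender, recipient";

    // Passing the message id as a parameter avoids quoting problems
    // with the angle brackets and dots in the id.
    static Map<String, Object> params(String messageId) {
        Map<String, Object> params = new HashMap<>();
        params.put("id", messageId);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(QUERY);
        System.out.println(params("<message-id>"));
    }
}
```

With Neo4j on the classpath, the query string and the parameter map would be handed to org.neo4j.cypher.javacompat.ExecutionEngine via execute(String, Map<String, Object>). That part is left out here so the sketch stays runnable without the library.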
