Multithreaded node creation in Neo4j

Question


I created 1 million Neo4j nodes in batches of 10,000, each batch in its own transaction. Strangely, parallelizing this process across multiple threads did not improve performance at all. It is as if the transactions in different threads were blocking each other.

Here is a Scala code snippet that demonstrates this using parallel collections:

    import org.neo4j.kernel.EmbeddedGraphDatabase

    object Main extends App {
      val total = 1000000
      val batchSize = 10000
      val db = new EmbeddedGraphDatabase("neo4yay")

      Runtime.getRuntime().addShutdownHook(
        new Thread() { override def run() = db.shutdown() }
      )

      (1 to total).grouped(batchSize).toSeq.par.foreach(batch => {
        println("thread %s, nodes from %d to %d"
          .format(Thread.currentThread().getId, batch.head, batch.last))
        val transaction = db.beginTx()
        try {
          batch.foreach(db.createNode().setProperty("Number", _))
        } finally {
          transaction.finish()
        }
      })
    }

and here are the build.sbt lines needed to build and run it:

    scalaVersion := "2.9.2"

    libraryDependencies += "org.neo4j" % "neo4j-kernel" % "1.8.M07"

    fork in run := true

You can switch between parallel and serial modes by adding or removing the .par call before the outer foreach. The console output clearly shows that with .par, execution is indeed multithreaded.

To eliminate possible problems with concurrency in this code, I also tried an actor-based implementation, with approximately the same result (6 and 7 seconds respectively for serial and parallel versions).

So the question is: did I do something wrong or is this a limitation of Neo4j? Thanks!

+4

scala neo4j transactions

Oleg Mirzov Sep 25 '12 at 20:54

2 answers

Batch insertion does not work with multiple threads. From the Neo4j docs:

Always perform batch insertion in a single thread (or use synchronization to make only one thread at a time access the batch inserter) and invoke shutdown when finished.

Neo4j Batch Insert

+2

Jan Sep 27 '12 at 9:41


Michael Hunger · Accepted Answer · 2012-09-27T22:58:36+0000

The main issue is that your transactions all hit the commit at about the same time, and transaction commits are serialized writes to the transaction log. If the writes were interleaved in time and the actual node creation were a more expensive process, you would see a speedup.
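To illustrate the reasoning, here is a toy model that does not use Neo4j at all. Each "batch" sleeps for a while to stand in for node creation, which can run concurrently, and then sleeps while holding one global lock to stand in for the serialized commit. The names SerializedCommitModel, runBatches, workMs and commitMs are made up for this sketch, and the sleep durations are arbitrary assumptions rather than measurements of Neo4j.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Toy model only: no Neo4j involved. The global lock stands in for the
// serialized write to the transaction log described above.
object SerializedCommitModel {
  private val logLock = new Object

  // Runs `batches` simulated transactions on `threads` threads and
  // returns the elapsed wall-clock time in milliseconds.
  def runBatches(batches: Int, threads: Int, workMs: Long, commitMs: Long): Long = {
    val pool = Executors.newFixedThreadPool(threads)
    val start = System.nanoTime()
    for (_ <- 1 to batches) {
      pool.execute(new Runnable {
        def run(): Unit = {
          Thread.sleep(workMs)                            // parallelizable part
          logLock.synchronized { Thread.sleep(commitMs) } // serialized part
        }
      })
    }
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    (System.nanoTime() - start) / 1000000
  }

  def main(args: Array[String]): Unit = {
    // Commit-dominated batches: extra threads should barely help.
    println("cheap work, 1 thread:   %d ms".format(runBatches(20, 1, 1, 20)))
    println("cheap work, 4 threads:  %d ms".format(runBatches(20, 4, 1, 20)))
    // Work-dominated batches: extra threads should help noticeably.
    println("costly work, 1 thread:  %d ms".format(runBatches(20, 1, 20, 1)))
    println("costly work, 4 threads: %d ms".format(runBatches(20, 4, 20, 1)))
  }
}
```

In this model the total time can never drop below the number of batches times commitMs, however many threads are used, which matches the behaviour seen in the question when the commit is the dominant cost.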
