Neo4j node creation speed

I have a fresh installation of Neo4j on my laptop, and creating new nodes through the REST API seems rather slow (~30-40 ms on average). I searched Google a bit, but I can't find any real benchmarks for how long it "should" take; there is this post, but it only gives relative performance, not absolute performance. Is Neo4j inherently limited to only ~30 new nodes per second (outside of batch mode), or is there something wrong with my configuration?

Configuration Details:

  • Neo4j version 2.2.5
  • The server runs on my mid-2014 laptop, under Ubuntu 15.04
  • OpenJDK version 1.8
  • Calls to the server are also made from my laptop (via localhost:7474), so there should be no network latency.
  • I'm calling Neo4j via Clojure / Neocons; the method used is "create" in the class clojurewerkz.neocons.rest.nodes
  • Using Cypher seems to be even slower; e.g. calling 'PROFILE CREATE (you:Person {name:"Jane Doe"}) RETURN you' via the HTML interface returns "Cypher version: CYPHER 2.2, planner: RULE. 5 total db hits in 54 ms."
2 answers

Neo4j performance characteristics are a tricky area.

Measuring performance

First of all: it all depends a lot on how the server is configured. Measuring something on a laptop is the wrong way to do it.

To measure performance, you should check the following:

  • You have appropriate server hardware (requirements)
  • The client and server are on the same local network.
  • Neo4j is configured correctly (memory mapping, web server thread pool, Java heap size, etc.)
  • The server is configured correctly (Linux TCP stack, maximum number of open files, etc.)
  • The server is warmed up. Neo4j is written in Java, so you should warm it up properly before measuring anything (i.e. put some load on it for ~15 minutes).

And one last point: the Enterprise edition. The Enterprise edition of Neo4j has some additional features that can significantly improve performance (e.g. the HPC cache).
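The warm-up point in the checklist above can be sketched as a small timing harness. This is only an illustration, not the tooling used for the numbers in this answer: `benchmark` and `create_node` are hypothetical names, the URL is the one from the ab command further down, and the request assumes that authentication is disabled and that an empty JSON object is an acceptable body for creating a node without properties.

```python
import statistics
import time
import urllib.request

# Endpoint taken from the ab command in this answer; adjust for your setup.
NODE_URL = "http://localhost:7474/db/data/node"


def create_node(url=NODE_URL):
    """POST one request to the node endpoint (assumes auth is disabled)."""
    request = urllib.request.Request(
        url,
        data=b"{}",  # assumption: empty property map, i.e. a node with no properties
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def benchmark(operation, warmup=1000, measured=10000):
    """Run `operation` untimed `warmup` times, then time `measured` calls."""
    for _ in range(warmup):
        operation()  # warm-up calls are deliberately not recorded

    latencies_ms = []
    for _ in range(measured):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank style index for the 99th percentile.
    p99_index = max(0, int(len(latencies_ms) * 0.99) - 1)
    return {
        "count": len(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
    }
```

Against a running server this would be called as `benchmark(create_node)`. The warm-up count here is arbitrary; a JVM may need far more load than that before the numbers settle.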

Neo4j internally

Internally, Neo4j consists of:

  • Storage
  • Core API
  • Traversal API
  • Cypher API

Everything is done without any additional network requests. The Neo4j server is built on top of this solid foundation.

So, when you make a request to the Neo4j server, you measure:

  • Delay between client and server
  • JSON serialization costs
  • Web Server (Jetty)
  • Additional modules for managing locks, transactions, etc.
  • And Neo4j itself

So the bottom line is that Neo4j is pretty fast on its own when it is used in native (embedded) mode. Working through the Neo4j server, however, comes with additional costs.

The numbers

We had internal testing of Neo4j. We measured several cases.

Create Nodes

Here we use the vanilla Transactional Cypher REST API.

Threads: 2

    Nodes per transaction: 1000
    Execution time: 1635
    Total nodes created: 7000000
    Nodes per second: 7070

Threads: 5

    Nodes per transaction: 750
    Execution time: 852
    Total nodes created: 7000000
    Nodes per second: 8215
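The key idea in these runs is that many nodes are created per transaction instead of one per HTTP request. A minimal sketch of how such batches might be built for the transactional Cypher endpoint follows. It is not the code used for the test above: `chunked` and `build_batch_payload` are hypothetical helpers, the `Person` label is borrowed from the question, and the `{rows}` parameter syntax is the one used by Neo4j 2.x.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_payload(rows, label="Person"):
    """Build one request body that creates a node per row in a single transaction.

    The body is meant to be POSTed to /db/data/transaction/commit.
    `label` is interpolated into the query, so it must come from trusted code.
    """
    statement = "UNWIND {rows} AS row CREATE (n:%s) SET n = row" % label
    return {
        "statements": [
            {"statement": statement, "parameters": {"rows": rows}}
        ]
    }


# Example: 2500 property maps split into batches of 1000 nodes per transaction.
people = [{"name": "person-%d" % i} for i in range(2500)]
bodies = [json.dumps(build_batch_payload(batch)) for batch in chunked(people, 1000)]
```

Each string in `bodies` is then sent as one HTTP request, so the per-request overhead (HTTP, JSON, transaction commit) is paid once per batch rather than once per node.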

Huge database synchronization

This uses a custom-built unmanaged extension, with a binary protocol between the server and the client and some concurrency.

But this is still a Neo4j server (actually a Neo4j cluster).

    Node count: 80.32M (80 320 000)
    Relationship count: 80.30M (80 300 000)
    Property count: 257.78M (257 780 000)
    Consumed time: 2142 seconds

    Per second:
      Nodes - 37497
      Relationships - 37488
      Properties - 120345

These numbers show the true power of Neo4j.

My numbers

I tried to measure performance right now

Fresh and unconfigured database (2.2.5), Ubuntu 14.04 (VM).

Results:

    $ ab -p post_loc.txt -T application/json -c 1 -n 10000 http://localhost:7474/db/data/node
    This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking localhost (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests

    Server Software:        Jetty(9.2.4.v20141103)
    Server Hostname:        localhost
    Server Port:            7474

    Document Path:          /db/data/node
    Document Length:        1245 bytes

    Concurrency Level:      1
    Time taken for tests:   14.082 seconds
    Complete requests:      10000
    Failed requests:        0
    Total transferred:      14910000 bytes
    Total body sent:        1460000
    HTML transferred:       12450000 bytes
    Requests per second:    710.13 [#/sec] (mean)
    Time per request:       1.408 [ms] (mean)
    Time per request:       1.408 [ms] (mean, across all concurrent requests)
    Transfer rate:          1033.99 [Kbytes/sec] received
                            101.25 kb/s sent
                            1135.24 kb/s total

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.2      0      19
    Processing:     1    1   1.3      1      53
    Waiting:        0    1   1.2      1      53
    Total:          1    1   1.3      1      54

    Percentage of the requests served within a certain time (ms)
      50%      1
      66%      1
      75%      1
      80%      1
      90%      2
      95%      2
      98%      3
      99%      4
     100%     54 (longest request)

This creates 10,000 nodes through the REST API, with no properties, in 1 thread.

As you can see, even on my laptop, in a Linux VM with default settings, Neo4j can create a node in 4 ms or less (99% of requests).

Note: I warmed up the database beforehand (created and deleted 100K nodes).
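To compare this run with the 30-40 ms from the question, it helps to convert between latency and throughput. The sketch below only restates the arithmetic behind the ab summary; the function names are made up for illustration, and the conversion assumes requests are issued strictly one after another per client thread.

```python
def mean_latency_ms(total_seconds, requests):
    """Mean time per request in milliseconds for a sequential run."""
    return total_seconds * 1000.0 / requests


def throughput_per_second(latency_ms, concurrency=1):
    """Requests per second when each of `concurrency` clients waits `latency_ms` per call."""
    return concurrency * 1000.0 / latency_ms


# Figures from the ab run above: 10000 requests in 14.082 seconds.
ab_latency = mean_latency_ms(14.082, 10000)
ab_throughput = throughput_per_second(ab_latency)

# Figures from the question: 30-40 ms per node creation.
question_best = throughput_per_second(30.0)
question_worst = throughput_per_second(40.0)

print("ab run: %.3f ms per request, %.2f requests/s" % (ab_latency, ab_throughput))
print("question: %.1f to %.1f nodes/s" % (question_worst, question_best))
```

So a single client at 30-40 ms per call is limited to roughly 25 to 33 node creations per second, which matches the "~30 per second" in the question, while the same single-threaded pattern at about 1.4 ms per call gives around 710 per second.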

Bolt

If you are looking for the best Neo4j performance, you should follow the development of Bolt. It is the new binary protocol for the Neo4j server.

Additional information: here, here and here.


Another thing to try is running ./bin/neo4j-shell. Since there is no HTTP connection involved, this can help you understand how much of the time is spent in Neo4j itself and how much in the HTTP interface.

When I do this on 2.2.2, my CREATEs usually take around 10 ms.

I'm not sure what the ideal is, or whether there is a configuration that can improve performance.


Source: https://habr.com/ru/post/1232063/

