How can I improve performance for a massive MERGE-based insert?

I am trying to insert data from my SQL database into Neo4j. I have a CSV file where each line generates 4-5 entities and some relationships between them. Entities can be duplicated across lines, and I want to enforce their uniqueness.

What I am doing now (a Cypher sketch follows the list):

  • create a uniqueness constraint for each label.
  • iterate over the CSV:
    • start a transaction
    • build MERGE statements for the entities
    • build MERGE statements for the relationships
    • commit the transaction
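
For concreteness, here is a minimal sketch of that pattern, with hypothetical Person/Company labels and an id key (the constraint syntax is the older Neo4j 3.x form; newer versions use CREATE CONSTRAINT ... FOR ... REQUIRE):

    // One uniqueness constraint per label; each also creates a backing index.
    CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;
    CREATE CONSTRAINT ON (c:Company) ASSERT c.id IS UNIQUE;

    // Per CSV row: MERGE each entity on its unique key only, set the
    // remaining properties on first creation, then MERGE the relationships.
    MERGE (p:Person {id: $personId})
      ON CREATE SET p.name = $personName
    MERGE (c:Company {id: $companyId})
      ON CREATE SET c.name = $companyName
    MERGE (p)-[:WORKS_AT]->(c);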

Performance was bad, so I then tried committing the transaction every X rows (X was 100, 500, 1000, and 5000; one such batch is sketched after the list below). This is better, but I still have two problems:

  • It is slow: about 1-1.5 seconds per 100 rows on average (each row = 4-5 entities and 4-5 relationships).
  • It gets worse as I keep adding data. I usually start at 400-500 ms per 100 rows, and after ~5000 rows it degrades to ~4-5 seconds per 100 rows.
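
In case it matters, one batch is roughly equivalent to a single parameterized statement over a list of rows (labels and parameter names are again hypothetical):

    // $rows is a list of maps, one per CSV line; the whole statement
    // runs (and is committed) once per batch of X rows.
    UNWIND $rows AS row
    MERGE (p:Person {id: row.personId})
      ON CREATE SET p.name = row.personName
    MERGE (c:Company {id: row.companyId})
      ON CREATE SET c.name = row.companyName
    MERGE (p)-[:WORKS_AT]->(c)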

From what I know, my uniqueness constraint also creates an index on that property, and that property is exactly the one I use in the MERGE pattern when creating a node. Is there any chance it is not using the index?
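
One way I know to check is to inspect the query plan for a representative MERGE: if the constraint's index is used, the plan should contain a unique-index seek rather than a label scan (operator names may vary by version; the label and value here are hypothetical):

    // EXPLAIN shows the plan without executing the statement (PROFILE also
    // runs it and reports db hits). A NodeUniqueIndexSeek operator means the
    // index is used; a NodeByLabelScan means every node with that label is
    // scanned on each MERGE, which would explain the slowdown as data grows.
    EXPLAIN MERGE (p:Person {id: 42});

    // Possible pitfall, depending on version: merging on the key plus extra
    // properties may stop the planner from using the single-property index.
    // MERGE (p:Person {id: 42, name: 'Alice'})   // may fall back to a scan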

What is the best practice for improving performance here? I saw BatchInserter, but I was not sure whether it can be used with MERGE operations.

Thanks.

