Processing large data sets (Neo4j, MongoDB, Hadoop)

I am looking for the best data processing approach. This is what I have so far: 1,000,000 nodes of type "A". Each "A" node can be connected to 1-1000 nodes of type "B" and to 1-10 nodes of type "C".
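
For illustration, the model could be expressed roughly like this with Neo4j's embedded Java API (just a sketch; the relationship type names HAS_B / HAS_C are placeholders, not part of my actual schema):

```java
// Hypothetical relationship types for the A->B and A->C connections
// described above; the names are illustrative only.
import org.neo4j.graphdb.RelationshipType;

public enum Rels implements RelationshipType {
    HAS_B,  // each "A" node connects to 1-1000 "B" nodes
    HAS_C   // each "A" node connects to 1-10 "C" nodes
}
```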

I wrote a RESTful service (Java, Jersey) to import the data into a Neo4j graph. After importing the "A" nodes (only nodes with identifiers, no additional data), I noticed that the Neo4j database had grown to ~2.4 GB.
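
Roughly, the import looks like the sketch below (assuming the embedded Neo4j 2.x Java API; the path, batch size and property name are placeholders, and my real code goes through the Jersey service, so this is only an approximation):

```java
// Hedged sketch of a batched import of the 1M "A" identifier nodes.
// "data/graph.db", BATCH_SIZE and the "externalId" property are
// assumptions for illustration.
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class NodeAImport {
    private static final int BATCH_SIZE = 10000; // nodes per transaction

    public static void main(String[] args) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
        Transaction tx = db.beginTx();
        try {
            for (int i = 0; i < 1000000; i++) {
                Node a = db.createNode();
                a.setProperty("externalId", i); // identifier only, no other data
                if ((i + 1) % BATCH_SIZE == 0) {
                    // Commit in batches so memory use and transaction logs stay bounded.
                    tx.success();
                    tx.close();
                    tx = db.beginTx();
                }
            }
            tx.success();
        } finally {
            tx.close();
            db.shutdown();
        }
    }
}
```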

Is it a good idea to store the additional fields (name, description, ...) in Neo4j? Or should I set up MongoDB / Hadoop and use a key/value approach to access that data?
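
To make the two options concrete, here is a hedged sketch of what I mean, assuming the Neo4j 2.x embedded API and the MongoDB 2.x Java driver; names like "a_documents" and "mongoId" are placeholders:

```java
// Hedged sketch: small fields as Neo4j properties, larger payloads in
// MongoDB linked by a key. All names and paths are illustrative.
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import org.bson.types.ObjectId;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class HybridStoreSketch {
    public static void main(String[] args) throws Exception {
        GraphDatabaseService graph =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
        MongoClient mongo = new MongoClient("localhost");
        DBCollection docs = mongo.getDB("appdb").getCollection("a_documents");

        try (Transaction tx = graph.beginTx()) {
            Node a = graph.createNode();

            // Option 1: small scalar fields directly on the node.
            a.setProperty("name", "A-42");
            a.setProperty("description", "short text is fine as a property");

            // Option 2: keep only a key in the graph, payload in MongoDB.
            ObjectId docId = new ObjectId();
            docs.insert(new BasicDBObject("_id", docId)
                    .append("payload", "large document / blob-like data"));
            a.setProperty("mongoId", docId.toHexString());

            tx.success();
        }

        mongo.close();
        graph.shutdown();
    }
}
```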

1 answer

Did you delete a lot of nodes during the insert? Usually a node takes 9 bytes on disk, so your 1M nodes should only take about 9M bytes (1,000,000 x 9 bytes). You would have to enable id reuse to reclaim that space aggressively.

Could you list the contents of your data directory with file sizes?
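
If it helps, a quick sketch for dumping the store files and their sizes (the "data/graph.db" path is just an assumption about your setup):

```java
// Prints each file in the Neo4j data directory together with its size.
import java.io.File;

public class ListStoreFiles {
    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "data/graph.db");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a directory: " + dir);
            return;
        }
        for (File f : files) {
            System.out.printf("%,12d bytes  %s%n", f.length(), f.getName());
        }
    }
}
```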

In general, it is no problem to put your other fields in Neo4j, as long as they are not large blobs.

How did you create the database?

