Processing large data sets (Neo4j, MongoDB, Hadoop)

I am looking for the best data processing approach. This is what I have so far: 1,000,000 nodes of type "A". Each "A" node can be connected to 1-1000 nodes of type "B" and to 1-10 nodes of type "C".
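
For illustration, the model could be expressed roughly like this with Neo4j's embedded Java API (just a sketch; the relationship type names HAS_B / HAS_C are placeholders, not part of my actual schema):

```java
// Hypothetical relationship types for the A->B and A->C connections
// described above; the names are illustrative only.
import org.neo4j.graphdb.RelationshipType;

public enum Rels implements RelationshipType {
    HAS_B,  // each "A" node connects to 1-1000 "B" nodes
    HAS_C   // each "A" node connects to 1-10 "C" nodes
}
```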

I wrote a RESTful service (Java, Jersey) to import the data into a Neo4j graph. After importing the "A" nodes (only nodes with identifiers, no additional data), I noticed that the Neo4j database had grown to ~2.4 GB.
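
Roughly, the import looks like the sketch below (assuming the embedded Neo4j 2.x Java API; the path, batch size and property name are placeholders, and my real code goes through the Jersey service, so this is only an approximation):

```java
// Hedged sketch of a batched import of the 1M "A" identifier nodes.
// "data/graph.db", BATCH_SIZE and the "externalId" property are
// assumptions for illustration.
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class NodeAImport {
    private static final int BATCH_SIZE = 10000; // nodes per transaction

    public static void main(String[] args) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
        Transaction tx = db.beginTx();
        try {
            for (int i = 0; i < 1000000; i++) {
                Node a = db.createNode();
                a.setProperty("externalId", i); // identifier only, no other data
                if ((i + 1) % BATCH_SIZE == 0) {
                    // Commit in batches so memory use and transaction logs stay bounded.
                    tx.success();
                    tx.close();
                    tx = db.beginTx();
                }
            }
            tx.success();
        } finally {
            tx.close();
            db.shutdown();
        }
    }
}
```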

Is it a good idea to store the additional fields (name, description, ...) in Neo4j? Or should I set up MongoDB / Hadoop and use a key/value approach to access that data?
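
To make the two options concrete, here is a hedged sketch of what I mean, assuming the Neo4j 2.x embedded API and the MongoDB 2.x Java driver; names like "a_documents" and "mongoId" are placeholders:

```java
// Hedged sketch: small fields as Neo4j properties, larger payloads in
// MongoDB linked by a key. All names and paths are illustrative.
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import org.bson.types.ObjectId;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class HybridStoreSketch {
    public static void main(String[] args) throws Exception {
        GraphDatabaseService graph =
                new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
        MongoClient mongo = new MongoClient("localhost");
        DBCollection docs = mongo.getDB("appdb").getCollection("a_documents");

        try (Transaction tx = graph.beginTx()) {
            Node a = graph.createNode();

            // Option 1: small scalar fields directly on the node.
            a.setProperty("name", "A-42");
            a.setProperty("description", "short text is fine as a property");

            // Option 2: keep only a key in the graph, payload in MongoDB.
            ObjectId docId = new ObjectId();
            docs.insert(new BasicDBObject("_id", docId)
                    .append("payload", "large document / blob-like data"));
            a.setProperty("mongoId", docId.toHexString());

            tx.success();
        }

        mongo.close();
        graph.shutdown();
    }
}
```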

1 answer

Did you delete a lot of nodes during the insert? Usually a node takes 9 bytes on disk, so your 1M nodes should only take about 9M bytes (1,000,000 x 9 bytes). You would have to enable id reuse to reclaim that space aggressively.

Could you list the contents of your data directory with file sizes?
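
If it helps, a quick sketch for dumping the store files and their sizes (the "data/graph.db" path is just an assumption about your setup):

```java
// Prints each file in the Neo4j data directory together with its size.
import java.io.File;

public class ListStoreFiles {
    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "data/graph.db");
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println("Not a directory: " + dir);
            return;
        }
        for (File f : files) {
            System.out.printf("%,12d bytes  %s%n", f.length(), f.getName());
        }
    }
}
```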

In general, it is no problem to put your other fields in Neo4j, as long as they are not large blobs.

How did you create the database?

