I am trying to configure MongoDB in a replica configuration to see how it scales / performs / handles.
I used Morphia (a POJO mapping layer on top of the MongoDB Java driver) to save 10,000 simple random documents into one collection. I annotated my POJO (MyData in the snippet below) with @Entity(concern = "REPLICAS_SAFE"), in the hope that every document sent to the database would be safely saved.
My POJO consisted of an ObjectId field (the Mongo primary key), a String of random characters with random length (at most 20 characters), and a long generated using Random.nextLong().
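For concreteness, the random payload described above can be generated roughly like this (a self-contained sketch; the class name, alphabet, and seed are my own choices, not from the original POJO):

```java
import java.util.Random;

public class RandomPayload {
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Random string of random length, at most 20 characters.
    static String randomString(Random rnd) {
        int len = rnd.nextInt(21); // 0..20
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so the sketch is reproducible
        String s = randomString(rnd);
        long l = rnd.nextLong();
        System.out.println(s.length() <= 20);
        System.out.println("string=" + s + " long=" + l);
    }
}
```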
My code is as follows:
```java
for (int i = 0; i < 10000; i++) {
    final MyData data = new MyData();
    boolean written = false;
    do {
        try {
            ds.save(data); // ds is of type Datastore
            written = true;
        } catch (Exception e) {
            continue;
        }
    } while (!written);
}
```
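As an aside, a loop like the one above swallows every exception and retries forever, which spins hot while the replica set has no primary. A bounded retry with backoff is gentler; here is a minimal self-contained sketch (the `Datastore` interface here is a stand-in I defined for the example, not Morphia's):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySave {
    // Stand-in for a save operation that throws while the cluster is unavailable.
    interface Datastore { void save(Object entity) throws Exception; }

    static boolean saveWithRetry(Datastore ds, Object entity, int maxAttempts, long backoffMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                ds.save(entity);
                return true; // write acknowledged
            } catch (Exception e) {
                if (attempt == maxAttempts) return false; // give up after the last attempt
                Thread.sleep(backoffMs * attempt); // linear backoff between attempts
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated datastore that fails twice ("no primary"), then succeeds.
        AtomicInteger calls = new AtomicInteger();
        Datastore flaky = entity -> {
            if (calls.incrementAndGet() <= 2) throw new Exception("no primary");
        };
        boolean ok = saveWithRetry(flaky, new Object(), 5, 10);
        System.out.println("saved=" + ok + " attempts=" + calls.get());
    }
}
```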
I set up a replica set with four nodes, ran the above program, and then started metaphorically pulling cables to see what would happen.
The desired result was for the program to run to completion, with all the documents ending up in the database.
The actual result, after a few minutes, was one of:
- Java reported that it had written 10,000 records, but the database contained fewer than 10,000
- Java reported that it had written fewer than 10,000, and the database reported either the same number or even fewer
- Everything worked fine
In one case, the nodes that came back up could not actually catch up with the PRIMARY node and had to be rebuilt from scratch with a wiped data directory. This was despite increasing the oplog size to 2 gigabytes, which I would have thought was plenty to replay 10,000 rows of very simple data.
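For reference, the oplog size is fixed when mongod first starts (the flag takes megabytes), and the replication window can be inspected from the mongo shell afterwards; this is a sketch using 1.6-era options, with the replica-set name and dbpath as my own placeholders:

```
# on each node, before first start (size in MB, so 2048 = 2 GB)
mongod --replSet testset --oplogSize 2048 --dbpath /data/db

# then, in the mongo shell on the primary
> db.printReplicationInfo()
```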
Other things you should know:
- All of this runs on the same hardware (a 2 GB Pentium D!), with the cluster running on two 32-bit Ubuntu Server VirtualBox instances with 128 MB of memory each, and the Java client running on the Windows XP host. Two mongod processes run on each virtual machine, plus an arbiter on one of them.
- The clocks on the two virtualized machines were off by a few seconds (I need to install the VirtualBox guest additions to fix this), but not by much; 10gen says clock skew should not be a problem for clustering, but I thought I would mention it.
I am aware of the 2-gigabyte limit with Mongo on 32-bit machines, and that other people have lost data, and I know the machine I am running these tests on is hardly Top 500 material (which is why the documents I chose to save were small), but when my tests worked, they worked very well.
Am I on my way to proving that Mongo is not ready for prime time yet, or am I doing something inherently wrong?
I am using 1.6.5.
Any ideas, tips, tricks, pointers, explanations or criticism are greatly appreciated!
ps: I'm not trolling - I really like the NoSQL idea for the data it suits, so I really want this to work, but so far I haven't had much luck!