Hazelcast and MapDB - Implementing a Simple Distributed Database

I implemented a Hazelcast service that stores its data in local MapDB instances through MapStoreFactory and newMapLoader. This way, keys and their data can be reloaded if a cluster restart is required:

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

import org.mapdb.DB;

import com.hazelcast.core.MapStore;

public class HCMapStore<V> implements MapStore<String, V> {

    // the original snippet uses a logger without showing its declaration; java.util.logging is used here
    private static final Logger logger = Logger.getLogger(HCMapStore.class.getName());

    private final DB db;
    private final Map<String, V> map;

    /** Pass in the MapDB instance, e.g. created via
     *  DBMaker.newFileDB(new File("mapdb")).closeOnJvmShutdown().make() */
    public HCMapStore(DB db) {
        this.db = db;
        this.map = db.createHashMap("someMapName").<String, V>makeOrGet();
    }

    // some other store methods are omitted

    @Override
    public void delete(String k) {
        logger.info("delete, " + k);
        map.remove(k);
        db.commit();
    }

    // MapLoader methods

    @Override
    public V load(String key) {
        logger.info("load, " + key);
        return map.get(key);
    }

    @Override
    public Set<String> loadAllKeys() {
        logger.info("loadAllKeys");
        return map.keySet();
    }

    @Override
    public Map<String, V> loadAll(Collection<String> keys) {
        logger.info("loadAll, " + keys);
        Map<String, V> partialMap = new HashMap<>();
        for (String k : keys) {
            partialMap.put(k, map.get(k));
        }
        return partialMap;
    }
}
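For context, here is roughly how such a store could be wired into Hazelcast. This is only a minimal sketch assuming Hazelcast 3.x programmatic configuration and MapDB 1.x; the map name "someMapName" and the setup class are illustrative, and note that the MapStoreFactory callback is newMapStore(mapName, properties):

import java.io.File;
import java.util.Properties;

import org.mapdb.DB;
import org.mapdb.DBMaker;

import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MapLoader;
import com.hazelcast.core.MapStoreFactory;

public class HCMapStoreSetup {

    public static HazelcastInstance start() {
        // one local MapDB file per node (file name is a placeholder)
        final DB db = DBMaker.newFileDB(new File("mapdb")).closeOnJvmShutdown().make();

        MapStoreConfig storeCfg = new MapStoreConfig();
        storeCfg.setEnabled(true);
        // the factory hands the map its MapDB-backed store on this node
        storeCfg.setFactoryImplementation(new MapStoreFactory<String, Object>() {
            @Override
            public MapLoader<String, Object> newMapStore(String mapName, Properties properties) {
                return new HCMapStore<Object>(db);
            }
        });

        Config cfg = new Config();
        cfg.getMapConfig("someMapName").setMapStoreConfig(storeCfg);
        return Hazelcast.newHazelcastInstance(cfg);
    }
}

With this in place, each node opens its own MapDB file and Hazelcast calls back into HCMapStore for the keys that node owns.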

The problem I'm currently facing is that the loadAllKeys method of the MapLoader interface from Hazelcast requires ALL keys of the entire cluster to be returned, but each node ONLY stores the objects that belong to it.

Example: I have two nodes and 8 objects are stored, so that, for example, 5 objects end up in the MapDB of node1 and 3 in the MapDB of node2 (which object belongs to which node is decided by Hazelcast). Now when restarting, node1 will return 5 keys for loadAllKeys and node2 will return 3. Hazelcast decides to ignore the 3 elements, and the data is "lost."
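As a side note, which node an object belongs to can be inspected at runtime through Hazelcast's partition API; a small sketch (map and key names are made up for illustration):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.core.Partition;

public class WhoOwnsKey {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = hz.getMap("someMapName");
        map.put("obj-1", "value-1");

        // Hazelcast hashes the key onto a partition; the partition owner is the
        // member whose MapStore sees the store()/delete() calls for that key
        Partition p = hz.getPartitionService().getPartition("obj-1");
        System.out.println("obj-1 is in partition " + p.getPartitionId()
                + ", owned by " + p.getOwner());
    }
}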

What could be a good solution to this?

Update for the bounty: I asked about this on the Hazelcast mailing list here, which lists 2 options (I will add 1 more), and I would like to know if something like this is possible with Hazelcast 3.2 or 3.3:

  • Currently, the MapStore interface only receives data or updates from the local node. Would it be possible to notify the MapStore interface of every storage action of the entire cluster? Or perhaps this is already possible with some listener magic (see the sketch after this list)? Perhaps I can make Hazelcast put all the objects into one partition and keep 1 copy on every node.

  • If, for example, 2 nodes restart, then the MapStore interface is correctly called with my local database for node1 and then for node2. But when both nodes join, the data of node2 will be deleted, since Hazelcast assumes that only the master node can be right. Can I teach Hazelcast to accept the data from both nodes?
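To make the "listener magic" idea from the first bullet concrete (the sketch referenced there): a non-local entry listener registered on the IMap receives events for all partitions, so each node could in principle mirror every cluster-wide write into its own MapDB file. This is only a hedged sketch, reusing the HCMapStore class from above (including its store() method, which is omitted from the listing); it is not how the current setup works:

import com.hazelcast.core.EntryAdapter;
import com.hazelcast.core.EntryEvent;

public class MirrorToLocalMapDb extends EntryAdapter<String, Object> {

    private final HCMapStore<Object> localStore;

    public MirrorToLocalMapDb(HCMapStore<Object> localStore) {
        this.localStore = localStore;
    }

    @Override
    public void entryAdded(EntryEvent<String, Object> event) {
        localStore.store(event.getKey(), event.getValue());
    }

    @Override
    public void entryUpdated(EntryEvent<String, Object> event) {
        localStore.store(event.getKey(), event.getValue());
    }

    @Override
    public void entryRemoved(EntryEvent<String, Object> event) {
        localStore.delete(event.getKey());
    }

    // register on every node; 'true' asks Hazelcast to include values in the events:
    //   imap.addEntryListener(new MirrorToLocalMapDb(store), true);
}

Whether this scales is another question, since every node would then persist every entry, effectively turning each local MapDB file into a full copy of the map.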

+5
4 answers

According to the Hazelcast 3.3 documentation, the MapLoader initialization flow is as follows:

When getMap() is first called from any node, initialization will start depending on the value of InitialLoadMode. If it is set to EAGER, initialization starts. If it is set to LAZY, initialization does not actually start, but data is loaded each time a partition finishes loading.

  • Hazelcast will call MapLoader.loadAllKeys() to get all of your keys on each node
  • Each node will figure out the list of keys it owns
  • Each node will load all of its owned keys by calling MapLoader.loadAll(keys)
  • Each node puts its owned entries into the map by calling IMap.putTransient(key, value)

It follows from the above that if the nodes are started in a different order, the keys will be distributed differently, so each node will not find all (or some) of its assigned keys in its local storage. You should be able to verify this by setting breakpoints in your HCMapStore.loadAllKeys and HCMapStore.loadAll and comparing the keys you return with the keys Hazelcast asks for.
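Besides breakpoints, the mismatch can be made visible by comparing, on each member, the keys sitting in its local MapDB file with the keys Hazelcast currently assigns to it. A rough sketch, assuming access to both the HazelcastInstance and the HCMapStore (the map name is illustrative):

import java.util.HashSet;
import java.util.Set;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class KeyOwnershipCheck {

    /** Prints keys that are present in the local MapDB file but not owned by this member. */
    public static void report(HazelcastInstance hz, HCMapStore<?> store) {
        IMap<String, Object> imap = hz.getMap("someMapName");

        Set<String> onDisk = new HashSet<>(store.loadAllKeys()); // keys in the local MapDB file
        Set<String> ownedHere = imap.localKeySet();              // keys this member currently owns

        Set<String> strandedLocally = new HashSet<>(onDisk);
        strandedLocally.removeAll(ownedHere);
        System.out.println("Keys on local disk but owned elsewhere: " + strandedLocally);
    }
}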

In my opinion, what you are trying to achieve contradicts the concept of a distributed cache with persistence, such as Hazelcast, and is therefore impossible. That is, when one of the nodes leaves (fails or disconnects for whatever reason), the cluster will rebalance by moving pieces of data around, and the same process happens every time a node joins the cluster. So in the event of a cluster change, the local backstore of the lost node becomes outdated.

The Hazelcast cluster is dynamic in nature, so it cannot rely on a backstore with a static distributed topology. Essentially, you need a shared backstore to make it work with a dynamic Hazelcast cluster. The backstore can be distributed as well, e.g. Cassandra, but its topology must be independent of the cache cluster topology.

UPDATE: It seems to me that what you are trying to achieve would more logically take the form of a distributed data store (on top of MapDB) with local caching.

Hope this helps.

+2

A few options are possible:

1) Look into how partitioning works in Hazelcast. There may be a way to have a MapLoader per partition and make each node load only its own partitions; this would resolve the conflicts.

2) When a node comes back online, interact with the Hazelcast cluster before re-adding the node, so you can merge the two key sets: one from Hazelcast, the second from MapDB.

3) Force Hazelcast to store all data on every node, for example by setting the partition count to 1 or something similar (a sketch of this follows below).
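For option 3), a hedged sketch of what that configuration might look like; hazelcast.partition.count is a real system property, the map name and backup count are only illustrative, and collapsing everything into one partition obviously limits scalability:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class SinglePartitionSetup {
    public static HazelcastInstance start() {
        Config cfg = new Config();
        // all entries land in a single partition...
        cfg.setProperty("hazelcast.partition.count", "1");
        // ...and backups place copies of that partition on other members (max allowed is 6)
        cfg.getMapConfig("someMapName").setBackupCount(6);
        return Hazelcast.newHazelcastInstance(cfg);
    }
}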

+1

It is possible to load the data stored on all nodes, but currently you have to do it manually.

On each node:

 HCMapStore store = createMapDbStore();
 HazelcastInstance hz = createHz(store);           // use store in MapStoreConfig as the implementation
 IMap imap = hz.getMap("map");
 Map diskMap = store.loadAll(store.loadAllKeys()); // load all entries from disk
 imap.putAll(diskMap);                             // put them into the distributed map

But as mentioned on the mailing list, MapStore is not really intended to be used this way. Also keep in mind that backups are not persisted to disk this way. So if you restart your cluster and the disk on one of the nodes dies, those entries will be lost.

+1

It seems this is not easy:

The persistence layer for Hazelcast needs something like a central storage, such as a database or a shared file.

Or look here or here. Take a look at OrientDB, which uses Hazelcast and persists to disk.

0
