Sharding vs DFS

As far as I understand, sharding (for example, in MongoDB) and distributed file systems (for example, HDFS under HBase or Hypertable) are different mechanisms that databases use to scale out, but I wonder how they compare?

1 answer

The traditional sharding approach involves breaking tables into a small number of pieces and running each piece (or "shard") in a separate database on a separate machine. Because the shards are large, this mechanism can be prone to imbalance due to hot spots and uneven growth, as the Foursquare incident showed. Also, since each shard runs on a separate machine, these systems can suffer availability problems if one of the machines goes down. To mitigate this, most sharding systems, including MongoDB, implement replica groups: each machine is replaced by a set of three machines in a master plus two slaves configuration, so that if a machine goes down there are two remaining replicas to serve the data.

There are a couple of problems with this design. First, if a machine fails in a replica group and the group is left with only two members, then to bring the replication count back to three, the data on one of those two machines must be cloned. Since there are only two machines in the entire cluster that can be used to re-create the replica, one of them will be under heavy load while the re-replication takes place, causing serious performance problems on the shard in question (copying 1 TB over a gigabit link takes more than two hours). The second problem is that when one of the replicas goes down, it must be replaced with a new machine. Even if there is plenty of spare capacity elsewhere in the cluster, that spare capacity cannot be used to fix the replication problem; the only way to fix it is to replace the failed machine. This becomes very difficult from an operational standpoint as cluster sizes grow to hundreds or thousands of machines.
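
As a back-of-the-envelope check on the "two hours" figure (illustrative numbers only, not from the original answer), here is a minimal Python sketch assuming a 1 TB shard cloned over a single, fully dedicated gigabit link with no protocol or disk overhead:

```python
# Rough transfer-time estimate for cloning one shard from a single replica.
# Assumptions (illustrative): 1 TB of data, one dedicated 1 Gbit/s link,
# no protocol, disk, or contention overhead.

SHARD_BYTES = 1 * 10**12        # 1 TB shard to re-replicate
LINK_BITS_PER_SEC = 10**9       # gigabit network link

transfer_seconds = SHARD_BYTES * 8 / LINK_BITS_PER_SEC
print(f"Single-source clone: {transfer_seconds / 3600:.1f} hours")
# Prints roughly 2.2 hours, during which one of the two surviving replicas
# must serve both the clone traffic and regular reads/writes for the shard.
```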

The Bigtable + GFS design solves these problems. First, tables are broken into much finer-grained "tablets". A typical machine in a Bigtable cluster often holds more than 500 tablets. If an imbalance occurs, resolving it is simply a matter of migrating a small number of tablets from one machine to another. If a TabletServer goes down, then because the dataset is broken up and replicated at such fine granularity, there can be hundreds of machines participating in the recovery process, which spreads the recovery load and shortens the recovery time. Also, because the data is not tied to a specific machine or machines, the spare capacity of every machine in the cluster can be applied to a failure. There is no operational requirement to replace the failed machine, since any of the spare capacity throughout the cluster can be used to remedy the replication imbalance.
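
A minimal sketch of why recovering tablet-sized chunks from many peers in parallel beats cloning a whole shard from a single surviving replica. The figures (1 TB held by the failed server, one gigabit link per machine) are illustrative assumptions, not from the answer:

```python
# Hypothetical comparison: re-replicating a failed server's data from one
# surviving replica (coarse shards) versus from many peers that each hold a
# few of its tablets (Bigtable/GFS-style layout). Illustrative numbers only.

DATA_BYTES = 1 * 10**12          # 1 TB held by the failed server
LINK_BITS_PER_SEC = 10**9        # gigabit link per machine

def recovery_hours(source_machines: int) -> float:
    """Hours to re-copy DATA_BYTES when it is read from `source_machines`
    peers in parallel, each limited only by its own network link."""
    per_machine_bytes = DATA_BYTES / source_machines
    return per_machine_bytes * 8 / LINK_BITS_PER_SEC / 3600

print(f"1 source (whole-shard clone):    {recovery_hours(1):.2f} h")
print(f"100 sources (tablet-sized work): {recovery_hours(100):.2f} h")
# Spreading ~500 tablets over hundreds of peers turns one multi-hour copy
# into many short transfers that can run concurrently.
```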

  • Doug Judd, CEO, Hypertable Inc.

Source: https://habr.com/ru/post/894427/

