HBase HDFS zookeeper

I am currently learning HBase. I set up an HBase cluster and a Hadoop cluster as follows:

server1: NameNode, HMaster
server2: DataNode1, RegionServer1, HQuorumPeer
server3: DataNode2, RegionServer2, HQuorumPeer
server4: DataNode3, RegionServer3, HQuorumPeer

I have a few questions about the HBase cluster:

 1: All RegionServers must be in the Hadoop cluster so that they can use HDFS to store data, even though the data ends up on the local file system, right?
 2: What does a RegionServer do? Does the HMaster hand out jobs to all RegionServers and let them run in parallel, like a TaskTracker on a DataNode?
 3: What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and on the master node?
 4: This is related to #3. I know HBase uses ZooKeeper to recover once a RegionServer is down. How specifically does that work?
1 answer

All RegionServers must be in the Hadoop cluster so that they can use HDFS to store data, even though the data ends up on the local file system, right?

Yes. RegionServers are the daemons responsible for storing data in an HBase cluster. You store data in HBase tables, which are split into regions that are spread over multiple RegionServers across the cluster. Although the data goes to the RegionServers, it is actually stored inside HDFS. If you use the standalone configuration, though, HDFS is not used and data is stored directly in the local FS. The relationship is similar to that of any database and file system; take MySQL and ext3, for example. And yes, all HDFS data is stored on your disks in reality. You just cannot see it there directly.
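To make the "tables are split into regions across RegionServers" idea a bit more concrete, here is a minimal Python sketch of how a row key could be mapped to a region and then to the server hosting it. It is only a toy model and not HBase code: the split points, the region-to-server assignment, and the function name `locate_region` are all made up for illustration, and it assumes each region covers a contiguous, sorted range of row keys.

```python
from bisect import bisect_right

# Hypothetical split points for one table. Region i covers the keys from
# REGION_START_KEYS[i] (inclusive) up to the next start key (exclusive).
# The first region starts at the empty key, which sorts before every other key.
REGION_START_KEYS = ["", "g", "n", "t"]

# Hypothetical assignment of those four regions to the three RegionServer
# hosts from the question (server2, server3, server4).
REGION_TO_SERVER = ["server2", "server3", "server4", "server2"]


def locate_region(row_key):
    """Return (region index, hosting server) for a row key in this toy model."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position is the one containing the key.
    index = bisect_right(REGION_START_KEYS, row_key) - 1
    return index, REGION_TO_SERVER[index]


if __name__ == "__main__":
    for key in ["apple", "g", "orange", "zebra"]:
        print(key, locate_region(key))
```

The point of the sketch is only that one server can host several regions and that a given row lives in exactly one region at a time. Where the bytes finally land (HDFS or the local FS) is a separate layer underneath.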

What does a RegionServer do? Does the HMaster hand out jobs to all RegionServers and let them run in parallel, like a TaskTracker on a DataNode?

As stated above, a RegionServer is the daemon that actually stores data in an HBase cluster. Sorry, I did not quite get the second part of this question. What do you mean by "like a TaskTracker on a DataNode"? In an HBase cluster, the HMaster is the daemon that monitors all RegionServer instances and is the interface for all metadata changes. Its job is monitoring and management. RegionServers do not run tasks the way TaskTrackers do. They simply store data and are responsible for things such as serving and managing regions.

What does ZooKeeper do? Do I need to set up ZooKeeper on all RegionServer nodes and on the master node?

ZooKeeper is the guy who coordinates everything behind the curtains. It is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. A distributed HBase setup depends on a running ZooKeeper cluster. All participating nodes and clients must be able to reach the running ZooKeeper ensemble. By default, HBase manages a ZooKeeper cluster for you and starts and stops it as part of the HBase start/stop process. But you can also manage the ZooKeeper ensemble independently of HBase and simply point HBase at the cluster it should use. You do not need ZooKeeper running on every node. Just decide how many ZooKeeper nodes are right for your cluster. One thing to note here is that you should always use an odd number of ZooKeeper nodes.
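The reason for the odd number is majority voting: a ZooKeeper ensemble stays available only while more than half of its servers are up. A small Python sketch of that arithmetic (the function names are mine, chosen just for this illustration):

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still alive."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in range(3, 7):
        print(f"{n} servers: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 servers (or from 5 to 6) does not let you survive any additional failure, so the extra even node buys nothing in terms of fault tolerance. With the three HQuorumPeer nodes from the question, the ensemble can lose one of them and keep working.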

This is related to #3. I know HBase uses ZooKeeper to recover once a RegionServer is down. How specifically does that work?

Each RegionServer is connected to ZooKeeper, and the master watches these connections. ZooKeeper itself manages the heartbeat with a timeout. So, once the timeout expires, the HMaster declares the region server dead and starts the recovery process. The following things happen during recovery:

  • Identifying that a node is down: a node can stop responding simply because it is overloaded, as well as because it is dead.
  • Recovering the writes in progress: this means reading the commit log and recovering the edits that had not been flushed yet.
  • Reassigning the regions: the region server was previously handling a set of regions. This set must be redistributed to the other region servers, depending on their respective workload.
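The three steps above can be sketched as a toy simulation in Python. This is not how HBase is implemented internally. It is only a rough model under my own assumptions: heartbeats are plain timestamps, the commit log is a list of (sequence id, edit) pairs, "workload" is just the number of regions a server holds, and all function names are hypothetical.

```python
import heapq


def find_dead_servers(last_heartbeat, now, timeout):
    """Step 1: servers whose last heartbeat is older than the timeout count as dead."""
    return sorted(s for s, seen in last_heartbeat.items() if now - seen > timeout)


def edits_to_replay(log_entries, flushed_up_to):
    """Step 2: edits that were logged but not flushed yet (sequence id > flushed_up_to)."""
    return [edit for seq_id, edit in log_entries if seq_id > flushed_up_to]


def reassign_regions(assignments, dead_server):
    """Step 3: hand the dead server's regions to the survivors, least loaded first."""
    survivors = {s: list(r) for s, r in assignments.items() if s != dead_server}
    if not survivors:
        raise ValueError("no surviving region servers to take over the regions")
    # Min-heap keyed on the current number of regions per surviving server.
    heap = [(len(regions), server) for server, regions in survivors.items()]
    heapq.heapify(heap)
    for region in assignments[dead_server]:
        load, server = heapq.heappop(heap)
        survivors[server].append(region)
        heapq.heappush(heap, (load + 1, server))
    return survivors


if __name__ == "__main__":
    heartbeats = {"rs1": 100, "rs2": 70, "rs3": 98}
    print(find_dead_servers(heartbeats, now=105, timeout=30))
    print(edits_to_replay([(1, "put a"), (2, "put b"), (3, "put c")], flushed_up_to=1))
    regions = {"rs1": ["r1"], "rs2": ["r2", "r3", "r4"], "rs3": ["r5", "r6"]}
    print(reassign_regions(regions, "rs2"))
```

Note that the first step cannot tell an overloaded server from a dead one. A timeout only says "no heartbeat arrived in time", which is exactly the ambiguity mentioned in the first bullet.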

The process is actually a bit involved. You can read more about it here. I would also suggest having a look at Lars' "HBase: The Definitive Guide" to learn HBase.

HTH


Source: https://habr.com/ru/post/1501560/

