Should the HBase RegionServer and Hadoop DataNode be on the same machine?

Sorry, I don't have the resources to set up a cluster and check this myself; I'm just interested to know:

  • Is it possible to deploy the HBase RegionServer on a separate machine from the Hadoop DataNode machines? I think the answer is yes, but I'm not sure.

  • Is it good or bad to deploy the HBase RegionServer and the Hadoop DataNode on different machines?

  • When I put some data into HBase, where is it actually stored: on the DataNode or on the RegionServer? I think it is the DataNode, but then what are the StoreFiles and HFiles on a RegionServer? Aren't they the physical files that store our data?

Thanks!

2 answers
  • RegionServers should always be colocated with DataNodes in distributed clusters if you need decent performance.

  • Running them on different machines is bad, because it works against the principle of data locality (if you want to know a little more about data locality, check this out: http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html ).

  • The actual data is stored in HDFS (on the DataNodes); RegionServers are responsible for serving and managing regions.

For more information on HBase architecture, please check out this great blog post by Lars: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
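The division of labor in the last bullet can be pictured with a toy sketch (pure Python, hypothetical class and path names, not the real HBase API): the RegionServer buffers writes in a MemStore and manages regions, but the only durable copy of the bytes is the HFile it flushes into HDFS, i.e. onto the DataNodes:

```python
# Toy model of the HBase/HDFS division of labor -- NOT the real API.
# RegionServer routes and buffers rows; the bytes always end up in HDFS.

class HDFS:
    """Stands in for the DataNodes: the only place bytes are stored durably."""
    def __init__(self):
        self.files = {}          # path -> contents

    def write(self, path, data):
        self.files[path] = data

class RegionServer:
    """Manages regions (row-key ranges); owns no durable storage itself."""
    def __init__(self, hdfs):
        self.hdfs = hdfs
        self.memstore = {}       # in-memory write buffer, lost on crash

    def put(self, row_key, value):
        self.memstore[row_key] = value

    def flush(self, table):
        # A flush turns the MemStore into an immutable HFile inside HDFS.
        path = f"/hbase/data/default/{table}/region-0/cf/hfile-0"
        self.hdfs.write(path, dict(self.memstore))
        self.memstore.clear()
        return path

hdfs = HDFS()
rs = RegionServer(hdfs)
rs.put("row1", "hello")
path = rs.flush("mytable")

print(path)               # the HFile path points into HDFS...
print(hdfs.files[path])   # ...and HDFS holds the actual data
```

The point of the sketch: after the flush, the RegionServer's own state is empty again, while the data lives on as a file in HDFS.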

By the way, if you have a PC with decent memory, you can create a demo cluster with virtual machines. Never try to set up a production environment without first testing the platform in a development environment.


Expanding on this answer:

  • "RegionServers should always be colocated with DataNodes in distributed clusters if you want decent performance."

I'm not sure how everyone will interpret the term "colocated" here, so let me be more specific:

  • What makes any physical server an "XYZ server" is that it runs a program called a daemon (think of a program that runs forever, doing background processing);
  • What makes a file server a file server is that it runs a file-serving daemon;
  • What makes a web server a web server is that it runs a web-service daemon; and
  • What makes a server a "DataNode" is that it runs the HDFS DataNode daemon;
  • What makes a server a "RegionServer" is that it runs the HBase RegionServer daemon (program).

So across all Hadoop distributions (for example Cloudera, MapR, Hortonworks, and others), the common best practice for HBase is that RegionServers are colocated with DataNode servers.

This means that each of the physical slave servers (DataNodes) that form the HDFS cluster runs the HDFS DataNode daemon (program) and, alongside it, also runs the HBase RegionServer daemon (program)!

This is how we get locality: parallel processing and storage of data on all the individual nodes in the HDFS cluster, with no "movement" of giant loads of big data from storage locations to processing locations. Locality is vital to the success of a Hadoop cluster, so the HBase RegionServers (the DataNodes that also run the HBase daemon) can do all of their processing (puts / gets / scans) on the DataNode that holds the HFiles that make up the HRegions that make up the HTables that make up HBase (the Hadoop database).
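That nesting (HFiles inside regions inside tables) is visible right in the HDFS directory layout that HBase uses. Roughly (the exact paths vary by HBase version; `mytable` and `cf` here are hypothetical names):

```
/hbase/data/<namespace>/<table>/    e.g. /hbase/data/default/mytable/
  <encoded-region-name>/            one directory per HRegion
    <column-family>/                e.g. cf/
      <hfile>                       the immutable StoreFiles (HFiles)
```

Because these are ordinary HDFS files, a RegionServer colocated with the DataNode holding their blocks can read and write them locally.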

So servers (virtual machines or physical ones, on Windows, Linux, ...) can run several daemons at the same time, and in practice they regularly run dozens of them.


Source: https://habr.com/ru/post/980577/
