Minimum System Requirements for a High Availability Hadoop Cluster

From what I understand, for high availability in Hadoop we need one NameNode and one standby (backup) NameNode, shared network storage (shared between the two NameNodes), and at least 2 DataNodes to start a cluster.

  • Is it possible to start the DataNode service on the same machine on which the NameNode is running?

  • Can YARN be launched on a machine that is already running the NameNode or DataNode service?

Please suggest if I am missing any other service needed to create a hadoop environment.

What should be the system requirements for the NameNode, given that it only processes metadata (so it is CPU and I/O intensive)? The data we are crunching is mainly I/O intensive.

1 answer

For Hadoop HA, you need at least two separate machines that can run the active NameNode and the standby NameNode. So, in theory, you can have a Hadoop HA cluster with just two machines, but that is not very useful in practical terms.
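
As an illustration, a minimal sketch of the HA-related hdfs-site.xml properties for such a two-NameNode pair with shared storage might look like the following. The property names follow the Apache HDFS HA documentation; the nameservice name, hostnames, and NFS path are placeholders, not values from this answer:

    <!-- Sketch of HA properties for hdfs-site.xml. Property names follow the
         Apache HDFS HA documentation; all values here are placeholders. -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>namenode1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>namenode2.example.com:8020</value>
    </property>
    <!-- The shared network storage from the question, e.g. an NFS mount
         visible to both NameNodes. -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///mnt/nfs/hadoop-ha/edits</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>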

To answer your other questions:

  1. Yes, you can start the DataNode service on a machine that also runs the NameNode service (a start-up sketch follows this list). This is a common scenario in a PoC cluster, where you have a small cluster (approximately 3-7 nodes). NOTE: as a best practice, you should use dedicated machines for master services such as the NameNode.

  2. Yes, YARN can also be launched on the machine running the NameNode or DataNode service (see the second sketch below). Keep in mind that each service, whether NameNode, DataNode, or a YARN daemon, is a Java application running in its own JVM, so the node must have enough resources for every JVM it hosts.
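
For point 1, a minimal sketch of starting both HDFS daemons on one host, assuming Hadoop 3.x command syntax (Hadoop 2.x uses hadoop-daemon.sh instead):

    # Start the NameNode and a DataNode side by side on the same machine.
    # Hadoop 3.x syntax; on 2.x use "hadoop-daemon.sh start namenode" etc.
    hdfs --daemon start namenode
    hdfs --daemon start datanode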
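
For point 2, the YARN daemons can be added to the same host, and the JDK's jps tool then shows each service as its own JVM, which is exactly why the node needs resources for every daemon it hosts (again assuming Hadoop 3.x syntax; the jps output in the comments is illustrative):

    # Add the YARN daemons to the same host (Hadoop 3.x syntax).
    yarn --daemon start resourcemanager
    yarn --daemon start nodemanager

    # jps (shipped with the JDK) lists running JVMs; expect one per service:
    #   NameNode
    #   DataNode
    #   ResourceManager
    #   NodeManager
    jps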

As for system requirements, they depend on the data volumes and the jobs you plan to run. Your assumption about the NameNode (CPU and I/O intensive) is a reasonable starting point.
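
As a back-of-the-envelope illustration (an assumption added here, not part of the original answer), a commonly cited rule of thumb is roughly 1 GB of NameNode heap per million blocks, since the NameNode keeps the whole namespace in memory:

    # Rough NameNode heap sizing sketch. The ~1 GB per million blocks rule of
    # thumb and the counts below are illustrative assumptions.
    FILES=10000000          # expected number of files
    BLOCKS_PER_FILE=2       # average number of blocks per file
    HEAP_GB=$(( FILES * BLOCKS_PER_FILE / 1000000 ))  # ~1 GB per 1M blocks
    echo "Suggested NameNode heap: ~${HEAP_GB} GB, plus headroom"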

For more background, see the HDFS and YARN design documentation:

  • http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
  • http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html


Source: https://habr.com/ru/post/1608777/

