Any command to get the active namenode for a nameservice in Hadoop?

Command:

hdfs haadmin -getServiceState machine-98 

It only works if you know the name of the machine. Is there any command like:

 hdfs haadmin -getServiceState <nameservice> 

which can tell you the IP / hostname of the active namenode?

+10
9 answers

To print the namenodes, use this command:

 hdfs getconf -namenodes 

To print the secondary namenodes:

 hdfs getconf -secondaryNameNodes 

To print the backup nodes:

 hdfs getconf -backupNodes 

Note: these commands were tested with Hadoop 2.4.0.
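For reference, -namenodes prints the namenode hostnames as a single space-separated line; the output looks something like this (hostnames hypothetical):

 $ hdfs getconf -namenodes
 nn1.example.com nn2.example.com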

Update 10-31-2014:

Here is a Python script that reads the NameNodes participating in Hadoop HA from the configuration file and determines which one is active using the hdfs haadmin command. This script is not fully tested, since I do not have an HA setup; only the parsing has been checked, using a sample file based on the Hadoop HA documentation. Feel free to use and modify as needed.

#!/usr/bin/env python
# coding: UTF-8
import xml.etree.ElementTree as ET
import subprocess as SP

if __name__ == "__main__":
    hdfsSiteConfigFile = "/etc/hadoop/conf/hdfs-site.xml"
    tree = ET.parse(hdfsSiteConfigFile)
    root = tree.getroot()
    hasHadoopHAElement = False
    activeNameNode = None
    for property in root:
        if "dfs.ha.namenodes" in property.find("name").text:
            hasHadoopHAElement = True
            nameserviceId = property.find("name").text[len("dfs.ha.namenodes") + 1:]
            nameNodes = property.find("value").text.split(",")
            for node in nameNodes:
                # Find this node's rpc-address entry, then ask haadmin
                # whether the node is the active one.
                for n in root:
                    prefix = "dfs.namenode.rpc-address." + nameserviceId + "."
                    elementText = n.find("name").text
                    if elementText == prefix + node:
                        nodeAddress = n.find("value").text.split(":")[0]
                        args = ["hdfs haadmin -getServiceState " + node]
                        p = SP.Popen(args, shell=True, stdout=SP.PIPE, stderr=SP.PIPE)
                        for line in p.stdout.readlines():
                            if "active" in line.lower():
                                activeNameNode = node
                                print "Active NameNode: " + node
                                break
                        for err in p.stderr.readlines():
                            print "Error executing Hadoop HA command: ", err
                        break
    if not hasHadoopHAElement:
        print "Hadoop High-Availability configuration not found!"
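A hypothetical run, assuming the script is saved as find_active_namenode.py, hdfs-site.xml is at the path hard-coded above, and the HA nodes are named nn1 and nn2:

 $ python find_active_namenode.py
 Active NameNode: nn1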
+21

Found the following:

https://gist.github.com/cnauroth/7ff52e9f80e7d856ddb3

This works out of the box on my CDH5 namenodes, although I'm not sure whether other Hadoop distributions will have http://namenode:50070/jmx - if not, I think it can be added by deploying Jolokia.

Example:

 curl 'http://namenode1.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus'
 {
   "beans" : [ {
     "name" : "Hadoop:service=NameNode,name=NameNodeStatus",
     "modelerType" : "org.apache.hadoop.hdfs.server.namenode.NameNode",
     "State" : "active",
     "NNRole" : "NameNode",
     "HostAndPort" : "namenode1.example.com:8020",
     "SecurityEnabled" : true,
     "LastHATransitionTime" : 1436283324548
   } ]
 }

Thus, by issuing one HTTP request to each namenode (which should be fast), we can find out which one is active.
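A minimal sketch of that loop in bash (hostnames are hypothetical; 50070 is the default namenode HTTP port on Hadoop 2.x):

 for nn in namenode1.example.com namenode2.example.com; do
     # Query each namenode's JMX servlet and pull out its HA state
     curl -s "http://${nn}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" \
         | grep -o '"State" : "[^"]*"' \
         | sed "s/^/${nn}: /"
 done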

It's also worth noting that if you call the WebHDFS REST API on a standby namenode, you will get 403 Forbidden and the following JSON:

 {"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation category READ is not supported in state standby"}} 
+14

You can do this in bash with hdfs CLI calls. The caveat is that this takes a little longer, since it makes several API calls sequentially, but some may prefer it to using a Python script.

This has been tested with Hadoop 2.6.0

 get_active_nn(){
     ha_name=$1  # Needs the NameServiceID
     ha_ns_nodes=$(hdfs getconf -confKey dfs.ha.namenodes.${ha_name})
     active=""
     for node in $(echo ${ha_ns_nodes//,/ }); do
         state=$(hdfs haadmin -getServiceState $node)
         if [ "$state" == "active" ]; then
             active=$(hdfs getconf -confKey dfs.namenode.rpc-address.${ha_name}.${node})
             break
         fi
     done
     if [ -z "$active" ]; then
         >&2 echo "ERROR: no active namenode found for ${ha_name}"
         exit 1
     else
         echo $active
     fi
 }
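Hypothetical usage, assuming the NameServiceID is mycluster:

 active=$(get_active_nn mycluster)   # e.g. namenode1.example.com:8020
 hdfs dfs -ls hdfs://${active}/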
+4

After reading all the existing answers, none seemed to combine the three steps of:

  • Identifying the namenodes from the cluster.
  • Resolving each node name to host:port.
  • Checking the status of each node (without needing cluster admin privileges).

The solution below combines hdfs getconf calls with a JMX service call for node status.

#!/usr/bin/env python
from subprocess import check_output
import urllib, json, sys

def get_name_nodes(clusterName):
    ha_ns_nodes = check_output(['hdfs', 'getconf', '-confKey',
                                'dfs.ha.namenodes.' + clusterName])
    nodes = ha_ns_nodes.strip().split(',')
    nodeHosts = []
    for n in nodes:
        nodeHosts.append(get_node_hostport(clusterName, n))
    return nodeHosts

def get_node_hostport(clusterName, nodename):
    hostPort = check_output(['hdfs', 'getconf', '-confKey',
                             'dfs.namenode.rpc-address.{0}.{1}'.format(clusterName, nodename)])
    return hostPort.strip()

def is_node_active(nn):
    jmxPort = 50070
    host, port = nn.split(':')
    url = "http://{0}:{1}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus".format(
        host, jmxPort)
    nnstatus = urllib.urlopen(url)
    parsed = json.load(nnstatus)
    return parsed.get('beans', [{}])[0].get('State', '') == 'active'

def get_active_namenode(clusterName):
    for n in get_name_nodes(clusterName):
        if is_node_active(n):
            return n

clusterName = (sys.argv[1] if len(sys.argv) > 1 else None)
if not clusterName:
    raise Exception("Specify cluster name.")

print 'Cluster: {0}'.format(clusterName)
print "Nodes: {0}".format(get_name_nodes(clusterName))
print "Active Name Node: {0}".format(get_active_namenode(clusterName))
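A hypothetical run, assuming the script is saved as get_active_nn.py and the nameservice ID is mycluster:

 $ python get_active_nn.py mycluster
 Cluster: mycluster
 Nodes: ['nn1.example.com:8020', 'nn2.example.com:8020']
 Active Name Node: nn1.example.com:8020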
+2

A Hadoop High Availability cluster will have two namenodes - one active and one standby.

To find the active namenode, we can try running an hdfs test command against each of the namenodes and find the active one by which run succeeds.

The command below succeeds if the namenode is active and fails if it is in standby mode:

 hadoop fs -test -e hdfs://<Name node>/ 

Unix script

 active_node=''
 if hadoop fs -test -e hdfs://<NameNode-1>/ ; then
     active_node='<NameNode-1>'
 elif hadoop fs -test -e hdfs://<NameNode-2>/ ; then
     active_node='<NameNode-2>'
 fi

 echo "Active NameNode : $active_node"
+2

From the Java API, you can use HAUtil.getAddressOfActive(fileSystem).

+2

You can run a curl command against the Ambari REST API to find out the active and standby namenodes, for example:

 curl -u username -H "X-Requested-By: ambari" -X GET http://cluster-hostname:8080/api/v1/clusters/<cluster-name>/services/HDFS


+1
#!/usr/bin/python
import subprocess
import sys
import os, errno

def getActiveNameNode():
    cmd_string = "hdfs getconf -namenodes"
    process = subprocess.Popen(cmd_string, shell=True, stdout=subprocess.PIPE)
    out, err = process.communicate()
    NameNodes = out
    Value = NameNodes.strip().split(" ")
    for val in Value:
        # 'hadoop fs -test -e' exits 0 against the active namenode and
        # non-zero (with a StandbyException) against the standby.
        cmd_str = "hadoop fs -test -e hdfs://" + val + "/"
        process = subprocess.Popen(cmd_str, shell=True,
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        if process.returncode == 0:
            return val

def main():
    out = getActiveNameNode()
    print(out)

if __name__ == '__main__':
    main()
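Hypothetical usage, assuming the script above is saved as get_active_namenode.py; it prints the hostname of the active namenode:

 $ python get_active_namenode.py
 nn1.example.com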
0

I found the commands below just by typing 'hdfs'; a couple of them may be useful to those who come here looking for help.

 hdfs getconf -namenodes 

The command above gives you the namenode hostname(s), say hn1.hadoop.com.

 hdfs getconf -secondaryNameNodes 

This command gives you the available secondary namenode(s), say hn2.hadoop.com.

 hdfs getconf -backupNodes 

This command gives you the backup node(s), if any.

 hdfs getconf -nnRpcAddresses 

This command gives you the namenode RPC address(es), i.e. hostname plus RPC port, say hn1.hadoop.com:8020.

  You're Welcome :) 
0

Source: https://habr.com/ru/post/977464/

