Hadoop HA NameNode Java client

I am new to HDFS. I am writing a Java client that connects to a remote Hadoop cluster and writes data to it.

 String hdfsUrl = "hdfs://xxx.xxx.xxx.xxx:8020";
 FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);

It works great. My problem is how to handle an HA-enabled Hadoop cluster. An HA-enabled Hadoop cluster has two NameNodes: one active and one standby. How can I determine the active NameNode from my client code at runtime?

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.1/bk_system-admin-guide/content/ch_hadoop-ha-3-1.html contains the following information about dfs.client.failover.proxy.provider.[$nameservice ID], the Java class that HDFS clients use to contact the Active NameNode:

 This property specifies the Java class that HDFS clients use to contact the Active NameNode. DFS Client uses this Java class to determine which NameNode is the current Active and therefore which NameNode is currently serving client requests. Use the ConfiguredFailoverProxyProvider implementation if you are not using a custom implementation. 

For instance:

 <property>
   <name>dfs.client.failover.proxy.provider.mycluster</name>
   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>

How can I use this class in my Java client, or is there any other way to identify the active NameNode?

2 answers

Not sure if this is the same context, but given a Hadoop cluster, put core-site.xml (taken from the cluster) on the application classpath or load it into the Hadoop configuration object (org.apache.hadoop.conf.Configuration), and then access files with a URL like "hdfs://mycluster/path/to/file", where mycluster is the name service of the Hadoop cluster. I successfully read a file from an HA Hadoop cluster this way in a Spark application.
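
A minimal sketch of that approach in Java; the config file paths, the mycluster name service, and the file path are illustrative assumptions, so substitute the values from your own cluster:

 import java.net.URI;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class HaClusterRead {
     public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         // Load the cluster's config files explicitly; placing them on the
         // application classpath also works.
         conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
         conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

         // The name service URI resolves to whichever NameNode is currently active.
         FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
         System.out.println(fs.exists(new Path("/path/to/file")));
         fs.close();
     }
 }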


Your client should have the cluster's hdfs-site.xml, as it contains the name service that covers both NameNodes, as well as the NameNode hostnames, RPC ports, and so on.

You should set these properties in your client, as indicated in this answer ( fooobar.com/questions/10276734/... ):

 "dfs.nameservices", "hadooptest" "dfs.client.failover.proxy.provider.hadooptest" , "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" "dfs.ha.namenodes.hadooptest", "nn1,nn2" "dfs.namenode.rpc-address.hadooptest.nn1", "10.10.14.81:8020" "dfs.namenode.rpc-address.hadooptest.nn2", "10.10.14.82:8020" 

With these settings, your client will use org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider to determine which NameNode is active and will direct requests to it: it first tries the first configured NameNode and, on failure, fails over to the second.
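
For completeness, a sketch of setting these properties programmatically in a Java client; the hadooptest name service, the nn1/nn2 IDs and the two RPC addresses are just the example values from above, and the file path is made up:

 import java.net.URI;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class HaHdfsClient {
     public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         // Logical name service that hides the two physical NameNodes.
         conf.set("fs.defaultFS", "hdfs://hadooptest");
         conf.set("dfs.nameservices", "hadooptest");
         conf.set("dfs.ha.namenodes.hadooptest", "nn1,nn2");
         conf.set("dfs.namenode.rpc-address.hadooptest.nn1", "10.10.14.81:8020");
         conf.set("dfs.namenode.rpc-address.hadooptest.nn2", "10.10.14.82:8020");
         // Proxy provider the DFS client uses to find the active NameNode
         // and to fail over to the standby if the active one goes down.
         conf.set("dfs.client.failover.proxy.provider.hadooptest",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

         // Connect through the name service, never through a single NameNode host.
         FileSystem fs = FileSystem.get(URI.create("hdfs://hadooptest"), conf);
         try (FSDataOutputStream out = fs.create(new Path("/tmp/ha-client-test.txt"))) {
             out.writeUTF("written via the active NameNode");
         }
         fs.close();
     }
 }

With this in place the client never needs to know which NameNode is active; ConfiguredFailoverProxyProvider handles that transparently.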

https://blog.woopi.org/wordpress/files/hadoop-2.6.0-javadoc/org/apache/hadoop/hdfs/server/namenode/ha/ConfiguredFailoverProxyProvider.html




