Error starting multiple workflows in OOZIE-4.1.0

I installed oozie 4.1.0 on a Linux machine by following these steps: http://gauravkohli.com/2014/08/26/apache-oozie-installation-on-hadoop-2-4-1/

Hadoop - 2.6.0, Maven - 3.0.4, Pig - 0.12.0

Cluster Setup -

MASTER NODE runs - NameNode, ResourceManager, ProxyServer.

SLAVE NODE runs - DataNode, NodeManager.

When I start a single workflow job, it completes successfully. But when I try to run more than one workflow job at the same time, both jobs get stuck in the accepted state.

Checking the error log, I narrowed the problem down to:

    2014-12-24 21:00:36,758 [JobControl] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 172.16.***.***/172.16.***.***:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    2014-12-25 09:30:39,145 [communication thread] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 172.16.***.***/172.16.***.***:52406. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    2014-12-25 09:30:39,199 [communication thread] INFO org.apache.hadoop.mapred.Task - Communication exception: java.io.IOException: Failed on local exception: java.net.SocketException: Network is unreachable: no further information; Host Details : local host is: "SystemName/127.0.0.1"; destination host is: "172.16.***.***":52406;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1415)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
        at $Proxy9.ping(Unknown Source)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:742)
        at java.lang.Thread.run(Thread.java:722)
    Caused by: java.net.SocketException: Network is unreachable: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
        at org.apache.hadoop.ipc.Client.call(Client.java:1382)
        ... 5 more
    Heart beat
    Heart beat
    .
    .

With the jobs stuck like this, if I manually kill either launcher task (hadoop job -kill <launcher-job-id>), all the remaining tasks complete successfully. So I think the problem is caused by running more than one launcher job at the same time, i.e. the jobs end up in a deadlock.

If anyone knows the reason for and solution to the problem above, please help me as soon as possible.

2 answers

It is a queue problem. When we run jobs in the SAME QUEUE (default) with the cluster setup described above, the ResourceManager has to run all the mapreduce jobs on the single slave node. Because of the lack of resources on the slave node, the jobs in the queue run into a deadlock situation.

To overcome this problem, we need to split the work by running the mapreduce jobs in different queues.


You can do this by setting the following configuration in the pig action inside your Oozie workflow.xml:

    <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>launcher2</value>
        </property>
    </configuration>
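For context, here is a minimal sketch of how that configuration block might sit inside a pig action in workflow.xml. The action name, script name and the ${jobTracker}/${nameNode} parameters are illustrative placeholders, not values from the original workflow:

    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- route this action's mapreduce job to a separate queue -->
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>launcher2</value>
                </property>
            </configuration>
            <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>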

NOTE: This solution is for SMALL CLUSTER SETUP only.


I tried the solution below, and it works fine for me.

1) Change the Hadoop scheduler type from the Capacity Scheduler to the Fair Scheduler. In a small cluster, each queue is assigned a fixed amount of memory (2048 MB) to run a single mapreduce job; if more than one mapreduce job runs in the same queue, it runs into a deadlock.

Solution: add the properties below to yarn-site.xml

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>file:/%HADOOP_HOME%/etc/hadoop/fair-scheduler.xml</value>
    </property>
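The allocation file referenced above is not shown in the answer; a minimal sketch of what fair-scheduler.xml might look like is below. The queue names and resource limits are assumptions chosen only for illustration:

    <?xml version="1.0"?>
    <!-- illustrative allocation file; queue names and limits are assumptions -->
    <allocations>
        <queue name="default">
            <maxResources>4096 mb, 2 vcores</maxResources>
            <maxRunningApps>2</maxRunningApps>
        </queue>
        <queue name="launcher2">
            <maxResources>4096 mb, 2 vcores</maxResources>
            <maxRunningApps>2</maxRunningApps>
        </queue>
    </allocations>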

2) By default, the total memory Hadoop makes available on a node is 8 GB.

So if we run two mapreduce programs, the memory used by Hadoop exceeds 8 GB and we hit the deadlock.

Solution: increase the total NodeManager memory using the following properties in yarn-site.xml

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>20960</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
    </property>
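As a rough sanity check of these example values: with 20960 MB per NodeManager and each container capped at 2048 MB, about 20960 / 2048 ≈ 10 maximum-size containers can run on the node at once, which is where the figure of roughly ten concurrent mapreduce programs below comes from.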

So, if a user wants to run more than two mapreduce programs, they need to increase the NodeManager memory, i.e. the total memory given to Hadoop (note: increasing this size reduces the memory left for the rest of the system). With the configuration above, about 10 mapreduce programs can run at the same time.


Source: https://habr.com/ru/post/980162/

