Hadoop "Failure to reset" Exception in ec2 instance with 420 GB instance storage

I use Hadoop 2.3.0, installed as a single-node cluster (pseudo-distributed mode) on a CentOS 6.4 Amazon EC2 instance with 420 GB of instance storage and 7.5 GB of RAM. I understand that the "Spill failed" exception should only occur when the node runs out of disk space, yet after running map/reduce tasks for only a short while (on nowhere near 420 GB of data) I get the exception below.

I should mention that I moved the Hadoop installation on the same node from the 8 GB EBS volume (where I had installed it initially) to the 420 GB instance storage volume, and changed $HADOOP_HOME and the other properties accordingly to point to the instance storage, so Hadoop 2.3.0 is now fully contained on the 420 GB drive.

However, I still see the following exception. Can you tell me whether anything other than a lack of disk space can cause a "Spill failed" exception?

2014-02-28 15:35:07,630 ERROR [IPC Server handler 12 on 58189] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1393591821307_0013_m_000000_0 - exited : 
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)


2014-02-28 15:35:07,604 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Spill failed
2014-02-28 15:35:07,605 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.checkSpillException(MapTask.java:1533)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1442)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_1393591821307_0013_m_000000_0_spill_26.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:402)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getSpillFileForWrite(YarnOutputFiles.java:159)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1564)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$900(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1503)
1 answer

I managed to solve this problem by setting hadoop.tmp.dir to a path on the instance storage; by default it was pointing to the EBS-backed root volume.
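A minimal sketch of that change, assuming the 420 GB instance storage is mounted at /mnt (a hypothetical mount point; use whatever path your instance store is actually mounted on). The property goes inside the <configuration> element of core-site.xml:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/mnt/hadoop-tmp</value>
        <description>Base for other temporary directories; placed on the
        instance storage volume instead of the EBS-backed root volume,
        so map-side spill files no longer fill up the small root disk.</description>
    </property>

After changing it, restart the Hadoop daemons so the map tasks pick up the new local directory for their spill files.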

