Getting "No space left on device" with approx. 10 GB of EMR data on m1.large instances

I get a "No space left on device" error when I run my Amazon EMR jobs using m1.large as the instance type for the Hadoop instances the job flow creates. The job generates at most about 10 GB of data, and an m1.large instance should come with 2 x 420 GB of instance storage (according to the list of EC2 instance types), so I am confused how a mere 10 GB of data can produce a "disk space full" error. I know this error can also occur if the file system runs out of inodes, but that limit is in the millions and I am sure my job is not producing that many files. I have also noticed that when I launch an EC2 instance myself, regardless of type, m1.large gets an 8 GB root volume by default. Could that be what EMR is allocating for its instances as well? And if so, when do the 420 GB drives actually get attached to the instance?

For reference, here is the output of df -hi, mount, and lsblk.

 $ df -hi
 Filesystem                              Inodes IUsed IFree IUse% Mounted on
 /dev/xvda1                                640K  100K  541K   16% /
 tmpfs                                     932K     3  932K    1% /lib/init/rw
 udev                                      930K   454  929K    1% /dev
 tmpfs                                     932K     3  932K    1% /dev/shm
 ip-10-182-182-151.ec2.internal:/mapr
                                           100G   50G   50G   50% /mapr

 $ mount
 /dev/xvda1 on / type ext3 (rw,noatime)
 tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
 proc on /proc type proc (rw,noexec,nosuid,nodev)
 sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
 udev on /dev type tmpfs (rw,mode=0755)
 tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
 devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
 /var/run on /run type none (rw,bind)
 /var/lock on /run/lock type none (rw,bind)
 /dev/shm on /run/shm type none (rw,bind)
 rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
 ip-10-182-182-151.ec2.internal:/mapr on /mapr type nfs (rw,addr=10.182.182.151)

 $ lsblk
 NAME  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
 xvda1 202:1    0   10G  0 disk /
 xvdb  202:16   0  420G  0 disk
 xvdc  202:32   0  420G  0 disk
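
In case it matters, this is roughly how one could check whether MapR-FS has already claimed the 420 GB ephemeral drives; the maprcli command and the disktab path are assumptions about the standard MapR layout on the node, not output from this cluster:

 # Sketch only: show which physical disks have been handed to MapR-FS.
 # /opt/mapr/conf/disktab lists the devices MapR-FS formatted for its own use.
 cat /opt/mapr/conf/disktab

 # List the disks MapR-FS is using on this host and how full they are
 # (hostname taken from the mount output above).
 maprcli disk list -host ip-10-182-182-151.ec2.internal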

1 answer

With help from @slayedbylucifer, I was able to determine that the problem is that by default the cluster makes the full disk space available to HDFS (MapR-FS in this case), so only about 10 GB is left for local use by the machine. When using the MapR distribution of Hadoop, there is an option called --mfs-percentage that specifies how the disk space is split between the local file system and MapR-FS; the local file system quota is mounted at /var/tmp. Make sure the mapred.local.dir parameter points to a directory inside /var/tmp, because that is where all the tasktracker attempt logs go, and they can become huge for large jobs. In my case that logging is what filled the disk and triggered the error. I set --mfs-percentage to 60 and was able to complete the job successfully after that.
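
For reference, here is roughly what that looks like. Only --mfs-percentage (set to 60) and the /var/tmp requirement come from the explanation above; the elastic-mapreduce invocation, the MapR edition argument, and the mapred-site.xml path are assumptions about the old EMR/MapR setup and may need adjusting:

 # Sketch only: create the cluster with the MapR distribution, giving 60% of
 # the ephemeral disks to MapR-FS and leaving 40% for the local file system.
 # Everything except --mfs-percentage,60 is an assumed example of the classic
 # elastic-mapreduce CLI syntax for MapR clusters.
 elastic-mapreduce --create --alive \
   --instance-type m1.large --num-instances 3 \
   --supported-product mapr \
   --args "--edition,m3,--mfs-percentage,60"

 # Then make sure mapred.local.dir points somewhere under /var/tmp, e.g. by
 # adding this inside <configuration> in /home/hadoop/conf/mapred-site.xml
 # (the path and value are illustrative):
 #   <property>
 #     <name>mapred.local.dir</name>
 #     <value>/var/tmp/mapred/local</value>
 #   </property>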

