Environment: Apache Pig version 0.10.1.21 (reported), CentOS release 6.3 (Final), jdk1.6.0_31 (the Hortonworks Sandbox v1.2 on VirtualBox with 3.5 GB of RAM)
$ cat data.txt
11,11,22
33,34,35
47,0,21
33,6,51
56,6,11
11,25,67
$ cat GrpTest.pig
A = LOAD 'data.txt' USING PigStorage(',') AS (f1:int,f2:int,f3:int);
B = GROUP A BY f1;
DESCRIBE B;
DUMP B;
pig -x local GrpTest.pig
[Thread-12] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
[Thread-12] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
[Thread-13] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@19a9bea3
[Thread-13] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
[Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B
The java.lang.OutOfMemoryError: Java heap space error occurs every time I use GROUP or JOIN in a Pig script executed in local mode. There is no error when the same script is executed in mapreduce mode on HDFS.
Question 1: How can an OutOfMemoryError occur when the data sample is tiny and local mode should use fewer resources than mapreduce (HDFS) mode?
Question 2: Is there a way to successfully run small Pig scripts that use GROUP or JOIN in local mode?