Records proactively spilled in Hadoop Pig?

Question

I am new to Hadoop and was curious about the command-line messages printed by my Pig script.

Total records written : 7676
Total bytes written : 341396
Spillable Memory Manager spill count : 103
Total bags proactively spilled: 39
Total records proactively spilled: 32389322

The final result is reported as "Success!", but I'm still not sure. What do the numbers above mean?

Thanks.

join hadoop apache-pig

Navneet Sep 11 '12 at 10:37

1 answer

Lorand bendig · Accepted Answer · 2012-09-16T15:09:56+0000

The first two show the total records / bytes written to HDFS by your MR job.
It may happen that during an MR job not all records fit in memory. The spill counters indicate how many records were written to the local disks of your datanodes to avoid running out of memory.

Pig uses two methods to control memory usage and spill if necessary:

1. Spillable Memory Manager:

This acts as a central place where spillable bags are registered. When memory runs low, the manager goes through the list of registered bags and runs a GC.

2. Proactive (self) spilling:

Bags can also spill themselves when their own memory limit is reached (see pig.cachedbag.memusage).
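
To make the difference between the two mechanisms concrete, here is a toy model in Python. It is only a sketch and not Pig's actual implementation: the names ToyBag and ToySpillManager are hypothetical, and the limit is counted in records for simplicity, whereas Pig's limit is based on memory usage.

```python
import pickle
import tempfile


class ToyBag:
    """Toy bag (hypothetical) that can move its in-memory records to a temp file."""

    def __init__(self, memory_limit):
        # Assumption: the limit is a record count, purely for illustration.
        self.memory_limit = memory_limit
        self.in_memory = []
        self.spill_files = []
        self.records_spilled = 0

    def add(self, record):
        self.in_memory.append(record)
        # Proactive (self) spilling: the bag checks its own limit on every add.
        if len(self.in_memory) >= self.memory_limit:
            self.spill()

    def spill(self):
        """Write the in-memory records to disk; return how many were spilled."""
        if not self.in_memory:
            return 0
        spill_file = tempfile.TemporaryFile()
        pickle.dump(self.in_memory, spill_file)
        self.spill_files.append(spill_file)
        count = len(self.in_memory)
        self.records_spilled += count
        self.in_memory = []
        return count


class ToySpillManager:
    """Toy central registry (hypothetical) that spills registered bags on demand."""

    def __init__(self):
        self.bags = []
        self.spill_count = 0

    def register(self, bag):
        self.bags.append(bag)

    def on_low_memory(self):
        # Manager-driven spilling: walk the registered bags and ask each to spill.
        for bag in self.bags:
            if bag.spill() > 0:
                self.spill_count += 1


manager = ToySpillManager()
bag = ToyBag(memory_limit=3)
manager.register(bag)

for i in range(7):
    bag.add(i)          # the bag spills itself each time it reaches 3 records

manager.on_low_memory()  # the manager forces out whatever is still in memory
print(bag.records_spilled, len(bag.spill_files), manager.spill_count)
```

In this model the bag's own spills correspond to the "proactively spilled" counters, while spills triggered through the manager correspond to the "Spillable Memory Manager spill count".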

Back to statistics:

Total bags proactively spilled: the number of bags that were spilled.
Total records proactively spilled: the number of records in those bags.
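
As a rough illustration of how to read these numbers, the following Python sketch parses the counters from the question and derives two ratios. The helper parse_counters is hypothetical, and the assumption that each counter always appears as "label : number" is based only on the output quoted in the question.

```python
import re

# Counter output copied from the question.
log_text = (
    "Total records written : 7676 "
    "Total bytes written : 341396 "
    "Spillable Memory Manager spill count : 103 "
    "Total bags proactively spilled: 39 "
    "Total records proactively spilled: 32389322"
)

LABELS = [
    "Total records written",
    "Total bytes written",
    "Spillable Memory Manager spill count",
    "Total bags proactively spilled",
    "Total records proactively spilled",
]


def parse_counters(text):
    """Return a dict mapping each known label to its integer value."""
    counters = {}
    for label in LABELS:
        match = re.search(re.escape(label) + r"\s*:\s*(\d+)", text)
        if match:
            counters[label] = int(match.group(1))
    return counters


counters = parse_counters(log_text)

# Average number of records per proactively spilled bag.
avg_records_per_bag = (
    counters["Total records proactively spilled"]
    / counters["Total bags proactively spilled"]
)

# Spilled records compared with the records that ended up in the output.
spilled_to_written = (
    counters["Total records proactively spilled"]
    / counters["Total records written"]
)

print(f"records per spilled bag: {avg_records_per_bag:.0f}")
print(f"spilled / written records: {spilled_to_written:.0f}")
```

Here the number of spilled records is several thousand times the number of records written, which suggests a lot of intermediate data was pushed to local disk along the way.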

It is always useful to check the spill statistics of your job, as a lot of spilling can indicate a significant performance hit, which should be avoided.
