The first two show the total records / bytes written to HDFS by your MR job.
During an MR job, not all records may fit in memory. The spill counters indicate how many records were written to the local disks of your datanodes to avoid running out of memory.
Pig uses two methods to control memory usage and spill if necessary:
1. Spillable Memory Manager:
It acts as a central place where spillable bags are registered. When memory runs low, this manager goes through the list of registered bags and runs a GC.
2. Proactive (self-spilling):
Bags can also spill on their own when their memory limit is reached (see pig.cachedbag.memusage).
Back to the statistics:
- Total bags proactively spilled: the number of bags that were spilled.
- Total records proactively spilled: the number of records in those bags.
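If you want to check these two counters automatically, something like the following could work. It is only a minimal sketch: the helper names (`parse_counters`, `has_proactive_spill`) are made up for illustration, the sample text is not from a real job, and it assumes each counter is printed on its own line as `name: value`.

```python
import re

# Illustrative sample only, not output from a real job. The "name: value"
# line layout is an assumption about how the statistics are printed.
SAMPLE_STATS = """\
Total bags proactively spilled: 3
Total records proactively spilled: 45000
"""

def parse_counters(text):
    """Collect every 'name: integer' line into a dict (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters

def has_proactive_spill(counters):
    """True if at least one bag was proactively spilled."""
    return counters.get("Total bags proactively spilled", 0) > 0

if __name__ == "__main__":
    stats = parse_counters(SAMPLE_STATS)
    if has_proactive_spill(stats):
        print("Spilled", stats["Total records proactively spilled"],
              "records from", stats["Total bags proactively spilled"], "bags")
```

A non-zero count is not an error by itself; it just tells you which jobs deserve a closer look.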
It is always worth checking the spill statistics of your job, since heavy spilling can indicate a significant performance hit, which should be avoided.