Explanation of Hadoop MapReduce console output

I am new to Hadoop. I have already configured a 2-node Hadoop cluster and run the sample MapReduce application (the word count, actually). Then I got output like this:

    File System Counters
        FILE: Number of bytes read=492
        FILE: Number of bytes written=6463014
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=71012
        HDFS: Number of bytes written=195
        HDFS: Number of read operations=404
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=80
        Launched reduce tasks=1
        Data-local map tasks=80
        Total time spent by all maps in occupied slots (ms)=429151
        Total time spent by all reduces in occupied slots (ms)=72374
    Map-Reduce Framework
        Map input records=80
        Map output records=8
        Map output bytes=470
        Map output materialized bytes=966
        Input split bytes=11040
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=966
        Reduce input records=8
        Reduce output records=5
        Spilled Records=16
        Shuffled Maps =80
        Failed Shuffles=0
        Merged Map outputs=80
        GC time elapsed (ms)=5033
        CPU time spent (ms)=59310
        Physical memory (bytes) snapshot=18515763200
        Virtual memory (bytes) snapshot=169808543744
        Total committed heap usage (bytes)=14363394048
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=29603
    File Output Format Counters
        Bytes Written=195

Is there an explanation of all the data I received? In particular:

  1. Total time spent by all maps in occupied slots (ms)
  2. Total time spent by all reduces in occupied slots (ms)
  3. CPU time spent (ms)
  4. Physical memory (bytes) snapshot
  5. Virtual memory (bytes) snapshot
  6. Total committed heap usage (bytes)
1 answer

The MapReduce framework maintains counters while a job runs. These counters are exposed to the user for gathering statistics about the job, evaluating its results, and analyzing its performance. Your job's output shows some of these counters. There is a good explanation of them in chapter 8 of Hadoop: The Definitive Guide; I suggest you check it out.
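As a side note: besides the built-in counters, a job can define its own, and they show up in the same console summary. A minimal sketch (the class name and the counter enum are mine, purely for illustration), counting empty input lines in a mapper:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: tallies empty input lines in a user-defined
    // counter, which is printed alongside the built-in counters.
    public class LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        enum MyCounters { EMPTY_LINES }  // hypothetical counter group

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().trim().isEmpty()) {
                context.getCounter(MyCounters.EMPTY_LINES).increment(1);
            }
            // ... the normal map logic would go here ...
        }
    }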

Regarding the items you asked about:

1) Total time spent by all maps in occupied slots - the total time taken by the map tasks to finish, in milliseconds. It includes tasks that were launched speculatively (speculative execution re-runs slow or seemingly failing tasks after waiting a specified time; a speculative map task is simply a duplicate run of a particular map task).

2) Total time spent by all reduces in occupied slots - the total execution time of the reduce tasks, in milliseconds.

3) CPU time spent - the cumulative CPU time taken by the tasks, in milliseconds.

4) Physical memory snapshot - the physical memory used by the tasks, in bytes; this also accounts for the memory used for spills.

5) Virtual memory snapshot - the virtual memory used by the tasks, in bytes.

6) Total committed heap usage - the total amount of heap memory available to the task JVMs, in bytes.
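If you want these numbers programmatically instead of scraping the console, they are available from the Job object after completion. A minimal driver sketch (the class name is mine; the enum constants are the Hadoop 2.x counter names as I recall them):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;
    import org.apache.hadoop.mapreduce.TaskCounter;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical driver: runs a job and prints the same framework
    // counters that appear in the console summary above.
    public class CounterDemo {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(CounterDemo.class);
            // ... mapper/reducer/output types set as in the stock word count ...
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            if (job.waitForCompletion(true)) {
                Counters c = job.getCounters();
                // Items 1-2: slot time, from the Job Counters group
                System.out.println("Maps in occupied slots (ms) = "
                        + c.findCounter(JobCounter.SLOTS_MILLIS_MAPS).getValue());
                System.out.println("Reduces in occupied slots (ms) = "
                        + c.findCounter(JobCounter.SLOTS_MILLIS_REDUCES).getValue());
                // Items 3-6: task-level counters, from the Map-Reduce Framework group
                System.out.println("CPU time spent (ms) = "
                        + c.findCounter(TaskCounter.CPU_MILLISECONDS).getValue());
                System.out.println("Physical memory (bytes) = "
                        + c.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue());
                System.out.println("Virtual memory (bytes) = "
                        + c.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue());
                System.out.println("Committed heap (bytes) = "
                        + c.findCounter(TaskCounter.COMMITTED_HEAP_BYTES).getValue());
            }
        }
    }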

Hope this helps. The counter categories and their details are listed thoroughly in The Definitive Guide; if you need more information, please let me know.

Thanks.

Additional information following the comment:

RAM is the main memory used while a job is being processed: the data is brought into RAM and the job works on it there. But the data may be larger than the RAM allocated. In such scenarios, the operating system keeps the data on disk and swaps it in and out of RAM, so that a job can work with files larger than the available memory. For example, if RAM is 64 MB and the file is 128 MB, then 64 MB is held in RAM first and the other 64 MB on disk, and the two are swapped. It is not literally stored as two 64 MB halves; internally the data is divided into segments/pages.

That was just an example for understanding. Virtual memory is the concept of working with data larger than RAM by paging it between disk and RAM. In the case above, 64 MB of disk is effectively used as if it were RAM, which is why it is called virtual memory.
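You can see this distinction on any Linux process, including a task JVM, by reading the kernel's per-process accounting (as far as I know, Hadoop's memory counters on Linux are derived from the same procfs data). A small, Linux-only sketch (the class name is mine):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Linux-only sketch: VmRSS is this process's resident (physical) memory
    // and VmSize its virtual memory, the same split as the two counters above.
    public class ProcMemory {
        public static void main(String[] args) throws IOException {
            for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
                if (line.startsWith("VmRSS") || line.startsWith("VmSize")) {
                    System.out.println(line);
                }
            }
        }
    }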

I hope that makes sense. If you are satisfied with the answer, please accept it. Let me know if you have any questions.

Committed heap is the JVM memory reserved for storing objects, which is set using JVM_OPTS on the command line (the standard -Xms/-Xmx flags). Usually all Java programs have these settings.
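For MapReduce specifically, the per-task JVM heap is commonly set through the job configuration rather than a global command line. A minimal sketch (the class name and the 1 GB size are assumptions for illustration; the property names are the standard Hadoop 2.x ones):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Hypothetical driver fragment: caps each task JVM's heap at 1 GB (an
    // assumed size), which bounds the committed heap usage counter.
    public class HeapOptsDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapreduce.map.java.opts", "-Xmx1024m");     // map task JVMs
            conf.set("mapreduce.reduce.java.opts", "-Xmx1024m");  // reduce task JVMs
            Job job = Job.getInstance(conf, "word count");
            // ... the rest of the job setup is unchanged ...
        }
    }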


Source: https://habr.com/ru/post/975799/

