What is the correct way to identify bottlenecks in a map/reduce job?

In normal Java development, if I want to improve an application's performance, my usual procedure is to run the program with a profiler attached or, alternatively, to insert a set of timing checkpoints into the application. Either way, the immediate goal is to find the application's hotspots and then measure the impact of the changes I make.

What is the correct counterpart when the application is a map/reduce job running on a Hadoop cluster?

What options are available for collecting performance data when a job runs slower than you expect from running equivalent logic in your sandbox?

1 answer

Map/Reduce

Keep track of your job in the JobTracker. There you can see how long the mappers and reducers take. A common example is doing too much work in the reducers: in that case you will notice that the mappers finish fairly quickly while the reducers take forever.
It can also be interesting to check whether all your mappers take a similar amount of time. Maybe the job is being held up by a few slow tasks? This can indicate a hardware defect in the cluster (in which case speculative execution may be the answer) or that the work is not evenly distributed.
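
To make the "few slow tasks" check less of an eyeball exercise, the per-task durations shown in the JobTracker can be compared against their median. The following is only a rough sketch: the class name `StragglerCheck`, the task IDs, the durations, and the factor of 2.0 are all made up for illustration, and the durations are assumed to have been copied out of the JobTracker by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StragglerCheck {
    /** Returns the IDs of tasks whose duration exceeds factor * median duration. */
    static List<String> findStragglers(Map<String, Long> durationsMs, double factor) {
        List<String> slow = new ArrayList<>();
        long[] sorted = durationsMs.values().stream()
                .mapToLong(Long::longValue).sorted().toArray();
        if (sorted.length == 0) {
            return slow;
        }
        int mid = sorted.length / 2;
        // Median: middle element for odd counts, mean of the two middle ones otherwise.
        double median = (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
        for (Map.Entry<String, Long> e : durationsMs.entrySet()) {
            if (e.getValue() > factor * median) {
                slow.add(e.getKey());
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical durations in milliseconds, as read off the JobTracker task list.
        Map<String, Long> durations = new LinkedHashMap<>();
        durations.put("map_0000", 61_000L);
        durations.put("map_0001", 58_000L);
        durations.put("map_0002", 64_000L);
        durations.put("map_0003", 240_000L);
        System.out.println(findStragglers(durations, 2.0)); // prints [map_0003]
    }
}
```

If the same task IDs show up as slow on every run, uneven input is the more likely cause; if the slow tasks follow a particular node, suspect the hardware.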

Operating system

Watch the nodes (either with something simple like top, or with monitoring such as munin or ganglia) to see whether your job is CPU bound or IO bound. If, for example, your reduce phase is IO bound, you can increase the number of reducers you use.
Something else you can spot here is tasks that use a lot of memory. If the nodes do not have enough RAM, running too many tasks per node can hurt performance. A monitoring system would reveal the resulting swapping.

Individual tasks

You can isolate individual Mappers/Reducers for profiling. In that case you can use all the tools you already know.
If you think the performance problem only occurs when the job runs on the cluster, you can time the relevant parts of the code with System.nanoTime() and use System.out to print some rough performance numbers.
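
A minimal sketch of that System.nanoTime() approach follows. `PhaseTimer` and the phase names "parse" and "compute" are hypothetical, and the loop in `main` only stands in for repeated map() calls; in a real task you would keep one timer per task and print the report once at the end, so that it lands in the task's stdout log rather than flooding it with one line per record.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();
    private final Map<String, Long> calls = new LinkedHashMap<>();

    /** Adds the time elapsed since startNanos to the named phase. */
    public void record(String phase, long startNanos) {
        long elapsed = System.nanoTime() - startNanos;
        totalNanos.merge(phase, elapsed, Long::sum);
        calls.merge(phase, 1L, Long::sum);
    }

    public long calls(String phase) {
        return calls.getOrDefault(phase, 0L);
    }

    public long totalNanos(String phase) {
        return totalNanos.getOrDefault(phase, 0L);
    }

    /** Builds one summary line per phase. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totalNanos.entrySet()) {
            sb.append(String.format("%s: %d calls, %.3f ms total%n",
                    e.getKey(), calls.get(e.getKey()), e.getValue() / 1e6));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        long checksum = 0;
        for (int i = 0; i < 1000; i++) {            // stands in for repeated map() calls
            long t0 = System.nanoTime();
            String[] fields = ("k" + i + "\tv" + i).split("\t");
            timer.record("parse", t0);

            long t1 = System.nanoTime();
            checksum += fields[1].hashCode();       // placeholder for the real work
            timer.record("compute", t1);
        }
        System.out.print(timer.report());           // rough numbers only
        System.out.println("checksum=" + checksum);
    }
}
```

Accumulating totals per phase keeps the overhead low and makes it easy to compare the cluster numbers with the same code run in your sandbox.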
Of course, it is also possible to pass JVM parameters to the child JVMs and connect a profiler remotely.
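
As a sketch of the JVM-parameter route: Hadoop can also run its own profiler on a subset of tasks, controlled by job properties. The snippet below only assembles those properties and prints them as -D options; it does not talk to a cluster. Some assumptions to check against your installation: the property names are the ones from the older JobTracker-era releases (newer releases use mapreduce.task.profile and friends), the task ranges and heap size are arbitrary examples, the hprof agent only exists in older JDKs, and -D options are only picked up if the job's driver parses generic options (for example through ToolRunner). The class `ProfilingProps` itself is hypothetical.

```java
import java.util.Properties;
import java.util.TreeSet;

public class ProfilingProps {
    /** Builds job properties that ask Hadoop to profile the given task ranges. */
    static Properties profilingProps(String mapRange, String reduceRange) {
        Properties p = new Properties();
        // General JVM options for every child task JVM (example values).
        p.setProperty("mapred.child.java.opts", "-Xmx512m -verbose:gc");
        // Built-in task profiling; names assume a JobTracker-era Hadoop release.
        p.setProperty("mapred.task.profile", "true");
        p.setProperty("mapred.task.profile.maps", mapRange);
        p.setProperty("mapred.task.profile.reduces", reduceRange);
        // %s is replaced by Hadoop with the profile output file of each task.
        p.setProperty("mapred.task.profile.params",
                "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s");
        return p;
    }

    public static void main(String[] args) {
        Properties p = profilingProps("0-2", "0-1");
        // Print in sorted order as -Dkey=value, ready to paste onto a job command line.
        for (String key : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println("-D" + key + "=" + p.getProperty(key));
        }
    }
}
```

Profiling only the first few map and reduce tasks keeps the overhead small while still showing where a typical task spends its time.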


Source: https://habr.com/ru/post/1400596/
