Am I fully using my EMR cluster?

  • Total instances: I created an EMR cluster with 11 nodes in total (1 master node and 10 core nodes).
  • Spark submit command: spark-submit myApplication.py

[screenshot]

  • Container graph: Next, I have these graphs that refer to "containers", and I'm not sure what a container is in the EMR context, so it's not obvious what this tells me:

[screenshot]

  • Actual running executors: Then, in my Spark History Server UI, I see that only 4 executors have ever been created.
  • Dynamic allocation: I have spark.dynamicAllocation.enabled=true, and I can see this in the Environment tab.
  • Executor memory: In addition, the executor memory is set to 5120M by default.

  • Executors: Next, the Executors tab shows what looks like 3 active and 1 dead executor: [screenshot]
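On the "containers" question: on EMR, Spark runs on YARN, and each executor lives inside one YARN container, so the container graphs are effectively counting executor (plus application-master) allocations. As a sketch of why a container is larger than the executor memory setting, Spark's default overhead rule is roughly max(384 MiB, 10% of executor memory); the numbers below assume that default and ignore YARN's rounding to its minimum-allocation increment:

```python
# Sketch: how one Spark executor maps to a YARN container size.
# Assumes Spark's default memory overhead rule:
#   overhead = max(384 MiB, 0.10 * executor memory)
# (spark.executor.memoryOverhead). YARN may additionally round the
# container up to a multiple of yarn.scheduler.minimum-allocation-mb.

def yarn_container_mb(executor_memory_mb: int, overhead_factor: float = 0.10) -> int:
    overhead = max(384, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead

# With the 5120M executor memory from the question:
print(yarn_container_mb(5120))  # 5120 + 512 = 5632
```

So with 5120M executors, each container needs about 5.5 GiB of a node's YARN memory, which is one way to reason about how many executors fit per node.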

So, at face value, it seems that I am not using all of my nodes or all of the available memory.

  • How can I tell whether I am using all available resources?
  • If I am not using the available resources to their full potential, how can I change my configuration so that they are?
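One common approach to the second question is to size executors explicitly instead of relying on defaults. The per-node figures below are assumptions (e.g. an m5.xlarge-class core node); substitute the real vCPU count and YARN memory of your nodes, which the YARN ResourceManager UI on the master node (port 8088) reports:

```python
# Rough executor-sizing sketch. The per-node numbers are assumptions,
# not taken from the question; replace them with your cluster's values.

CORE_NODES = 10               # from the question
VCORES_PER_NODE = 4           # assumption: e.g. an m5.xlarge-class node
YARN_MEM_MB_PER_NODE = 12288  # assumption: memory YARN can allocate per node

EXECUTOR_CORES = 4            # one executor per node in this sketch
executors_per_node = VCORES_PER_NODE // EXECUTOR_CORES
num_executors = CORE_NODES * executors_per_node

# Leave ~10% headroom for the executor memory overhead (see above).
executor_memory_mb = int(YARN_MEM_MB_PER_NODE / executors_per_node / 1.10)

print(num_executors, executor_memory_mb)  # 10 executors, 11170 MB each
```

These numbers would then feed real spark-submit flags, e.g. --num-executors 10 --executor-cores 4 --executor-memory 11170M. Note that with spark.dynamicAllocation.enabled=true, --num-executors only sets the initial count; Spark scales executors up and down with demand, so 4 executors may simply mean the job never requested more.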
1 answer

Another way to find out how many resources each cluster node is using is the Ganglia web tool.

It is served from the master node and shows a graph of each node's resource usage. The catch is that Ganglia must have been enabled as one of the available applications when the cluster was created.

Once it is enabled, you can open its web page and see how heavily each node is being used.
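If Ganglia was not enabled at cluster creation, a lighter alternative is the YARN ResourceManager REST API, which EMR already serves on the master node (port 8088). The sketch below computes cluster-wide utilization from the Cluster Metrics endpoint; the field names follow the Hadoop Cluster Metrics API, but verify them against your Hadoop version, and the sample numbers are made up for illustration:

```python
# Sketch: cluster-wide utilization from the YARN ResourceManager REST API.
# Endpoint (on the EMR master node): http://<master-dns>:8088/ws/v1/cluster/metrics
import json
from urllib.request import urlopen

def utilization(metrics: dict) -> tuple[float, float]:
    """Return (memory fraction, vcore fraction) currently allocated."""
    m = metrics["clusterMetrics"]
    mem = m["allocatedMB"] / m["totalMB"]
    cores = m["allocatedVirtualCores"] / m["totalVirtualCores"]
    return mem, cores

# In practice you would fetch live numbers, e.g.:
#   metrics = json.load(urlopen("http://<master-dns>:8088/ws/v1/cluster/metrics"))
# Illustrative sample response (made-up numbers):
sample = {"clusterMetrics": {"allocatedMB": 22528, "totalMB": 122880,
                             "allocatedVirtualCores": 16,
                             "totalVirtualCores": 40}}
mem_frac, core_frac = utilization(sample)
print(f"memory {mem_frac:.0%}, vcores {core_frac:.0%}")
```

Low allocated-to-total ratios here would confirm the suspicion in the question that the cluster is underused.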


Source: https://habr.com/ru/post/1263150/

