How to update Spark configuration after resizing worker nodes in Cloud Dataproc

I have a Dataproc Spark cluster. Initially, the master and 2 worker nodes were of type n1-standard-4 (4 vCPUs, 15.0 GB of memory); then I changed them all to n1-highmem-8 (8 vCPUs, 52 GB of memory) through the web console.

I noticed that the two worker nodes are not fully utilized. In particular, there are only 2 executors on the first worker node and 1 executor on the second, with

spark.executor.cores 2
spark.executor.memory 4655m

in /usr/lib/spark/conf/spark-defaults.conf. I thought that with spark.dynamicAllocation.enabled set to true, the number of executors would be increased automatically.
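For reference, a quick way to inspect the generated defaults and what YARN currently advertises (the paths are the Dataproc defaults; the grep pattern is just illustrative, and <node-id> is a placeholder):

$ grep -E 'spark\.(executor|dynamicAllocation)' /usr/lib/spark/conf/spark-defaults.conf
$ yarn node -list -all          # list the NodeManagers known to the ResourceManager
$ yarn node -status <node-id>   # show per-node memory/vcore capacity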

In addition, the information on the Dataproc page of the web console is not updated automatically either. It seems that Dataproc still considers all nodes to be n1-standard-4.

My questions are:

  • Why does the first worker node have more executors than the second?
  • Why are more executors not added to each node?
  • Ideally, I want the whole cluster to be fully utilized; if the Spark configuration needs to be updated, how do I do that?

Answer:

There are a few things going on here:

  • The Spark ApplicationMaster itself occupies a YARN container on one of the worker nodes, which explains why that node has room for one fewer executor than the other.
  • At the moment, Dataproc does not automatically update the YARN configuration when nodes are resized.
  • The resources each YARN NodeManager advertises are controlled by settings in /etc/hadoop/conf/yarn-site.xml. After changing them, restart the NodeManager with sudo service hadoop-yarn-nodemanager restart on each worker, and restart the ResourceManager on the master. Once YARN sees the extra resources, you will probably also want to raise the Spark defaults spark.executor.memory and spark.executor.cores (see the sketch after this list).
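A minimal sketch of that procedure, assuming the standard Hadoop property names yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores and the stock Dataproc service names; the concrete values are illustrative for n1-highmem-8, not taken from the answer:

# On each worker: raise the capacity the NodeManager advertises, e.g.
#   yarn.nodemanager.resource.memory-mb  -> 49152  (leave headroom for the OS)
#   yarn.nodemanager.resource.cpu-vcores -> 8
$ sudo vi /etc/hadoop/conf/yarn-site.xml
$ sudo service hadoop-yarn-nodemanager restart

# On the master:
$ sudo service hadoop-yarn-resourcemanager restart

# Then raise the Spark defaults to match, e.g. in /usr/lib/spark/conf/spark-defaults.conf:
#   spark.executor.cores  4
#   spark.executor.memory 10g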

That said, it may be easier to simply create a new cluster with the machine type you want and move your data over to it. Data can be moved between clusters with hadoop distcp. For example:

$ hadoop distcp hdfs:///some_directory hdfs://other-cluster-m:8020/

or via Cloud Storage:

$ hadoop distcp hdfs:///some_directory gs://<your_bucket>/some_directory
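If you go this route, the replacement cluster with the larger machine type can be created with the gcloud CLI; a sketch, where the cluster name, region, and worker count are placeholders:

$ gcloud dataproc clusters create my-new-cluster \
      --region us-central1 \
      --master-machine-type n1-highmem-8 \
      --worker-machine-type n1-highmem-8 \
      --num-workers 2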

Better yet, consider keeping your data in Cloud Storage in the first place and accessing it directly from your jobs. Wherever you would use an HDFS path, you can instead use a path of the form:

gs://<your_bucket>/path/to/file

Spark reads such paths through the GCS connector, which implements the Hadoop FileSystem interface (so it can be used much like HDFS, with no code changes), and the data persists after any individual cluster is deleted.
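As a quick illustration that gs:// paths work through the standard Hadoop tooling on a Dataproc cluster (the bucket name is a placeholder):

$ hadoop fs -ls gs://<your_bucket>/some_directory
$ hadoop fs -cat gs://<your_bucket>/path/to/file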


Source: https://habr.com/ru/post/1650061/

