I have been reading about how Spark uses cores in Standalone mode and trying to understand it. According to the Spark documentation, spark.task.cpus is the "number of cores to allocate for each task" and is set to 1 by default.
Question 1: On a multi-core machine (say 4 physical cores with 8 hardware threads), if I set spark.task.cpus = 4, will Spark use 4 physical cores (one thread per core) or 2 physical cores plus their hyper-threads?
What happens if I set spark.task.cpus = 16, which is more than the number of hardware threads available on this machine? A sketch of the setup I have in mind is below.
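For concreteness, this is roughly how I am configuring it (the master URL, app name, and the value of spark.executor.cores are just placeholders for my single-machine Standalone setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup for the scenario in Question 1:
// one 4-core / 8-hardware-thread machine running in Standalone mode.
val conf = new SparkConf()
  .setMaster("spark://localhost:7077") // placeholder Standalone master URL
  .setAppName("task-cpus-test")
  .set("spark.executor.cores", "8")    // threads the executor is allowed to use
  .set("spark.task.cpus", "4")         // cores requested per task

val sc = new SparkContext(conf)
```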
Question 2: How is this kind of hardware-level parallelism actually achieved? I tried to look into the source code, but could not find anything that binds tasks to the hardware or to JVM threads at that level. For example, if a task is a filter function, how does a single filter task relate to multiple cores or threads?
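To make Question 2 concrete, this is the kind of "filter task" I mean (assuming the SparkContext sc from the sketch above; the data and partition count are arbitrary):

```scala
// Each task applies the filter predicate to one partition of the RDD,
// so I would expect one task to correspond to one thread of work.
val data  = sc.parallelize(1 to 1000000, numSlices = 8)
val evens = data.filter(_ % 2 == 0)
evens.count() // triggers 8 tasks, one per partition
```

What I do not understand is how a single one of those tasks would ever make use of more than one core when spark.task.cpus > 1.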
Maybe I am missing something. Is this related to the Scala language?