We are currently evaluating Spark on our cluster, which already runs MRv2 on YARN.
We noticed a problem with jobs running concurrently: a running Spark job does not free its resources until the job has finished. Ideally, if two people run any combination of MRv2 and Spark jobs, the resources should be distributed fairly between them.
In Spark 1.2, I noticed a feature called "dynamic resource allocation", but it does not seem to solve the problem, since it releases resources only while Spark is IDLE, not while it is BUSY.
I could not find additional information on this. On the other hand, I think this is a fairly common problem for many users.
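For context, here is a minimal sketch of how that feature is typically switched on, assuming the property names documented for Spark 1.2 on YARN. The values and the helper name `to_submit_args` are illustrative only, not our actual settings.

```python
# Illustrative values only; property names as documented for Spark 1.2.
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # The external shuffle service must be enabled for dynamic allocation.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Seconds an executor must sit idle before it is released.
    "spark.dynamicAllocation.executorIdleTimeout": "60",
}


def to_submit_args(conf):
    """Render the properties as spark-submit --conf flags (hypothetical helper)."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(dynamic_allocation_conf)))
```

The idle timeout only applies to executors that have no running tasks, which is exactly the limitation described above: a busy job keeps everything it holds.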
So,
- What is your experience with multi-user clusters running both MRv2 and Spark on YARN?
- Is Spark architecturally capable of releasing resources while it is busy? Is this a planned feature, or does it conflict with the idea of Spark executors?