Spark Performance Bottleneck

The paper “Making Sense of Performance in Data Analytics Frameworks,” published at NSDI 2015, concludes that the CPU (not disk I/O or the network) is Spark's performance bottleneck. Kay ran several experiments on Spark, including BDBench, TPC-DS, and a production workload (only Spark SQL is used in the paper). I wonder whether this conclusion also holds for frameworks built on Spark, for example Spark Streaming, where a continuous data stream is received over the network, so both network I/O and the disk come under heavy pressure.

+1
2 answers

Network and disk may actually be under less pressure in Spark Streaming, because streams are usually checkpointed, which means the data is usually not kept around indefinitely.

But ultimately, this is a research question: the only way to settle it is to measure. Kay's code is open source.
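As a rough illustration of what such a measurement could look like, here is a minimal sketch in the spirit of the paper's blocked-time analysis: record, per task, how long it was blocked on disk and on the network, and compare that with the total task time. It is a simplification (it ignores scheduling and stragglers), and the `TaskMetrics` class, its field names, and the sample numbers are all invented for the example; they are not Spark's API or real results.

```python
from dataclasses import dataclass


@dataclass
class TaskMetrics:
    """Per-task timings in seconds (hypothetical field names, not Spark's API)."""
    run_time: float         # total wall-clock time of the task
    disk_blocked: float     # time the task spent blocked on disk I/O
    network_blocked: float  # time the task spent blocked on the network


def max_time_saved_fraction(tasks, resource):
    """Upper bound on the fraction of total task time that would be saved
    if `resource` ("disk" or "network") never blocked a task."""
    if resource not in ("disk", "network"):
        raise ValueError("resource must be 'disk' or 'network'")
    total = sum(t.run_time for t in tasks)
    if total == 0:
        return 0.0
    blocked = sum(getattr(t, resource + "_blocked") for t in tasks)
    return blocked / total


# Made-up numbers, only to show the calculation.
tasks = [
    TaskMetrics(run_time=10.0, disk_blocked=1.0, network_blocked=0.5),
    TaskMetrics(run_time=8.0, disk_blocked=2.0, network_blocked=0.5),
    TaskMetrics(run_time=12.0, disk_blocked=0.0, network_blocked=2.0),
]

disk_gain = max_time_saved_fraction(tasks, "disk")
network_gain = max_time_saved_fraction(tasks, "network")
print(f"disk: {disk_gain:.1%}, network: {network_gain:.1%}")
```

If the fractions measured on a real streaming job stay small, that would suggest the paper's conclusion carries over; if they are large, I/O matters more for that workload.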

+2

. , , , . , , , , -. , . IO ..

+2

Source: https://habr.com/ru/post/1666400/

