Apache Spark vs Apache Spark 2

What improvements does Apache Spark 2 bring over earlier (1.x) versions of Apache Spark?

  • In terms of architecture
  • In terms of application
  • Or anything else
2 answers

Although the Apache Spark 2.0.0 API remains largely similar to 1.x, Spark 2.0.0 does have API-breaking changes.

Apache Spark 2.0.0 is the first release on the 2.x line. The major updates are API usability, SQL:2003 support, performance improvements, structured streaming, R UDF support, and operational improvements.

New in Spark 2:

  • The biggest change I see is that the Dataset and DataFrame APIs will be merged.
  • The latest and best Spark will be much more efficient than its predecessors. Spark 2.0 is going to focus on combining Parquet and caching to achieve even greater throughput.
  • Structured streaming is another important thing!
  • This will be the first version to focus on ETL. Subsequent versions will add more operators and libraries for ETL.
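To make the Dataset/DataFrame merge concrete, here is a minimal Scala sketch, assuming Spark 2.x with the `spark-sql` module on the classpath and a local master. The `Person` case class, the sample rows, and the `DatasetUnification`/`namesOlderThan` names are hypothetical. It shows that a `DataFrame` can be assigned directly to a `Dataset[Row]` (in 2.x `DataFrame` is a type alias for it), and that the same data can be queried either untyped with columns or typed with lambdas via `as[Person]`.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type; declared at top level so Spark can derive an Encoder for it.
case class Person(name: String, age: Long)

object DatasetUnification {
  // Returns the names of people older than minAge, computed twice:
  // once with the untyped (column-based) API and once with the typed API.
  def namesOlderThan(spark: SparkSession, minAge: Long): (Array[String], Array[String]) = {
    import spark.implicits._

    // An untyped DataFrame built from local sample data.
    val df: DataFrame = Seq(Person("Alice", 34), Person("Bob", 19)).toDF()

    // DataFrame is Dataset[Row] in Spark 2.x, so this assignment needs no conversion.
    val rows: Dataset[Row] = df

    // Untyped query: columns are referenced by name and checked at runtime.
    val untyped = rows.filter($"age" > minAge).select($"name").as[String].collect()

    // Typed view of the same data: the lambdas are checked against Person at compile time.
    val people: Dataset[Person] = df.as[Person]
    val typed = people.filter(_.age > minAge).map(_.name).collect()

    (untyped, typed)
  }

  def main(args: Array[String]): Unit = {
    // SparkSession is the unified 2.x entry point, superseding SQLContext/HiveContext.
    val spark = SparkSession.builder()
      .appName("dataset-unification")
      .master("local[*]")
      .getOrCreate()
    try {
      val (untyped, typed) = namesOlderThan(spark, 21)
      println(untyped.mkString(", "))
      println(typed.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

Both queries run over the same data; the typed variant trades some of the optimizer's visibility into the lambdas for compile-time checking.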

You can go through the Spark 2.0.0 release notes, where the updates are explained in the following sections:

  • API Stability
  • Core and Spark SQL
  • MLlib
  • SparkR
  • Streaming
  • Dependency, Packaging, and Operations
  • Removals, Behavior Changes, and Deprecations
  • Known Issues

There isn’t much difference in architecture, since the core is still the DAG and RDDs, which are its most important parts!

That said, Spark 2.0 is much more optimized and has the Dataset API, which gives developers much more powerful features. So I would say the architecture is the same, but Spark 2.0 provides many optimizations and a rich set of APIs!

These are the main things that Apache Spark 2.0 provides:

  • The biggest change I see is that the Dataset and DataFrame APIs will be merged.
  • The latest and best Spark will be a whole lot more efficient than its predecessors. Spark 2.0 is going to focus on combining Parquet and caching to achieve even better throughput.
  • Structured streaming is another important thing!
  • This will be the first version to focus on ETL. Subsequent versions will add more operators and libraries for ETL.
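Structured streaming lets a stream be queried with the same Dataset operations as batch data. The sketch below is a word count over a directory of text files, assuming Spark 2.x, Scala, and a local master; the input directory, the `part-0.txt` file, the `word_counts` query name, and the `StructuredWordCount`/`countWords` names are hypothetical. It uses the in-memory sink and `processAllAvailable()`, which are meant for tests and demos rather than production pipelines.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object StructuredWordCount {
  // Runs a streaming word count over the text files in inputDir and returns word -> count.
  def countWords(spark: SparkSession, inputDir: String): Map[String, Long] = {
    import spark.implicits._

    // The directory is treated as an unbounded input table: each new file appends rows.
    val lines = spark.readStream.text(inputDir).as[String]

    // Ordinary Dataset operations, the same as in batch code.
    val counts = lines.flatMap(_.split(" ")).groupBy("value").count()

    // The memory sink keeps the full aggregate in an in-memory table named by queryName.
    val query = counts.writeStream
      .outputMode("complete")
      .format("memory")
      .queryName("word_counts") // hypothetical table name
      .start()
    try {
      // Block until every file currently in inputDir has been processed and committed.
      query.processAllAvailable()
      // Tuple encoders map columns by position: (value, count).
      spark.table("word_counts").as[(String, Long)].collect().toMap
    } finally {
      query.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("structured-word-count")
      .master("local[*]")
      .getOrCreate()
    // Hypothetical input: one small file in a fresh temporary directory.
    val dir = Files.createTempDirectory("words")
    Files.write(dir.resolve("part-0.txt"), "spark streaming spark".getBytes(StandardCharsets.UTF_8))
    try {
      println(countWords(spark, dir.toString))
    } finally {
      spark.stop()
    }
  }
}
```

In a real job the memory sink would typically be replaced by a durable sink with a checkpoint location, and the query left running with `awaitTermination()` instead of being stopped.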

For more information, see here: https://www.quora.com/What-are-special-features-and-advantages-of-Apache-Spark-2-0-over-earlier-versions


Source: https://habr.com/ru/post/1011533/
