The Apache Spark 2.0.0 API remains largely similar to 1.x, but Spark 2.0.0 does include API-breaking changes.
Apache Spark 2.0.0 is the first release on the 2.x line. The major updates are API usability, SQL 2003 support, performance improvements, Structured Streaming, R UDF support, and operational improvements.
New in Spark 2:
- The biggest change I see is that the Dataset and DataFrame APIs are merged: in Scala, DataFrame is now just a type alias for Dataset[Row].
- Spark 2.0 should perform considerably better than its predecessors. It focuses on combining Parquet and caching to achieve even greater throughput.
- Structured Streaming is another important addition.
- This is the first version to focus on ETL. Subsequent versions will add more operators and libraries for ETL.
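As a rough sketch of what the Dataset/DataFrame merge looks like in Scala: the `Person` class, the `UnifiedApiSketch` object, and the sample data are made up for illustration, and the snippet assumes Spark 2.x is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

object UnifiedApiSketch {
  // Returns the names of adults twice: once through the untyped
  // DataFrame-style API and once through the typed Dataset API.
  def adultNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 12)).toDS()

    // In Spark 2.x, DataFrame is a type alias for Dataset[Row],
    // so this assignment compiles without any conversion.
    val df: DataFrame = people.toDF()
    val rows: Dataset[Row] = df

    // Untyped: columns are referenced by name and checked at runtime.
    val untyped = rows.filter($"age" >= 18).select("name").as[String].collect().toSeq

    // Typed: fields are accessed on the case class and checked at compile time.
    val typed = people.filter(_.age >= 18).map(_.name).collect().toSeq

    (untyped, typed)
  }

  def main(args: Array[String]): Unit = {
    // SparkSession is the single entry point introduced in 2.0.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unified-api-sketch")
      .getOrCreate()
    try println(adultNames(spark)) finally spark.stop()
  }
}
```

Both queries go through the same engine; the difference is only whether you work with generic `Row` objects or with your own types.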
See the Spark Release 2.0.0 notes, where the updates are explained in the following sections:
- API Stability
- Core and Spark SQL
- MLlib
- SparkR
- Streaming
- Dependency, Packaging, and Operations
- Removals, Behavior Changes, and Deprecations
- Known Issues