In addition to the definition of Apache Hadoop from the official website, I would like to emphasize that Hadoop is a framework, and the Hadoop ecosystem contains many subsystems.
I am quoting the content from the official website so that broken links in the future do not invalidate this answer.
The project includes the following modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large datasets.
More or less:
Hadoop = Distributed Storage (HDFS) + Distributed Processing (YARN + MapReduce)
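The MapReduce half of that equation is a programming model: map input records to key/value pairs, shuffle (group by key), then reduce each group. The sketch below is plain, single-process Python purely to illustrate the model; Hadoop's value is running the same three phases distributed across a cluster.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model that Hadoop distributes:
# map -> shuffle (group by key) -> reduce. Word count is the classic example.

def map_phase(line):
    # Emit (word, 1) pairs, like a Hadoop mapper.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, like Hadoop's shuffle-and-sort step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts per word, like a Hadoop reducer.
    return {key: sum(values) for key, values in groups.items()}

lines = ["hadoop stores data", "spark and hadoop process data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["hadoop"])  # 2
```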
But these four modules do not cover the entire Hadoop ecosystem. The Hadoop ecosystem includes many Hadoop-related projects and more than 40 subsystems.
Other Hadoop-related projects at Apache include:
Ambari™: A web-based tool for provisioning, managing and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop.
Avro™: A data serialization system.
Cassandra™: A scalable multi-master database with no single points of failure.
Chukwa™: A data collection system for managing large distributed systems.
HBase™: A scalable, distributed database that supports structured data storage for large tables.
Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout™: A scalable machine learning and data mining library.
Pig™: A high-level data-flow language and execution framework for parallel computation.
Spark™: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
Tez™: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks, to process data for both batch and interactive use cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
ZooKeeper™: A high-performance coordination service for distributed applications.
Returning to your question:
Just take a look at the 40+ subsystems in the Hadoop ecosystem. Not everything you quoted is necessarily part of Hadoop itself, but most of it is related to Hadoop.
Spark is part of the Hadoop ecosystem, but it can work without HDFS and YARN. HDFS datasets can be replaced with RDDs (Resilient Distributed Datasets), and Spark can run in standalone mode without YARN.
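The RDD programming style is a chain of transformations (map, filter) followed by an action (reduce) over an in-memory dataset. The sketch below mimics that pipeline in plain Python so it runs without Spark installed; the comment shows what the roughly equivalent PySpark call chain would look like when run in local/standalone mode, with no HDFS or YARN involved.

```python
from functools import reduce

# Plain-Python sketch of an RDD-style pipeline: chained transformations
# over an in-memory dataset, then an action that produces a result.
# The roughly equivalent PySpark pipeline (assuming a local SparkContext
# `sc`, no HDFS/YARN) would be:
#   sc.parallelize(data).map(lambda x: x * x) \
#     .filter(lambda x: x > 10).reduce(lambda a, b: a + b)

data = [1, 2, 3, 4, 5]
squared = map(lambda x: x * x, data)       # transformation
large = filter(lambda x: x > 10, squared)  # transformation
total = reduce(lambda a, b: a + b, large)  # action
print(total)  # 16 + 25 = 41
```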
Take a look at this article and this article for a comparison of Hadoop and Spark.
Examples of using Spark over Hadoop:
- Iterative Algorithms in Machine Learning
- Interactive data mining and data processing
- Stream processing
- Sensor data processing
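The first item is where Spark's in-memory model pays off most: an iterative algorithm scans the same dataset on every iteration, so keeping it cached in memory (as Spark does) beats re-reading it from disk on each pass, which is what a chain of MapReduce jobs would do. A toy gradient descent in plain Python (the data is made up for illustration) shows the repeated-pass access pattern:

```python
# Toy gradient descent fitting y = w * x. The loop makes many passes over
# the same dataset, which is why iterative ML algorithms favor Spark's
# in-memory caching over disk-backed MapReduce chains.

points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (x, y) samples, true w = 2
w = 0.0
learning_rate = 0.05

for _ in range(200):  # each iteration re-scans the whole dataset
    gradient = sum(2 * (w * x - y) * x for x, y in points) / len(points)
    w -= learning_rate * gradient

print(round(w, 3))  # converges to 2.0
```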
Since Spark does not have its own storage system, it has to rely on a distributed storage system, and HDFS is one of them.
Take a look at this related SE question:
Can Apache Spark run without Hadoop?