OK, let me give it a try. When you configure Hadoop, it lays a virtual FS on top of the local FS, which is HDFS. HDFS stores data in the form of blocks (similar to the local FS, but the blocks are much bigger in comparison) in a replicated fashion. But the HDFS directory tree, or filesystem namespace, is identical in structure to that of a local FS. When you start writing data into HDFS, it eventually gets written onto the local FS, but you cannot see it there directly.
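To make that concrete, here is a minimal Java sketch (assuming a pseudo-distributed cluster reachable at hdfs://localhost:9000; the /demo/hello.txt path is just a made-up example) that writes a small file into HDFS. After it runs, the bytes sit in blk_* files under dfs.data.dir on the local FS, but the name /demo/hello.txt exists only inside the HDFS namespace:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed namenode address -- adjust to your own cluster.
        conf.set("fs.default.name", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        // Create a file in the HDFS namespace (hypothetical path).
        try (FSDataOutputStream out = fs.create(new Path("/demo/hello.txt"))) {
            out.writeBytes("hello HDFS\n");
        }
        // The bytes now live as blk_* files under ${dfs.data.dir} on the
        // datanode's local FS; the local FS itself has no /demo/hello.txt.
    }
}
```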
The temp directory (hadoop.tmp.dir) actually serves 3 purposes:
1- The directory where the namenode stores its metadata, with the default value ${hadoop.tmp.dir}/dfs/name; it can be specified explicitly via dfs.name.dir. If you specify dfs.name.dir, the namenode metadata will be stored in the directory given as the value of that property.
2- The directory where the HDFS data blocks are stored, with the default value ${hadoop.tmp.dir}/dfs/data; it can be specified explicitly via dfs.data.dir. If you specify dfs.data.dir, the HDFS data will be stored in the directory given as the value of that property.
3- The directory where the secondary namenode stores its checkpoints, with the default value ${hadoop.tmp.dir}/dfs/namesecondary; it can be specified explicitly via fs.checkpoint.dir.
So it is always better to use specific, dedicated locations as the values of these properties for a cleaner setup.
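These properties normally live in your hdfs-site.xml / core-site.xml, but just as a sketch of what the overrides look like, here they are set programmatically on a Hadoop Configuration object (the /data/hadoop/* paths are made-up examples of dedicated locations):

```java
import org.apache.hadoop.conf.Configuration;

public class DedicatedDirsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical dedicated locations; pick stable paths on your nodes.
        conf.set("dfs.name.dir", "/data/hadoop/name");               // namenode metadata
        conf.set("dfs.data.dir", "/data/hadoop/data");               // HDFS block storage
        conf.set("fs.checkpoint.dir", "/data/hadoop/namesecondary"); // 2NN checkpoints
        // With these set, nothing important depends on the volatile
        // ${hadoop.tmp.dir} default anymore.
        System.out.println(conf.get("dfs.name.dir"));
    }
}
```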
When access to a particular data block is needed, the metadata stored under the dfs.name.dir directory is consulted, and the block's location on a particular datanode is returned to the client (the block itself sits somewhere under the dfs.data.dir directory on that datanode's local FS). The client then reads the data directly from there (the same goes for writes).
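From the client side, that lookup-then-read flow is hidden behind a single open() call. A rough Java sketch (same assumed namenode address and hypothetical file as above):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000"); // assumed namenode

        FileSystem fs = FileSystem.get(conf);
        // open() asks the namenode (backed by dfs.name.dir metadata) for the
        // block locations, then streams the bytes directly from a datanode.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/demo/hello.txt"))))) {
            System.out.println(in.readLine());
        }
    }
}
```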
It is important to note that HDFS is not a physical FS. It is rather a virtual abstraction on top of your local FS, and it cannot be browsed like a local FS. To view it, you have to use the HDFS shell, the HDFS web UI, or the available APIs.
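For instance, listing the HDFS root through the Java API shows entries that a plain ls / on the local machine never will (same assumed namenode address):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000"); // assumed namenode

        FileSystem fs = FileSystem.get(conf);
        // These paths exist only in the HDFS namespace, not on the local FS.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}
```

(The hadoop fs -ls / shell command would show the same listing.)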
HTH