HDFS vs LFS - How is the Hadoop Distributed File System built on top of the local file system?

From the various blogs I have read, I understood that HDFS is another layer that sits on top of the local file system on a computer.

I have also installed Hadoop, but I find it hard to understand how the HDFS layer exists on top of the local file system.

Here is my question:

I believe I have installed Hadoop in pseudo-distributed mode. What happens under the hood during this installation? I added the hadoop.tmp.dir parameter to the configuration files. Is it the only folder the namenode daemon talks to when it tries to access a datanode?

+4
2 answers

OK, let me give it a try. When you configure Hadoop, it sets up a virtual FS on top of the local FS, which is HDFS. HDFS stores data in the form of blocks (similar to the local FS, but the blocks are much larger) in a replicated manner. But the HDFS directory tree, or file system namespace, is separate from that of the local FS. When you start writing data into HDFS, it does eventually get written to the local FS, but you cannot see it there directly.

The hadoop.tmp.dir directory actually serves three purposes:

1. The directory where the namenode stores its metadata, with the default value ${hadoop.tmp.dir}/dfs/name. It can be specified explicitly: if you set dfs.name.dir, the namenode metadata will be stored in the directory given as the value of that property.

2. The directory where the HDFS data blocks are stored, with the default value ${hadoop.tmp.dir}/dfs/data. It can be specified explicitly: if you set dfs.data.dir, the HDFS data will be stored in the directory given as the value of that property.

3. The directory where the secondary namenode stores its checkpoints, with the default value ${hadoop.tmp.dir}/dfs/namesecondary. It can be specified explicitly through fs.checkpoint.dir.

Thus, for a cleaner installation, it is always better to use specific, dedicated locations as the values of these properties.
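As a sketch of what that might look like (the /data/hadoop/... paths below are illustrative assumptions, not defaults; the property names shown are the Hadoop 1.x ones used in this answer), a dedicated setup in hdfs-site.xml could be:

```xml
<!-- hdfs-site.xml: example dedicated locations (illustrative paths) -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- namenode metadata -->
    <value>/data/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- HDFS data blocks -->
    <value>/data/hadoop/datanode</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <!-- secondary namenode checkpoints -->
    <value>/data/hadoop/namesecondary</value>
  </property>
</configuration>
```

Note that in Hadoop 2.x and later these properties were renamed to dfs.namenode.name.dir, dfs.datanode.data.dir and dfs.namenode.checkpoint.dir.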

When access to a specific data block is required, the metadata stored under dfs.name.dir is consulted, and the location of that block on a specific datanode (somewhere under dfs.data.dir on its local FS) is returned to the client. The client then reads the data directly from there (the same applies to writes).

It is important to note that HDFS is not a physical FS. It is rather a virtual abstraction on top of your local FS, and it cannot be browsed like the local FS. To access it, you need to use the HDFS shell, the HDFS web UI, or the available APIs.
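For example (a sketch assuming a running pseudo-distributed cluster; the file names and paths are illustrative), you interact with HDFS through its own shell rather than with `ls` on the local FS:

```sh
# Copy a local file into HDFS and read it back through the HDFS shell.
# The file will NOT appear at /user/hadoop on the local FS; its contents
# live as blk_* block files under dfs.data.dir.
hadoop fs -mkdir /user/hadoop
hadoop fs -put /tmp/sample.txt /user/hadoop/sample.txt
hadoop fs -ls /user/hadoop
hadoop fs -cat /user/hadoop/sample.txt
```

Running plain `ls /user/hadoop` on the local machine would fail, which is exactly the point: the HDFS namespace is only visible through HDFS itself.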

HTH

+6

When you install Hadoop in pseudo-distributed mode, all the HDFS daemons (namenode, datanode and secondary namenode) run on the same machine. The temporary directory you configure is where the datanode stores its data. So when you look at it in terms of HDFS, your data is still stored in blocks and read in blocks, but these blocks are much larger than (and an aggregation of) the blocks of the underlying local file system.
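A quick way to confirm that all three daemons share the one machine (assuming the cluster was started with start-dfs.sh) is `jps`, which lists the running Java processes:

```sh
start-dfs.sh   # starts namenode, datanode and secondary namenode locally
jps
# Typical output in pseudo-distributed mode (PIDs will differ):
#   12001 NameNode
#   12102 DataNode
#   12345 SecondaryNameNode
#   12400 Jps
```

In a fully distributed cluster, by contrast, `jps` on a worker machine would normally show only a DataNode.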

0

Source: https://habr.com/ru/post/1483336/

