Hadoop dfs.replication

Sorry guys, just a quick question, but I can't find the exact answer on Google. What does dfs.replication mean? If I create a file named filmdata.txt in HDFS and set dfs.replication = 1, will there be exactly one copy of filmdata.txt? Or will Hadoop create another replica in addition to the main file? Tell me briefly: with dfs.replication = 1, is there exactly one filmdata.txt file, or two? Thanks in advance.

+4
source share
4 answers

The dfs.replication factor specifies the total number of copies of each file kept in the file system. So if you set dfs.replication = 1, the file system will hold exactly one copy of the file.

Check Apache Documentation for other configuration options.
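For reference, the replication factor is normally set in hdfs-site.xml. A minimal fragment might look like this (dfs.replication is the standard property name; the value 1 is just for this example):

```xml
<!-- hdfs-site.xml: default replication factor applied to new files -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>One copy of each block only; no redundancy.</description>
</property>
```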

+9
source

To ensure high data availability, Hadoop replicates the data.

When we store a file in HDFS, Hadoop splits it into blocks (64 MB or 128 MB, depending on the version and configuration), and these blocks are replicated across the nodes of the cluster. The dfs.replication setting specifies how many replicas of each block are kept.

The default value for dfs.replication is 3, but this is configurable, depending on the configuration of your cluster.

Hope this helps.
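To illustrate, here is how you can inspect and change the replication factor of a file from the command line (the paths are hypothetical; these are standard Hadoop CLI commands, but exact output can vary by version):

```shell
# Upload a file, then print its replication factor (%r)
hdfs dfs -put filmdata.txt /user/demo/filmdata.txt
hdfs dfs -stat %r /user/demo/filmdata.txt

# Change the replication factor of an existing file to 1;
# -w waits until replica removal/re-replication completes
hdfs dfs -setrep -w 1 /user/demo/filmdata.txt
```

Note that -setrep changes only the given file; dfs.replication in hdfs-site.xml sets the default for files created afterwards.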

+5
source

The link provided by Praveen is now broken. Here is the updated link describing the dfs.replication parameter.

Refer to the Hadoop Cluster Setup documentation for more information on configuration options.

Note that a file can span multiple blocks, and each block is replicated the number of times specified by dfs.replication (3 by default). The block size is controlled by the dfs.block.size parameter.
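To see how a file is actually split into blocks and where each replica lives, you can use fsck (the path is hypothetical; these commands assume a running cluster):

```shell
# Show the effective block size (dfs.block.size is the older name;
# recent Hadoop versions use dfs.blocksize)
hdfs getconf -confKey dfs.blocksize

# List the blocks of a file and the data nodes holding each replica
hdfs fsck /user/demo/filmdata.txt -files -blocks -locations
```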

+1
source

In an HDFS environment, data is stored on commodity machines. Since these are not high-end servers with lots of RAM, there is a real chance of losing a data node (d1, d2, d3) or a block (b1, b2, b3). To cope with this, HDFS splits each file into blocks (64 MB or 128 MB) and keeps three replicas of each block (by default), with each replica stored on a separate data node (d1, d2, d3). Now suppose block b1 is damaged on data node d1: copies of b1 are still available on d2 and d3, so the client can read b1 from d2 and get the same result as if the data had been read from d1.

I hope this gives you some clarity.
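In the failure scenario above, the NameNode also notices the missing replica and schedules re-replication on the surviving nodes. Cluster health can be checked like this (illustrative; requires a running cluster):

```shell
# Summary of live/dead data nodes and overall capacity
hdfs dfsadmin -report

# Cluster-wide check for under-replicated or corrupt blocks
hdfs fsck / | grep -E 'Under-replicated|Corrupt'
```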

0
source

Source: https://habr.com/ru/post/1439103/

