Does changing dfs.blocksize affect existing data?

My Hadoop version is 2.5.2. I am modifying the dfs.blocksize property in the hdfs-site.xml file on the master node. I have the following questions:

1) Will this change affect existing data in HDFS?
2) Do I need to propagate this change to all nodes in the Hadoop cluster or only to the NameNode?

0
4 answers

You must make the change in the hdfs-site.xml of all slaves as well; dfs.blocksize must be consistent across all DataNodes.
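If you want to double-check which value a given node actually resolves from its configuration, something like this should work (assuming the hdfs command is on your PATH):

 # prints the block size this node's client configuration resolves to, in bytes
 hdfs getconf -confKey dfs.blocksize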

+1

1) Will this change affect existing data in HDFS?

No, it will not. Existing files keep the block size they were written with. To apply the new block size you have to rewrite the data, for example with hadoop fs -cp or distcp. The new copy will use the new block size, and you can then delete the old data.
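A rough sketch of that rewrite step (the /data/logs paths are just placeholders; adjust to your layout):

 # rewrite the data so it picks up the new default block size
 hadoop distcp /data/logs /data/logs_newblocks
 # (for small datasets, "hadoop fs -cp /data/logs /data/logs_newblocks" works too)
 # once you have verified the copy, drop the old data and move the new copy into place
 hadoop fs -rm -r /data/logs
 hadoop fs -mv /data/logs_newblocks /data/logs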

2) Do I need to propagate this change to all nodes in the Hadoop cluster or only to the NameNode?

I believe that in this case you only need to change it on the NameNode. However, relying on that is a very bad idea. You should keep all of your configuration files in sync, for a number of good reasons. Once you take your Hadoop deployment more seriously, you should probably use something like Puppet or Chef to manage your configs.

Also note that whenever you change the configuration, you need to restart the NameNode and DataNodes so that they pick up the new settings.
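On a plain Apache tarball install, that restart would look roughly like this (assuming the standard sbin scripts and that HADOOP_HOME is set; your paths may differ):

 # on the master: restart the NameNode
 $HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
 $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
 # on each slave: restart the DataNode
 $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
 $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode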

Interesting note: you can set the block size of individual files as they are written, overriding the default block size. For example, hadoop fs -D dfs.blocksize=134217728 -put ab
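You can confirm which block size a file actually ended up with, for example:

 # %o prints the file's block size in bytes
 hadoop fs -stat %o ab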

+3

Changing the block size in hdfs-site.xml only affects new data.

+1

Which distribution are you using? From your question, it looks like you are using the Apache distribution. The easiest way is to write a shell script that first removes hdfs-site.xml on the slaves, e.g.

 ssh username@domain.com 'rm /some/hadoop/conf/hdfs-site.xml'
 ssh username@domain2.com 'rm /some/hadoop/conf/hdfs-site.xml'
 ssh username@domain3.com 'rm /some/hadoop/conf/hdfs-site.xml'

and then copy hdfs-site.xml from the master to all slaves:

 scp /hadoop/conf/hdfs-site.xml username@domain.com:/hadoop/conf/
 scp /hadoop/conf/hdfs-site.xml username@domain2.com:/hadoop/conf/
 scp /hadoop/conf/hdfs-site.xml username@domain3.com:/hadoop/conf/
+1

Source: https://habr.com/ru/post/985061/

