Change block size of existing files in Hadoop

Consider a Hadoop cluster where the default block size in hdfs-site.xml is 64 MB. Later, the team decides to change this to 128 MB. Here are my questions on this scenario:

  • Will this change require a restart of the cluster, or will it be picked up automatically so that all new files get a default block size of 128 MB?
  • What will happen to existing files with a block size of 64 MB? Will the configuration change be applied to existing files automatically? If it is applied automatically, when will that happen - as soon as the change is made, or when the cluster is restarted? If it is not applied automatically, how can this block size change be made manually?
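For reference, a minimal sketch of the change being described, assuming the property name used by older Hadoop releases (dfs.block.size, given in bytes; newer releases call it dfs.blocksize):

 <!-- hdfs-site.xml: value is in bytes; 128 MB = 134217728 -->
 <property>
   <name>dfs.block.size</name>
   <value>134217728</value>
 </property>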
3 answers

Will this change require a restart of the cluster, or will it be picked up automatically so that all new files get a default block size of 128 MB?

A restart of the cluster is required for this property change to take effect.
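For example, with the standard control scripts (paths assume a typical Hadoop 1.x layout; adjust to your installation):

 # stop and start HDFS so the NameNode and DataNodes reload hdfs-site.xml
 $HADOOP_HOME/bin/stop-dfs.sh
 $HADOOP_HOME/bin/start-dfs.sh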

What will happen to existing files with a block size of 64 MB? Will the configuration change be applied to existing files automatically?

Existing files will not change their block size; the new default applies only to files written after the change.
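You can confirm this by inspecting a file's blocks with fsck, for example (the path is hypothetical):

 # prints each block of the file along with its size
 hadoop fsck /path/to/existing/file -files -blocks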

If it is not applied automatically, how can this block size change be made manually?

To rewrite existing files, you can use distcp. It will copy the files with the new block size; however, you will have to delete the old files with the older block size yourself. The command looks like this:

 hadoop distcp -Ddfs.block.size=XX /path/to/old/files /path/to/new/files/with/larger/block/sizes
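A concrete invocation might look like the following (the paths are hypothetical; note that dfs.block.size is given in bytes, so 128 MB is 134217728):

 # copy with 128 MB blocks, then remove the originals once the copy is verified
 hadoop distcp -Ddfs.block.size=134217728 /data/old /data/new
 hadoop fs -rmr /data/old   # use "hadoop fs -rm -r" on Hadoop 2+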

As mentioned here, taking your points in order:

  • Whenever you change the configuration, you need to restart the NameNode and DataNodes so that they pick up the new behavior.
  • No, it will not. The old files will keep their old block size. For them to pick up the new block size, you need to rewrite the data. You can either run hadoop fs -cp or distcp on your data, as sketched below. The new copy will have the new block size, and you can then delete the old data.
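A minimal sketch of the fs -cp approach, assuming hypothetical paths and that the copy is written with the block size configured on the client:

 # rewrite the file with the new block size, then swap it into place
 hadoop fs -Ddfs.block.size=134217728 -cp /data/file /data/file.tmp
 hadoop fs -rm /data/file
 hadoop fs -mv /data/file.tmp /data/file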

Check the link for more information.


On point 1: in Hadoop 1.2.1, a restart is not required after changing dfs.block.size in hdfs-site.xml. The file block size can easily be verified on the Hadoop administration page at http://namenode:50070/dfshealth.jsp

Make sure the dfs.block.size change is applied on all the data nodes.
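For instance, one hypothetical way to push the updated file to every node listed in the slaves file (this assumes passwordless SSH and an identical $HADOOP_HOME layout on each host):

 # copy the updated hdfs-site.xml to each datanode
 for host in $(cat $HADOOP_HOME/conf/slaves); do
   scp $HADOOP_HOME/conf/hdfs-site.xml $host:$HADOOP_HOME/conf/
 done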

