DistCp is a tool for copying large amounts of data from one cluster to another. You typically use it to copy from one HDFS to another HDFS rather than from the local file system. The other important point is that it runs as a MapReduce job with zero reduce tasks (a map-only job), which makes it faster because the copy work is distributed. It expands the list of source files and directories into input for the map tasks, each of which copies a portion of the files in that source list.
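To make the "each map task copies a portion of the list" idea concrete, here is a minimal sketch in Python. It is not DistCp's actual code: the function name partition_copy_list, the greedy balancing strategy, and the file paths and sizes are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def partition_copy_list(files: List[Tuple[str, int]], num_maps: int) -> Dict[int, List[str]]:
    """Assign each (path, size_in_bytes) entry to one of num_maps map tasks.

    Illustrative greedy balancing (an assumption, not DistCp's real logic):
    take the largest files first and give each one to the task that has
    the fewest bytes assigned so far.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    assignments: Dict[int, List[str]] = {i: [] for i in range(num_maps)}
    assigned_bytes = [0] * num_maps
    for path, size in sorted(files, key=lambda entry: entry[1], reverse=True):
        # Pick the least-loaded task; ties go to the lowest task index.
        target = min(range(num_maps), key=lambda i: assigned_bytes[i])
        assignments[target].append(path)
        assigned_bytes[target] += size
    return assignments


if __name__ == "__main__":
    # Hypothetical source listing: (path, size in bytes).
    listing = [
        ("/data/a.log", 400),
        ("/data/b.log", 300),
        ("/data/c.log", 200),
        ("/data/d.log", 100),
    ]
    for task, paths in partition_copy_list(listing, 2).items():
        print(f"map task {task}: {paths}")
```

With two map tasks, each ends up with about half of the bytes, and the two portions can then be copied in parallel. There is nothing to aggregate afterwards, which is why no reduce phase is needed.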
hdfs put (hdfs dfs -put) copies data from the local file system to HDFS. Behind the scenes it uses the HDFS client and does all the work sequentially, talking to the NameNode and the DataNodes. It does not create any MapReduce jobs to process the data.
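For reference, this is roughly how the two commands are invoked. The sketch below only builds and prints the command lines and does not run them. The host names, ports, paths, and the helper names distcp_command and put_command are hypothetical. The -m option of distcp, which caps the number of simultaneous map tasks, is a standard DistCp flag.

```python
import shlex
from typing import List


def distcp_command(src_uri: str, dst_uri: str, max_maps: int) -> List[str]:
    # Cluster-to-cluster copy, executed as a map-only MapReduce job.
    # -m limits how many map tasks copy at the same time.
    return ["hadoop", "distcp", "-m", str(max_maps), src_uri, dst_uri]


def put_command(local_path: str, hdfs_path: str) -> List[str]:
    # Local-to-HDFS copy done by a single HDFS client process.
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


if __name__ == "__main__":
    # Hypothetical NameNode addresses and paths, for illustration only.
    print(shlex.join(distcp_command("hdfs://nn1:8020/data/logs", "hdfs://nn2:8020/backup/logs", 10)))
    print(shlex.join(put_command("/tmp/report.csv", "/user/alice/report.csv")))
```

The printed lines can be pasted into a shell on a machine with the Hadoop client installed, after adjusting the addresses and paths to your clusters.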