How the hdfs mv command works

I would like to know: how does the mv command work in HDFS?

  • Is this just a symbolic change without any actual data movement?

    • If the moveTo directory already exists (possibly in a different section)
    • If moveTo is a new directory
  • Can moving large files within HDFS corrupt data? Is cp or distcp therefore a safer option?

2 answers

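When a user calls hdfs dfs -mv, HDFS guarantees atomicity of the rename operation. When this command runs, the client makes an RPC call to the NameNode. The NameNode's implementation of this RPC holds a lock while it modifies the inode tree, and releases that lock only after the rename has completed, either successfully or unsuccessfully. (The rename may fail for reasons such as permission or quota violations.)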

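Since the implementation executes entirely within the NameNode, it manipulates only file system metadata, and no actual data is copied. There is no interaction with DataNodes during an hdfs dfs -mv command. All of the file's blocks remain the same, and the block list associated with the inode remains the same. The NameNode simply takes the file's inode from one place in the file system tree and moves it to another. There is no possibility of corrupting block data.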

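Since the NameNode provides a guaranteed atomic implementation of rename, there is also no chance of metadata corruption. It is not possible to end up in a "half-completed" state, with the file existing in both places or, even worse, deleted completely.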
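The two properties described above (rename only relinks an inode, and the lock makes it all-or-nothing) can be illustrated with a toy model. This is a minimal sketch, not NameNode code: `INode` and `ToyNamespace` are hypothetical names, the namespace is an in-memory tree guarded by a single `threading.Lock`, and block IDs are plain strings. When the destination is an existing directory, the toy moves the source inside it, roughly mirroring the question's "moveTo exists" case.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class INode:
    # Hypothetical, simplified stand-in for a NameNode inode.
    is_dir: bool
    block_ids: List[str] = field(default_factory=list)          # files only
    children: Dict[str, "INode"] = field(default_factory=dict)  # directories only


class ToyNamespace:
    """In-memory namespace tree guarded by one lock (toy model, not HDFS code)."""

    def __init__(self) -> None:
        self.root = INode(is_dir=True)
        self.lock = threading.Lock()

    def _lookup(self, path: str) -> Optional[INode]:
        node: Optional[INode] = self.root
        for part in (p for p in path.split("/") if p):
            if node is None or not node.is_dir:
                return None
            node = node.children.get(part)
        return node

    def mkdir(self, path: str) -> None:
        parent_path, name = path.rsplit("/", 1)
        with self.lock:  # assumes the parent directory exists
            self._lookup(parent_path).children[name] = INode(is_dir=True)

    def create(self, path: str, block_ids: List[str]) -> None:
        parent_path, name = path.rsplit("/", 1)
        with self.lock:  # assumes the parent directory exists
            self._lookup(parent_path).children[name] = INode(False, list(block_ids))

    def rename(self, src: str, dst: str) -> bool:
        """Relink src's inode at dst. Only metadata changes; no block is touched."""
        with self.lock:  # held for the whole operation, so it is all-or-nothing
            src_parent_path, src_name = src.rsplit("/", 1)
            src_parent = self._lookup(src_parent_path)
            if src_parent is None or src_name not in src_parent.children:
                return False  # fail without modifying anything
            target = self._lookup(dst)
            if target is not None and target.is_dir:
                # Existing directory: move src inside it, keeping its name.
                dst_parent, dst_name = target, src_name
            else:
                dst_parent_path, dst_name = dst.rsplit("/", 1)
                dst_parent = self._lookup(dst_parent_path)
            if dst_parent is None or not dst_parent.is_dir or dst_name in dst_parent.children:
                return False  # missing parent or name already taken
            dst_parent.children[dst_name] = src_parent.children.pop(src_name)
            return True


ns = ToyNamespace()
ns.mkdir("/tmp")
ns.mkdir("/tmp1")
ns.create("/tmp/1.txt", ["blk_1073747956_7133"])
assert ns.rename("/tmp/1.txt", "/tmp/1_renamed.txt")
assert ns.rename("/tmp/1_renamed.txt", "/tmp1/1.txt")
print(ns._lookup("/tmp1/1.txt").block_ids)  # ['blk_1073747956_7133']
```

The block list travels with the inode unchanged; only the parent's child map is edited, which is why no DataNode needs to be involved.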

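There is one subtle variation to the above. Most of the time, when running HDFS shell commands, you interact with HDFS as the backing file system. However, this is not the only possible file system implementation. The Apache Hadoop distro ships with alternative file system plugins for S3, Azure Storage and OpenStack Swift, and numerous vendors have created their own file system plugins as well. Whether these alternative file systems provide atomic rename semantics is an implementation detail of each one. The S3 and Swift plugins implement rename as copy-then-delete, so they definitely do not provide an atomicity guarantee. The Azure Storage plugin does provide optional support for atomic rename by making use of Azure Storage blob leases, but it is not the default behavior.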
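For contrast, here is a toy model of rename implemented as copy-then-delete on a flat key/value object store. `ToyObjectStore` and its `crash_after_copy` flag are hypothetical; this is not the S3 or Swift connector code, only an illustration of why two separate steps cannot be atomic: a failure between them leaves the data in both places.

```python
from typing import Dict


class ToyObjectStore:
    """Flat key -> bytes store with no rename primitive (hypothetical model)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def rename(self, src: str, dst: str, crash_after_copy: bool = False) -> None:
        # Step 1: copy every object under the source prefix to the destination.
        keys = [k for k in self.objects if k == src or k.startswith(src + "/")]
        for key in keys:
            self.objects[dst + key[len(src):]] = self.objects[key]
        if crash_after_copy:
            # Simulated client failure between the two steps.
            raise RuntimeError("client died between copy and delete")
        # Step 2: delete the originals.
        for key in keys:
            del self.objects[key]


store = ToyObjectStore()
store.objects = {"data/part-0": b"a", "data/part-1": b"b"}
try:
    store.rename("data", "moved", crash_after_copy=True)
except RuntimeError:
    pass
print(sorted(store.objects))
# ['data/part-0', 'data/part-1', 'moved/part-0', 'moved/part-1']
```

The store ends up holding both copies, exactly the "half-completed" state that the NameNode's locked rename rules out.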

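Also, as a consequence of this, it is not possible to run hdfs dfs -mv across different file systems. You would have to use the copy commands for that, which perform a full copy of the data. A rename across file systems could not be atomic, so the command refuses to do it. Here is what happens if you try to use hdfs dfs -mv to move data from HDFS to the local file system. The command fails: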

> hdfs dfs -mv hdfs:///testData file:///tmp/testData
mv: `hdfs:///testData': Does not match target filesystem
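A minimal sketch of the kind of check that makes the command refuse this move, assuming that two paths belong to the same file system when their URI scheme and authority match. The helper names `same_filesystem` and `check_rename` are hypothetical, and scheme-less paths would first have to be resolved against `fs.defaultFS`, which this sketch does not do.

```python
from urllib.parse import urlparse


def same_filesystem(src: str, dst: str) -> bool:
    """Assume two paths share a file system when scheme and authority match."""
    a, b = urlparse(src), urlparse(dst)
    return (a.scheme, a.netloc) == (b.scheme, b.netloc)


def check_rename(src: str, dst: str) -> None:
    """Refuse a rename that would cross file systems instead of silently copying."""
    if not same_filesystem(src, dst):
        raise ValueError(f"mv: `{src}': Does not match target filesystem")


check_rename("hdfs:///testData", "hdfs:///archive/testData")  # same file system: allowed
try:
    check_rename("hdfs:///testData", "file:///tmp/testData")
except ValueError as err:
    print(err)  # mv: `hdfs:///testData': Does not match target filesystem
```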

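If the data is copied instead (with cp, or with DistCp across file systems), it is still protected against silent corruption. Hadoop computes checksums when data is written and verifies them when it is read, so corrupted data is detected rather than returned. DistCp additionally compares the checksums of the source and target files after copying.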
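To illustrate how per-chunk checksums catch corruption, here is a simplified sketch: one CRC is stored per 512-byte chunk (the default of `dfs.bytes-per-checksum`) when data is written, and it is recomputed on read. It uses zlib's CRC32 for simplicity, whereas current HDFS versions default to CRC32C; `chunk_crcs` and `verify` are hypothetical names.

```python
import zlib
from itertools import zip_longest
from typing import List

BYTES_PER_CHECKSUM = 512  # default value of dfs.bytes-per-checksum


def chunk_crcs(data: bytes, chunk: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Checksum each fixed-size chunk, as would be done when the data is written."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def verify(data: bytes, stored: List[int], chunk: int = BYTES_PER_CHECKSUM) -> List[int]:
    """Recompute CRCs on read; return indices of chunks that no longer match."""
    current = chunk_crcs(data, chunk)
    return [i for i, (a, b) in enumerate(zip_longest(current, stored)) if a != b]


original = bytes(range(256)) * 8   # 2048 bytes, i.e. 4 chunks
crcs = chunk_crcs(original)
damaged = bytearray(original)
damaged[700] ^= 0xFF               # flip every bit of one byte in chunk 1
print(verify(original, crcs))          # []
print(verify(bytes(damaged), crcs))    # [1]
```

Checking at chunk granularity also tells the reader which part of the data is bad, so a reader can fall back to another replica instead of failing the whole file.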

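mv (move) is a metadata-only operation. It changes the file's path in the NameNode's namespace without moving any data blocks, unlike cp (copy), which copies them.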

This is easy to verify with an example. Here are the steps I followed.

  • Create the file /tmp/1.txt.

    Command:

    hdfs fsck /tmp/1.txt -files -blocks -locations 
    

    Output:

    /tmp/1.txt 5 bytes, 1 block(s):  OK
    0. BP-1788638071-172.23.206.41-1439815305280:blk_1073747956_7133 len=5 repl=1 [DatanodeInfoWithStorage[192.168.56.1:50010,DS-cf19d920-d98b-4877-9ca7-c919df1a869a,DISK]]
    
  • Rename (mv) /tmp/1.txt to /tmp/1_renamed.txt, within the same directory /tmp.

    Command:

    hdfs fsck /tmp/1_renamed.txt -files -blocks -locations 
    

    Output:

    /tmp/1_renamed.txt 5 bytes, 1 block(s):  OK
    0. BP-1788638071-172.23.206.41-1439815305280:blk_1073747956_7133 len=5 repl=1 [DatanodeInfoWithStorage[192.168.56.1:50010,DS-cf19d920-d98b-4877-9ca7-c919df1a869a,DISK]]
    
  • Move (mv) /tmp/1_renamed.txt to /tmp1/1.txt, in a different directory /tmp1.

    Command:

    hdfs fsck /tmp1/1.txt -files -blocks -locations 
    

    Output:

    /tmp1/1.txt 5 bytes, 1 block(s):  OK
    0. BP-1788638071-172.23.206.41-1439815305280:blk_1073747956_7133 len=5 repl=1 [DatanodeInfoWithStorage[192.168.56.1:50010,DS-cf19d920-d98b-4877-9ca7-c919df1a869a,DISK]]
    

As you can see, the block information is identical for all 3 files, before and after each mv:

0. BP-1788638071-172.23.206.41-1439815305280:blk_1073747956_7133 len=5 repl=1 [DatanodeInfoWithStorage[192.168.56.1:50010,DS-cf19d920-d98b-4877-9ca7-c919df1a869a,DISK]]

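This confirms that mv only updates metadata on the NameNode. As "Chris Nauroth" explained, there is no chance of data corruption with mv.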
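The comparison above can be automated by parsing the block lines out of the fsck output. This is a sketch under assumptions: the regular expression is written against the sample lines in this answer rather than a documented output format, and the hypothetical helper `run_fsck` simply shells out to the same `hdfs fsck` command, so it needs the `hdfs` CLI on the PATH.

```python
import re
import subprocess
from typing import List, Tuple

# Matches block lines such as
# "0. BP-...-1439815305280:blk_1073747956_7133 len=5 repl=1 [...]"
BLOCK_RE = re.compile(r"^\s*\d+\.\s+(BP-[^:]+):(blk_\d+)_(\d+)\s+len=(\d+)")


def parse_blocks(fsck_output: str) -> List[Tuple[str, ...]]:
    """Extract (block pool, block ID, generation stamp, length) per block line."""
    blocks = []
    for line in fsck_output.splitlines():
        m = BLOCK_RE.match(line)
        if m:
            blocks.append(m.groups())
    return blocks


def run_fsck(path: str) -> str:
    """Run the fsck command used above (requires the hdfs CLI)."""
    result = subprocess.run(
        ["hdfs", "fsck", path, "-files", "-blocks", "-locations"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


sample = ("/tmp1/1.txt 5 bytes, 1 block(s):  OK\n"
          "0. BP-1788638071-172.23.206.41-1439815305280:blk_1073747956_7133 len=5 repl=1")
print(parse_blocks(sample))
# [('BP-1788638071-172.23.206.41-1439815305280', 'blk_1073747956', '7133', '5')]

# On a live cluster one could compare before and after a move, e.g.:
#   before = parse_blocks(run_fsck("/tmp/1.txt"))
#   (run: hdfs dfs -mv /tmp/1.txt /tmp1/1.txt)
#   after = parse_blocks(run_fsck("/tmp1/1.txt"))
#   assert before == after
```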

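Regarding whether cp or distcp is a safer option: both are safe, and each gives you a way to confirm that the data was copied intact.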

  • cp

    You can verify the result of cp with hadoop fs -checksum.

    For example, I copied /tmp/1GB/part-m-00000 to /tmp1/part-m-00000 and compared their checksums:

    hadoop fs -checksum /tmp/1GB/part-m-00000 /tmp1/part-m-00000
    
    /tmp/1GB/part-m-00000   MD5-of-262144MD5-of-512CRC32    0000020000000000000400008f15c32887229c0495a23547e2f0a29a
    /tmp1/part-m-00000      MD5-of-262144MD5-of-512CRC32    0000020000000000000400008f15c32887229c0495a23547e2f0a29a
    

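    The checksums match, so the file was copied correctly. Whenever you use cp, you can confirm the copy the same way with hadoop fs -checksum.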

  • distcp

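    distcp verifies checksums itself. If the checksums of the source and target files do not match, distcp marks the copy as FAILED. If you want to skip this check, run distcp with -skipcrccheck.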
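The cp verification step can be scripted by comparing the lines printed by `hadoop fs -checksum`. A minimal sketch, assuming each output line has three whitespace-separated fields (path, algorithm, checksum) as in the sample above and that paths contain no spaces; `parse_checksums`, `copies_match` and `checksum_output` are hypothetical names.

```python
import subprocess
from typing import Dict, Tuple


def parse_checksums(output: str) -> Dict[str, Tuple[str, str]]:
    """Map each path to its (algorithm, checksum) pair."""
    result: Dict[str, Tuple[str, str]] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:  # assumed layout: path, algorithm, checksum
            path, algorithm, value = parts
            result[path] = (algorithm, value)
    return result


def copies_match(output: str) -> bool:
    """True when at least two files are listed and all report the same checksum."""
    parsed = parse_checksums(output)
    return len(parsed) >= 2 and len(set(parsed.values())) == 1


def checksum_output(*paths: str) -> str:
    """Run `hadoop fs -checksum` on the given paths (requires the hadoop CLI)."""
    result = subprocess.run(
        ["hadoop", "fs", "-checksum", *paths],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


sample = (
    "/tmp/1GB/part-m-00000   MD5-of-262144MD5-of-512CRC32    "
    "0000020000000000000400008f15c32887229c0495a23547e2f0a29a\n"
    "/tmp1/part-m-00000      MD5-of-262144MD5-of-512CRC32    "
    "0000020000000000000400008f15c32887229c0495a23547e2f0a29a\n"
)
print(copies_match(sample))  # True
```

The comparison is only meaningful when both files were written with the same block size and bytes-per-checksum, because this composite checksum is computed block by block.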

Source: https://habr.com/ru/post/1622095/
