Hadoop DistCp: renaming files when the same file name already exists

Is there a way to run DistCp with an option to rename files whose names already exist at the destination? This is probably easiest to explain with an example.

Let's say I copy hdfs:///foo to hdfs:///bar, and foo contains these files:

hdfs:///foo/a
hdfs:///foo/b
hdfs:///foo/c

and bar already contains these files:

hdfs:///bar/a
hdfs:///bar/b

Then after the copy, I would like bar to contain something like:

hdfs:///bar/a
hdfs:///bar/a-copy1
hdfs:///bar/b
hdfs:///bar/b-copy1
hdfs:///bar/c

If there is no such option, what would be the most reliable and efficient way to do this? I could certainly write my own version of DistCp that does this, but that looks like a lot of work and quite error-prone. Basically, I don't care about the file names at all, only about the directory they end up in, and I want to periodically copy large amounts of data into this "consolidation" directory.

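DistCp does not support this. You can implement it yourself with the Hadoop Java API: use FileSystem and its exists(Path p) method to check whether a target file name is already taken, and choose a new name if it is.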
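A minimal sketch of that approach in plain Java, assuming the "-copyN" suffix scheme from the question's example. So that it runs without a cluster, the file system is hidden behind a hypothetical Store interface (not part of Hadoop); in a real job, exists would delegate to FileSystem.exists(Path) and copy to something like FileUtil.copy. The names CopyRenamer, Store, MemStore, freeTarget and consolidate are all illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class CopyRenamer {

    /** Hypothetical minimal file-system view; a Hadoop-backed version would wrap FileSystem. */
    interface Store {
        boolean exists(String path);
        void copy(String src, String dst);
    }

    /** Returns dir/name if it is free, otherwise the first free dir/name-copy1, dir/name-copy2, ... */
    static String freeTarget(String dir, String name, Predicate<String> exists) {
        String candidate = dir + "/" + name;
        for (int n = 1; exists.test(candidate); n++) {
            candidate = dir + "/" + name + "-copy" + n;
        }
        return candidate;
    }

    /** Copies each source file into dstDir, renaming it when the name is already taken. */
    static List<String> consolidate(List<String> srcFiles, String dstDir, Store store) {
        List<String> written = new ArrayList<>();
        for (String src : srcFiles) {
            // Keep only the last path component; the source directory is irrelevant here.
            String name = src.substring(src.lastIndexOf('/') + 1);
            String dst = freeTarget(dstDir, name, store::exists);
            store.copy(src, dst);
            written.add(dst);
        }
        return written;
    }

    /** In-memory Store used only to demonstrate the logic. */
    static class MemStore implements Store {
        final Set<String> files = new HashSet<>();
        public boolean exists(String path) { return files.contains(path); }
        public void copy(String src, String dst) { files.add(dst); }
    }

    public static void main(String[] args) {
        // Reproduces the example from the question: bar already holds a and b.
        MemStore store = new MemStore();
        store.files.addAll(List.of("/foo/a", "/foo/b", "/foo/c", "/bar/a", "/bar/b"));
        List<String> written = consolidate(List.of("/foo/a", "/foo/b", "/foo/c"), "/bar", store);
        System.out.println(written); // [/bar/a-copy1, /bar/b-copy1, /bar/c]
    }
}
```

Checking exists() and then copying is not atomic: if two jobs write into the consolidation directory at the same time, both could pick the same name. Creating the target with FileSystem.create(path, false), which fails when the file already exists, and retrying with the next suffix would close that gap. The suffix is appended to the full file name as in the example, so a name with an extension such as part-00000.avro would become part-00000.avro-copy1; adjust freeTarget if extensions matter.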

Source: https://habr.com/ru/post/1539327/