I am trying to write code that imports files into HDFS for use as a Hive external table. I found that using something like:
foo | ssh hostname "hdfs dfs -put - /destination/$FILENAME"
can cause a particular kind of error: a temporary file is created and then renamed once the upload is done. This can lead to a race condition in Hive between listing the directory and executing the query.
One workaround is to copy the file to a temporary directory and then "hdfs dfs -mv" it to the desired location.
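To make the workaround concrete, here is a minimal sketch of what I mean. The function name stage_and_publish, the HDFS_CMD variable, and the staging path are hypothetical names I made up for illustration. It assumes the staging directory lies outside the table's directory, so Hive never lists the half-written file, and it relies on the "-mv" step being atomic, which is exactly what I am asking about below.

```shell
#!/bin/sh
# Sketch only: stage_and_publish and HDFS_CMD are hypothetical names.
# HDFS_CMD lets the hdfs binary be swapped out (e.g. for a dry run).
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Usage: producer | stage_and_publish FILENAME STAGING_DIR DEST_DIR
# Reads the file contents from stdin.
stage_and_publish() {
    # Step 1: upload into a staging directory that Hive does not read.
    "$HDFS_CMD" dfs -put - "$2/$1" || return 1
    # Step 2: move the finished file into the table's directory.
    # This assumes the move is atomic (the open question here).
    "$HDFS_CMD" dfs -mv "$2/$1" "$3/$1"
}
```

On the HDFS host this would be called as something like: foo | stage_and_publish "$FILENAME" /tmp/staging /destination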
Specific and general/academic questions:
- The hdfs dfs -mv command is atomic, right?
- What other HDFS commands or operations are atomic?
- Can two "hdfs dfs -mkdir" commands issued at about the same time both believe that they succeeded?
- Is there a better way to avoid race conditions with Hive when moving files into place?