Yes, you can use FileOutputCommitter, which moves the contents of the temporary task directory to the final output directory when the task succeeds, and deletes the temporary task directory afterwards.
Most of the built-in output formats in Hadoop extend FileOutputFormat and use an OutputCommitter; by default this is a FileOutputCommitter.
This is the relevant code from FileOutputFormat:

    public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context)
        throws IOException {
      if (committer == null) {
        Path output = getOutputPath(context);
        committer = new FileOutputCommitter(output, context);
      }
      return committer;
    }
To write to multiple paths, you can look at MultipleOutputs, which uses the default OutputCommitter.
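As a rough sketch of that approach, a reducer can route records to sub-paths of the job output directory via MultipleOutputs; the FileOutputCommitter still promotes the files on task success. The class name and the "byKey/" base path below are illustrative, not from the original answer:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer that writes each key's total under its own sub-path.
public class MultiPathReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  private MultipleOutputs<Text, LongWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    // Third argument is a base output path relative to the job output
    // directory; files land under the task's temporary directory first
    // and are moved by the committer on commitTask().
    mos.write(key, new LongWritable(sum), "byKey/" + key.toString());
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();
  }
}
```

This only compiles against the Hadoop MapReduce client jars and runs inside a job, so treat it as a sketch rather than a standalone program.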
Alternatively, you can create your own output format that extends FileOutputFormat, override the function above, and provide your own OutputCommitter implementation modelled on the code of FileOutputCommitter.
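A minimal sketch of that idea, assuming a TextOutputFormat base and an anonymous FileOutputCommitter subclass (both choices are mine, not from the original answer):

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical output format that supplies a customised committer.
public class CustomCommitterOutputFormat extends TextOutputFormat<Text, Text> {

  private OutputCommitter committer;

  @Override
  public synchronized OutputCommitter getOutputCommitter(TaskAttemptContext context)
      throws IOException {
    if (committer == null) {
      Path output = getOutputPath(context);
      // Subclass FileOutputCommitter to hook into commit/abort behaviour
      // while keeping its move-on-success / delete-on-failure semantics.
      committer = new FileOutputCommitter(output, context) {
        @Override
        public void commitTask(TaskAttemptContext ctx) throws IOException {
          // Custom bookkeeping could go here, before the task's
          // temporary output is promoted to the final directory.
          super.commitTask(ctx);
        }
      };
    }
    return committer;
  }
}
```

Like the previous sketch, this needs the Hadoop client libraries and a running job to exercise, so it is a starting point rather than a drop-in implementation.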
In the FileOutputCommitter code you will find a function that may interest you:
    @Override
    public void abortTask(TaskAttemptContext context) {
      try {
        if (workPath != null) {
          context.progress();
          outputFileSystem.delete(workPath, true);
        }
      } catch (IOException ie) {
        LOG.warn("Error discarding output" + StringUtils.stringifyException(ie));
      }
    }
If the task succeeds, commitTask() is called, which by default moves the temporary task output directory (which has the task attempt ID in its name to avoid conflicts between task attempts) to the final output path, ${mapred.output.dir}. Otherwise, the framework calls abortTask(), which deletes the temporary task output directory.