I am trying to create and maintain a SequenceFile on HDFS using the Java API, without running a MapReduce job, as setup for a future MapReduce job. I want to store all the input data for that job in a single SequenceFile, but the data arrives incrementally throughout the day. The problem is that if the SequenceFile already exists, the next call to SequenceFile.createWriter simply overwrites it instead of appending to it.
    // fs and conf are set up for HDFS, not as a LocalFileSystem
    seqWriter = SequenceFile.createWriter(fs, conf, new Path(hdfsPath),
            keyClass, valueClass, SequenceFile.CompressionType.NONE);
    seqWriter.append(new Text(key), new BytesWritable(value));
    seqWriter.close();
Another constraint is that I cannot keep the data in a format of my own and convert it to a SequenceFile at the end of the day, because a MapReduce job may be started on this data at any time.
I cannot find any other API call that appends to an existing SequenceFile while preserving its format. I also can't simply concatenate two SequenceFiles, because each file starts with its own header, so the result would not be a valid SequenceFile.
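To show what I mean about concatenation, here is a minimal sketch using a toy format, not the real SequenceFile layout. The "TOY" magic bytes, the ConcatDemo class, and its method names are all made up for illustration; the only point is that a second header landing in the middle of the record stream confuses a reader that expects records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ConcatDemo {
    // Hypothetical header for the toy format (a real SequenceFile header holds much more).
    static final byte[] MAGIC = "TOY".getBytes(StandardCharsets.US_ASCII);

    // Builds a toy "file" in memory: header first, then length-prefixed records.
    static byte[] writeToyFile(String... records) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.write(MAGIC);
            for (String record : records) {
                byte[] payload = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Naive byte-level concatenation of two toy files.
    static byte[] concat(byte[] first, byte[] second) {
        byte[] joined = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    // Returns the number of records read, or -1 if the bytes are not a valid toy file.
    static int countRecords(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            byte[] magic = new byte[MAGIC.length];
            in.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                return -1;
            }
            int count = 0;
            while (in.available() > 0) {
                int length = in.readInt();
                // A length that is negative or larger than the remaining bytes means corruption.
                if (length < 0 || length > in.available()) {
                    return -1;
                }
                in.readFully(new byte[length]);
                count++;
            }
            return count;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        byte[] single = writeToyFile("a", "b");
        byte[] joined = concat(writeToyFile("a"), writeToyFile("b"));
        System.out.println("single file records: " + countRecords(single));
        System.out.println("concatenated records: " + countRecords(joined));
    }
}
```

In this sketch the reader handles the single file fine, but on the concatenated bytes it interprets the second header as a record length and gives up. I assume something analogous happens with real SequenceFiles, which is why I am looking for a proper append.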
I would also like to avoid running a MapReduce job for this, since that has a lot of overhead for the small amount of data I add to the SequenceFile each time.
Any thoughts or workarounds? Thanks.