I am using the following code:
csv.saveAsTextFile(pathToResults, classOf[GzipCodec])
The pathToResults directory then contains many files, such as part-0000, part-0001, and so on. I can merge them with FileUtil.copyMerge(), but it is very slow: it pulls all the files through the driver program and then writes them back to Hadoop. Even so, FileUtil.copyMerge() is faster than:
csv.repartition(1).saveAsTextFile(pathToResults, classOf[GzipCodec])
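For reference, a minimal sketch of what the FileUtil.copyMerge() approach mentioned above typically looks like. It assumes Hadoop 2.x (the method was removed in Hadoop 3.0); the helper name mergeParts and the mergedFile destination are hypothetical and not part of my actual code.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileUtil, Path}

// Assumption: Hadoop 2.x, where FileUtil.copyMerge still exists.
// `mergeParts` and `mergedFile` are hypothetical names used for illustration.
def mergeParts(pathToResults: String, mergedFile: String): Boolean = {
  val conf = new Configuration()
  val srcDir = new Path(pathToResults)
  val dstFile = new Path(mergedFile)
  FileUtil.copyMerge(
    srcDir.getFileSystem(conf), srcDir,   // directory holding the part-* files
    dstFile.getFileSystem(conf), dstFile, // single merged output file
    false,                                // keep the source directory
    conf,
    null                                  // no separator string between parts
  )
}
```

The parts are streamed one after another through the process that makes the call, which is why this runs on the driver. Since a gzip file may consist of several concatenated members, the merged output should still be readable as a single .gz file.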
How can I merge the result files without repartitioning and without FileUtil.copyMerge()?