Is there a way to set the replication factor for the output of a specific MapReduce job to something different from the rest of the cluster (say, 1)? I would like my main dataset to keep 3 replicas (as it has now), but the output of some of my jobs is moved off the cluster quickly and ultimately discarded, so it needs no replication, and I could use the space.
I could use `hadoop fs -setrep`, but I think that can only be applied after the output has already been written.
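For reference, the after-the-fact approach I have in mind would look roughly like this. The output path is a made-up placeholder, and it assumes the job has already finished writing:

```shell
# Hypothetical output directory of the short-lived job (placeholder path).
OUTPUT_DIR="/user/me/job-output"

# Lower the replication factor of everything under that directory to 1.
# -R requests recursion; newer Hadoop releases recurse into directories
# anyway and accept the flag only for backward compatibility.
hadoop fs -setrep -R 1 "$OUTPUT_DIR"
```

The drawback is that the files are first written with the cluster default of 3 replicas and only trimmed afterwards, which is what I would like to avoid.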