Disable replication for Hadoop job output only.

Question


Is there a way to set the replication factor for the output of a specific MapReduce job to a value different from the rest of the cluster (say, 1)? I would like my main dataset to keep 3 replicas (as it does now), but the output of some of my jobs is quickly copied off the cluster and then discarded, so it needs no replication, and I could use the space.

I could use setrep, but I think I can only do that after the fact.

+4

hadoop

Donald Miner Nov 08 '11 at 20:16


1 answer

wutz · Accepted Answer · 2011-11-08T20:33:56+0000

When you upload a file, you can override the default DFS replication factor by passing

-D dfs.replication=1

The same option should also work when passed on the command line at job submission.
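As a rough sketch of where the option goes in each case, the snippet below builds an upload command and a job submission command and prints them instead of running them. The jar name, driver class, and paths are placeholders, not taken from the question. It also assumes the job's driver parses generic options (for example by running through ToolRunner); a driver that does not will treat -D as an ordinary argument.

```shell
# Generic option that overrides the replication factor for a single command.
REPLICATION_OPT="-D dfs.replication=1"

# Placeholder names: substitute your own jar, driver class, and paths.
# For "hadoop fs", generic options go before the subcommand.
UPLOAD_CMD="hadoop fs $REPLICATION_OPT -put results.txt /tmp/results.txt"

# For "hadoop jar", generic options go after the class name and
# before the job's own arguments.
JOB_CMD="hadoop jar my-job.jar MyDriver $REPLICATION_OPT /data/input /tmp/job-output"

# Print the commands for review; paste them into a shell on a cluster node to run them.
echo "$UPLOAD_CMD"
echo "$JOB_CMD"
```

Because dfs.replication is read by the client when a file is created, the override applies only to files written by that command; data already on the cluster keeps its current replication.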
