Hi, I am trying to run Apache Nutch 1.2 on Amazon EMR.
To do this, I point the input directory at a location on S3. I get the following error:
Fetcher: java.lang.IllegalArgumentException:
This file system object (hdfs://ip-11-202-55-144.ec2.internal:9000)
does not support access to the request path
's3n://crawlResults2/segments/20110823155002/crawl_fetch'
You possibly called FileSystem.get(conf) when you should have called
FileSystem.get(uri, conf) to obtain a file system supporting your path.
I understand the difference between FileSystem.get(uri, conf) and FileSystem.get(conf). If I were writing this code myself, I would call FileSystem.get(uri, conf), but I'm trying to use the existing Nutch code.
When I asked about this before, someone told me that I need to change hadoop-site.xml to include the following properties: fs.default.name, fs.s3.awsAccessKeyId, fs.s3.awsSecretAccessKey. I set these properties in core-site.xml instead (hadoop-site.xml does not exist on the cluster), but the error did not change. Does anyone have any other ideas? Thanks for the help.
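For reference, the core-site.xml entries have roughly the following shape. This is only a sketch: the three property names are the ones listed above, and every value is a placeholder rather than a real setting.

```xml
<!-- Sketch of the core-site.xml properties mentioned above.
     All values are placeholders (assumptions), not actual settings. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>PLACEHOLDER_FILESYSTEM_URI</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>PLACEHOLDER_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>PLACEHOLDER_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```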