Running Hadoop on Amazon EC2: multi-node

I need to run Hadoop jobs on Amazon EC2.

I tried setting it up using an existing AMI, but after starting the master and the slaves, `jps` does not list any of the Hadoop daemons.

So even when using a public Hadoop AMI, do we still have to configure the masters and slaves ourselves? How does the master learn the IP addresses of the slaves?

Can someone point me to some good documentation? I have been banging my head against this for over 12 hours.

Can anybody help?

Thanks.

2 answers

An alternative to what Matthew suggested is to use Apache Whirr.

Whirr makes it really easy to deploy a Hadoop cluster on Amazon: you don't pay anything on top of the EC2 instance costs, and you control which Hadoop version the cluster runs.

Here is the main project page: http://whirr.apache.org/

Here is the quick-start guide for installing Hadoop with it; you can have a Hadoop cluster running in about 5 minutes: http://whirr.apache.org/docs/0.6.0/quick-start-guide.html
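To give an idea of how little configuration Whirr needs, here is a minimal sketch based on the quick-start guide above. The cluster name and the instance-template counts are placeholder assumptions, and AWS credentials are taken from environment variables; Whirr itself handles wiring the master and slaves together, which answers the "how does the master know the slave IPs" question:

```shell
# 1. Write a minimal cluster definition (values below are placeholders).
cat > hadoop.properties <<'EOF'
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
EOF

# 2. Launch the cluster; Whirr provisions the EC2 instances and
#    configures the master/slave membership for you.
whirr launch-cluster --config hadoop.properties

# 3. When you are done, tear everything down so you stop paying
#    for the EC2 instances.
whirr destroy-cluster --config hadoop.properties
```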


Instead, I would use the Amazon Elastic MapReduce framework. You can spin clusters up and down dynamically, resize them, and you don't have to worry about setting the machines up to talk to each other.

http://aws.amazon.com/elasticmapreduce/

It is used by many people and is mostly reliable. It saves you an absolute TON of the work that normally goes into setting up and administering a cluster. The one difference from a regular Hadoop setup is that it's best to put your data in S3 instead of HDFS, since the clusters are transient and the HDFS data disappears along with the cluster.
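As a sketch of the S3-instead-of-HDFS workflow with a transient cluster, here is what a one-shot job might look like using the modern AWS CLI. The bucket names, jar path, release label, and instance settings are all placeholder assumptions, not part of the original answer:

```shell
# Launch a transient EMR cluster that runs one Hadoop step and then
# terminates itself. Input, output, and the job jar all live in S3,
# so nothing is lost when the cluster (and its HDFS) goes away.
aws emr create-cluster \
  --name "wordcount-run" \
  --release-label emr-6.15.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --auto-terminate \
  --steps Type=CUSTOM_JAR,Jar=s3://mybucket/jars/wordcount.jar,Args=[s3://mybucket/input,s3://mybucket/output]
```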


Source: https://habr.com/ru/post/1386052/
