Running Hadoop on Amazon EC2: multi-node

I need to run Hadoop jobs on Amazon EC2.

I tried setting it up using an existing AMI, but after starting the master and the slaves, `jps` does not list any of the Hadoop daemons.

So even when using a public Hadoop AMI, do we still have to configure the masters and slaves ourselves? How does the master learn the IP addresses of the slaves?

Can someone point me to some good documentation? I have been banging my head against this for over 12 hours.

Can anybody help?

Thanks.

2 answers

An alternative to what Matthew suggested is to use Apache Whirr.

Whirr makes it really easy to deploy a Hadoop cluster on Amazon: you don't pay anything on top of the EC2 instance costs, and you control which Hadoop version the cluster runs.

Here is the main project page: http://whirr.apache.org/

Here is the quick-start guide for installing Hadoop with it; you can have a Hadoop cluster running in about 5 minutes: http://whirr.apache.org/docs/0.6.0/quick-start-guide.html
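To give an idea of how little configuration Whirr needs, here is a minimal sketch based on the quick-start guide above. The cluster name and the instance-template counts are placeholder assumptions, and AWS credentials are taken from environment variables; Whirr itself handles wiring the master and slaves together, which answers the "how does the master know the slave IPs" question:

```shell
# 1. Write a minimal cluster definition (values below are placeholders).
cat > hadoop.properties <<'EOF'
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
EOF

# 2. Launch the cluster; Whirr provisions the EC2 instances and
#    configures the master/slave membership for you.
whirr launch-cluster --config hadoop.properties

# 3. When you are done, tear everything down so you stop paying
#    for the EC2 instances.
whirr destroy-cluster --config hadoop.properties
```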


Instead, I would use the Amazon Elastic MapReduce framework. You can spin clusters up and down dynamically, resize them, and you don't have to worry about setting the machines up to talk to each other.

http://aws.amazon.com/elasticmapreduce/

It is used by many people and is mostly reliable. It saves you an absolute TON of the work that normally goes into setting up and administering a cluster. The one difference from a regular Hadoop setup is that it's best to put your data in S3 instead of HDFS, since the clusters are transient and the HDFS data disappears along with the cluster.
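As a sketch of the S3-instead-of-HDFS workflow with a transient cluster, here is what a one-shot job might look like using the modern AWS CLI. The bucket names, jar path, release label, and instance settings are all placeholder assumptions, not part of the original answer:

```shell
# Launch a transient EMR cluster that runs one Hadoop step and then
# terminates itself. Input, output, and the job jar all live in S3,
# so nothing is lost when the cluster (and its HDFS) goes away.
aws emr create-cluster \
  --name "wordcount-run" \
  --release-label emr-6.15.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --auto-terminate \
  --steps Type=CUSTOM_JAR,Jar=s3://mybucket/jars/wordcount.jar,Args=[s3://mybucket/input,s3://mybucket/output]
```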


Source: https://habr.com/ru/post/1386052/
