How to configure a multi-node Apache Storm cluster

I followed http://jayatiatblogs.blogspot.com/2011/11/storm-installation.html and http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup to configure an Apache Storm cluster on Ubuntu 14.04 LTS on AWS EC2.

My master node is 10.0.0.185. My slave nodes are 10.0.0.79, 10.0.0.124 and 10.0.0.84, with myid 1, 2 and 3 in their Zookeeper data directories. The Apache Zookeeper ensemble consists of all 3 slave nodes.
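
For reference, a minimal sketch of how the myid files would be created on each node (assuming the dataDir from the zoo.cfg below):

# on 10.0.0.79
echo 1 > /home/ubuntu/zookeeper-data/myid

# on 10.0.0.124
echo 2 > /home/ubuntu/zookeeper-data/myid

# on 10.0.0.84
echo 3 > /home/ubuntu/zookeeper-data/myid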

Below is the zoo.cfg for my slave nodes:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/ubuntu/zookeeper-data
clientPort=2181
server.1=10.0.0.79:2888:3888
server.2=10.0.0.124:2888:3888
server.3=10.0.0.84:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
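
To confirm the ensemble came up correctly, each node can be checked after starting Zookeeper (a sketch, assuming it is run from the Zookeeper installation directory):

bin/zkServer.sh status
# one node should report "Mode: leader" and the other two "Mode: follower"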

Below is the storm.yaml for my slave nodes:

########### These MUST be filled in for a storm configuration
storm.zookeeper.server:
    - "10.0.0.79"
    - "10.0.0.124"
    - "10.0.0.84"
#   - "localhost"
storm.zookeeper.port: 2181
# nimbus.host: "localhost"
nimbus.host: "10.0.0.185"
storm.local.dir: "/home/ubuntu/storm/data"
java.library.path: "/usr/lib/jvm/java-7-oracle"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
    - 6704
#
# worker.childopts: "-Xmx768m"
# nimbus.childopts: "-Xmx512m"
# supervisor.childopts: "-Xmx256m"
#
# ##### These may optionally be filled in:
#
# ## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
# ## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
# ## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"
#
# ## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
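
To double-check which values the supervisor daemon will actually resolve from this file, Storm's CLI can print a locally resolved config value (a sketch, assuming it is run from the Storm installation directory on a slave node):

bin/storm localconfvalue storm.zookeeper.servers
bin/storm localconfvalue nimbus.host
bin/storm localconfvalue supervisor.slots.ports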

The following is the storm.yaml for my master node:

########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "10.0.0.79"
    - "10.0.0.124"
    - "10.0.0.84"
#   - "localhost"
# storm.zookeeper.port: 2181
nimbus.host: "10.0.0.185"
# nimbus.thrift.port: 6627
# nimbus.task.launch.secs: 240
# supervisor.worker.start.timeout.secs: 240
# supervisor.worker.timeout.secs: 240
ui.port: 8772
# nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
# ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
# supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
# worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
storm.local.dir: "/home/ubuntu/storm/data"
java.library.path: "/usr/lib/jvm/java-7-oracle"
# supervisor.slots.ports:
#     - 6700
#     - 6701
#     - 6702
#     - 6703
#     - 6704
# worker.childopts: "-Xmx768m"
# nimbus.childopts: "-Xmx512m"
# supervisor.childopts: "-Xmx256m"
#
# ##### These may optionally be filled in:
#
# ## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
# ## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
# ## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"
#
# ## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metric.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"

I start Zookeeper on all of my slave nodes, then launch storm nimbus on my master node, and then launch the storm supervisor on all of my slave nodes. However, when I browse the Storm UI, there is only 1 supervisor in the cluster summary, with a total of 5 slots, and only 1 entry in the supervisor summary. Why?
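
For reference, the start sequence as a sketch (assuming the daemons are launched from the Zookeeper and Storm installation directories; in production they would normally be kept alive by a process supervisor such as supervisord):

# on each slave node
bin/zkServer.sh start

# on the master node (10.0.0.185)
bin/storm nimbus &
bin/storm ui &

# on each slave node
bin/storm supervisor &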

How many slave nodes will actually work if I submit a topology in this case?

Why aren't there 3 supervisors with 15 slots?

What should I do to have 3 supervisors?

When I check supervisor.log on the slave nodes, it shows the following:

2015-05-29T09:21:24.185+0000 b.s.d.supervisor [INFO] 5019754f-cae1-4000-beb4-fa016bd1a43d still hasn't started
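
One way to see how many supervisors actually registered is to look at what they wrote into Zookeeper (a sketch, assuming the default storm.zookeeper.root of /storm):

bin/zkCli.sh -server 10.0.0.79:2181
# inside the zkCli shell:
ls /storm/supervisors
# a healthy 3-supervisor cluster should list three distinct supervisor ids
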
2 answers

What you are doing is perfectly fine, and it works too.

The only thing you have to change is your storm.local.dir. It is the same on the slave and master nodes; just change the storm.local.dir path on the nimbus and supervisor nodes (do not use the same local path). When you use the same local path, nimbus and the supervisor end up with the same id. They both come up, but you don't see 8 slots; it only shows you 4 slots for workers.

Modify storm.local.dir (/home/ubuntu/storm/data) and do not use the same path on the supervisor and nimbus nodes.
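
As a sketch of that suggestion (the directory names below are only examples, not taken from the question): give nimbus and the supervisors different storm.local.dir paths, and if the old directory was copied between machines, clear it so the supervisor generates a fresh id on the next start.

# storm.yaml on the nimbus node
storm.local.dir: "/home/ubuntu/storm/nimbus-data"

# storm.yaml on each supervisor node
storm.local.dir: "/home/ubuntu/storm/supervisor-data"

# if the old local dir was copied between machines, clear it before restarting:
# rm -rf /home/ubuntu/storm/data/*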


Do you mean Nimbus when you say master node?

Typically, a Zookeeper cluster should be started first, then nimbus, and then supervisors. Zookeeper and Nimbus must always be available for the Storm cluster to work.

You should check the supervisor logs for failures. The Nimbus host and the Zookeeper machines must be reachable from the supervisor machines.
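
A quick way to verify that from each supervisor machine is a port check with netcat (a sketch; 6627 is the default nimbus.thrift.port):

nc -zv 10.0.0.185 6627   # Nimbus thrift port
nc -zv 10.0.0.79 2181    # Zookeeper client port (repeat for 10.0.0.124 and 10.0.0.84)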

