Storm supervisor shuts down a few seconds after submitting a topology

I have two ZooKeeper and Nimbus servers and one supervisor, and all of them start successfully. However, as soon as I submit a topology to Nimbus, the supervisor shuts down. The topology works correctly with LocalCluster; the failure only occurs on the distributed cluster, where the code uses StormSubmitter. supervisor.log on the supervisor instance shows the following:

2015-11-19 09:40:49.303 b.s.d.supervisor [INFO] Starting supervisor with id 617c6b5f-628c-4b32-b2a3-123c164588d7 at host ec2-x-x-x-x.us-west-2.compute$
2015-11-19 09:42:04.391 b.s.d.supervisor [INFO] Downloading code for storm id TOPOLOGY_NAME-1-1447926123 from /var/storm/nimbus/stormdist/TOPOLOGY_NAME-1-14$
2015-11-19 09:42:04.400 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
2015-11-19 09:42:14.455 b.s.event [ERROR] Error when processing event
java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
        at backtype.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:59) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:51) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:103) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.ThriftClient.<init>(ThriftClient.java:72) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:74) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:37) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.utils.Utils.downloadFromMaster(Utils.java:361) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.daemon.supervisor$fn__7720.invoke(supervisor.clj:581) ~[storm-core-0.10.0.jar:0.10.0]
        at clojure.lang.MultiFn.invoke(MultiFn.java:241) ~[clojure-1.6.0.jar:?]
        at backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__7638.invoke(supervisor.clj:465) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.event$event_manager$fn__7258.invoke(event.clj:40) [storm-core-0.10.0.jar:0.10.0]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
Caused by: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift7.transport.TSocket.open(TSocket.java:187) ~[storm-core-0.10.0.jar:0.10.0]
        at org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:103) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:48) ~[storm-core-0.10.0.jar:0.10.0]
        ... 11 more
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_60]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_60]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_60]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_60]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_60]
        at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_60]
        at org.apache.thrift7.transport.TSocket.open(TSocket.java:182) ~[storm-core-0.10.0.jar:0.10.0]
        at org.apache.thrift7.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:103) ~[storm-core-0.10.0.jar:0.10.0]
        at backtype.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:48) ~[storm-core-0.10.0.jar:0.10.0]
        ... 11 more
2015-11-19 09:42:14.468 b.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:336) [storm-core-0.10.0.jar:0.10.0]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:?]
        at backtype.storm.event$event_manager$fn__7258.invoke(event.clj:48) [storm-core-0.10.0.jar:0.10.0]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
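
The trace shows the supervisor failing with "Connection refused" while opening a Thrift connection to Nimbus (NimbusClient.getConfiguredClient, called from Utils.downloadFromMaster) to download the topology code. One way to narrow this down is a plain TCP check run from the supervisor host. Below is a minimal sketch, not part of Storm itself: the class name NimbusPortCheck and the method canConnect are hypothetical, the host is a placeholder passed as an argument, and the port is assumed to be 6627, the default for nimbus.thrift.port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP connection to host:port can be opened.
public class NimbusPortCheck {

    // Returns true if a TCP connection succeeds within timeoutMs, false otherwise.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and unresolvable host names.
            System.err.println("Cannot reach " + host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the Nimbus host the supervisor is configured with.
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumption: Nimbus listens on the default Thrift port unless overridden.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6627;
        System.out.println(canConnect(host, port, 5000) ? "reachable" : "NOT reachable");
    }
}
```

If this check also fails from the supervisor machine, the problem is probably in the network path or in the Nimbus address the supervisor resolves (for example a security group rule, or a configuration that points at localhost) rather than in the topology code.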

And nimbus.log on the nimbus instance looks like this:

2015-11-19 09:42:02.965 b.s.d.nimbus [INFO] Uploading file from client to /var/storm/nimbus/inbox/stormjar-e7853d30-b798-4c5b-88cd-117cd82a99b0.jar
2015-11-19 09:42:02.974 b.s.d.nimbus [INFO] [req 15] Access from:  principal: op:fileUpload
2015-11-19 09:42:02.990 b.s.d.nimbus [INFO] [req 16] Access from:  principal: op:fileUpload
2015-11-19 09:42:02.993 b.s.d.nimbus [INFO] Finished uploading file from client: /var/storm/nimbus/inbox/stormjar-e7853d30-b798-4c5b-88cd-117cd82a99b0.jar
2015-11-19 09:42:03.009 b.s.d.nimbus [INFO] [req 17] Access from:  principal: op:submitTopology
2015-11-19 09:42:03.090 b.s.d.nimbus [INFO] Received topology submission for TOPOLOGY_NAME with conf {"topology.max.task.parallelism" nil, "topology.submitt$
2015-11-19 09:42:03.120 b.s.d.nimbus [INFO] nimbus file location:/var/storm/nimbus/stormdist/TOPOLOGY_NAME-1-1447926123
2015-11-19 09:42:03.150 b.s.d.nimbus [INFO] Activating TOPOLOGY_NAME: TOPOLOGY_NAME-1-1447926123
2015-11-19 09:42:03.245 b.s.s.EvenScheduler [INFO] Available slots: (["617c6b5f-628c-4b32-b2a3-123c164588d7" 6700] ["617c6b5f-628c-4b32-b2a3-123c164588d7" 6$
2015-11-19 09:42:03.284 b.s.d.nimbus [INFO] Setting new assignment for topology id TOPOLOGY_NAME-1-1447926123: #backtype.storm.daemon.common.Assignment{:mas$
2015-11-19 09:42:57.492 b.s.d.nimbus [INFO] [req 18] Access from:  principal: op:getNimbusConf
2015-11-19 09:42:57.507 b.s.d.nimbus [INFO] [req 19] Access from:  principal: op:getClusterInfo
2015-11-19 09:42:57.509 b.s.d.nimbus [INFO] [req 20] Access from:  principal: op:getClusterInfo
2015-11-19 09:42:57.517 b.s.d.nimbus [INFO] [req 21] Access from:  principal: op:getClusterInfo
2015-11-19 09:44:07.062 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[8 8] not alive
2015-11-19 09:44:07.064 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[12 12] not alive
2015-11-19 09:44:07.064 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[2 2] not alive
2015-11-19 09:44:07.064 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[7 7] not alive
2015-11-19 09:44:07.065 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[22 22] not alive
2015-11-19 09:44:07.065 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[3 3] not alive
2015-11-19 09:44:07.065 b.s.d.nimbus [INFO] Executor TOPOLOGY_NAME-1-1447926123:[1 1] not alive

Source: https://habr.com/ru/post/1616519/