Error launching topology on a production cluster with Apache Storm 1.0.0: topology does not start

I have a topology that works well on a local cluster, but when I try to run it on a production cluster, the following happens:

  • Nimbus is up
  • The Storm UI is up
  • The workers are up
  • ZooKeeper is up
  • I submit the topology with

    storm jar myjar.jar myclass

  • Nimbus accepts the topology

  • The topology and the workers appear in the Storm UI

BUT:

The topology does not start even though its status is ACTIVE.

The topology's log files never appear on the production machines.

I have the following entries in supervisor.log on the supervisor:

  2016-04-15 13:18:19.831 o.a.s.d.supervisor [WARN] There was a connection problem with nimbus.
  #error {
   :cause jobs-rec-storm-nimbus
   :via [{:type java.lang.RuntimeException
          :message org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: jobs-rec-storm-nimbus
          :at [org.apache.storm.security.auth.TBackoffConnect retryNext TBackoffConnect.java 64]}
         {:type org.apache.storm.thrift.transport.TTransportException
          :message java.net.UnknownHostException: jobs-rec-storm-nimbus
          :at [org.apache.storm.thrift.transport.TSocket open TSocket.java 226]}
         {:type java.net.UnknownHostException
          :message jobs-rec-storm-nimbus
          :at [java.net.AbstractPlainSocketImpl connect AbstractPlainSocketImpl.java 184]}]
   :trace [[java.net.AbstractPlainSocketImpl connect AbstractPlainSocketImpl.java 184]
           [java.net.SocksSocketImpl connect SocksSocketImpl.java 392]
           [java.net.Socket connect Socket.java 589]
           [org.apache.storm.thrift.transport.TSocket open TSocket.java 221]
           [org.apache.storm.thrift.transport.TFramedTransport open TFramedTransport.java 81]
           [org.apache.storm.security.auth.SimpleTransportPlugin connect SimpleTransportPlugin.java 103]
           [org.apache.storm.security.auth.TBackoffConnect doConnectWithRetry TBackoffConnect.java 53]
           [org.apache.storm.security.auth.ThriftClient reconnect ThriftClient.java 99]
           [org.apache.storm.security.auth.ThriftClient <init> ThriftClient.java 69]
           [org.apache.storm.utils.NimbusClient <init> NimbusClient.java 106]
           [org.apache.storm.utils.NimbusClient getConfiguredClientAs NimbusClient.java 78]
           [org.apache.storm.utils.NimbusClient getConfiguredClient NimbusClient.java 41]
           [org.apache.storm.blobstore.NimbusBlobStore prepare NimbusBlobStore.java 268]
           [org.apache.storm.utils.Utils getClientBlobStoreForSupervisor Utils.java 462]
           [org.apache.storm.daemon.supervisor$fn__9590 invoke supervisor.clj 942]
           [clojure.lang.MultiFn invoke MultiFn.java 243]
           [org.apache.storm.daemon.supervisor$mk_synchronize_supervisor$this__9351$fn__9369 invoke supervisor.clj 582]
           [org.apache.storm.daemon.supervisor$mk_synchronize_supervisor$this__9351 invoke supervisor.clj 581]
           [org.apache.storm.event$event_manager$fn__8903 invoke event.clj 40]
           [clojure.lang.AFn run AFn.java 22]
           [java.lang.Thread run Thread.java 745]]}
  2016-04-15 13:18:19.831 o.a.s.d.supervisor [INFO] Finished downloading code for storm id jobs-KafkaMigration-topology-3-1460740616
  2016-04-15 13:18:19.850 o.a.s.d.supervisor [INFO] Missing topology storm code, so can't launch worker with assignment ...(some more numbers)

So I assume I have a problem connecting to Nimbus, but this is the storm.yaml on the worker machines:

  storm.zookeeper.servers:
      - "192.168.22.209"
      - "192.168.22.216"
      - "192.168.22.217"
  storm.local.dir: "/app/home/storm"
  storm.zookeeper.root: "/storm-prod"
  #
  nimbus.seeds: ["192.168.120.96"]

And if I ping the Nimbus IP from the workers, it responds fine.
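
Note that the stack trace complains about the hostname jobs-rec-storm-nimbus rather than the IP. A minimal resolution check along these lines (just a sketch, run from a supervisor machine; the hostname is the one from the log above) shows whether that name resolves at all:

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class ResolveCheck {
        public static void main(String[] args) {
            // Hostname taken from the UnknownHostException in supervisor.log
            String nimbusHost = args.length > 0 ? args[0] : "jobs-rec-storm-nimbus";
            try {
                InetAddress addr = InetAddress.getByName(nimbusHost);
                System.out.println(nimbusHost + " resolves to " + addr.getHostAddress());
            } catch (UnknownHostException e) {
                // This is the same failure the supervisor hits before it can fetch topology code
                System.out.println(nimbusHost + " does not resolve");
            }
        }
    }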

Where is the error, and how can I fix it?

Thanks!

+5
3 answers

It appears that the Storm supervisor resolves Nimbus from whatever is configured in storm.yaml (nimbus seeds/host) only the first time, and afterwards uses the Nimbus hostname to download the topology artifacts.

If this is correct, working name resolution (DNS or equivalent) is required for the cluster to function. This is far from ideal, especially when running containers in an orchestrated environment.

The workaround I'm currently using is to add

 storm.local.hostname: "<local.ip.value>" 

to storm.yaml.
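
As an illustration only, a storm.yaml on the Nimbus node with this workaround might look like the sketch below; the addresses are taken from the question, and the assumption is that each node sets storm.local.hostname to its own IP:

    storm.zookeeper.servers:
        - "192.168.22.209"
        - "192.168.22.216"
        - "192.168.22.217"
    storm.local.dir: "/app/home/storm"
    storm.zookeeper.root: "/storm-prod"
    nimbus.seeds: ["192.168.120.96"]
    # advertise this node by IP so other daemons never have to resolve its hostname
    storm.local.hostname: "192.168.120.96"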

Thanks to @bastien for the pointer on the Storm users mailing list.

+6

I ran into a similar problem. It turned out my firewall rules were blocking the supervisor ports. Make sure the supervisors and Nimbus can talk to each other.
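
If firewall rules are in play, the stock port assignments are a useful checklist (these are the Storm defaults; adjust if your storm.yaml overrides them):

    storm.zookeeper.port: 2181    # ZooKeeper
    nimbus.thrift.port: 6627      # supervisors and the storm CLI connect to Nimbus here
    ui.port: 8080                 # Storm UI
    logviewer.port: 8000          # per-node log viewer
    supervisor.slots.ports:       # worker processes on each supervisor bind these
        - 6700
        - 6701
        - 6702
        - 6703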

+2

I found that the hostnames need to match the names I used in the /etc/hosts file.

In the hosts file I had

xxx.xxx.xxx.xxx nimbus

but the hostname on the box itself was different, and Storm pulled the hostname from the OS.

Changing the OS hostname of the Nimbus server resolved my problem.
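
For illustration, the idea is that the name Nimbus is known by in /etc/hosts and the OS hostname of the Nimbus box agree (hypothetical name and address, not the asker's values):

    # /etc/hosts on every node
    192.168.1.10   storm-nimbus

    # and on the Nimbus machine, `hostname` should print that same name:
    # storm-nimbus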

0

Source: https://habr.com/ru/post/1247235/

