I have a simple pub-sub setup on a medium-sized network using ZMQ 2.1. Some subscribers use the C# bindings and others use the Python bindings, and I hit the same problem with both.
If I pull the network cable out of the machine the subscriber is running on, I get a fatal error that immediately terminates the subscriber.
Here is a very simple Python subscriber example (not the actual production code, but enough to reproduce the problem):
    import zmq

    def main(server_address, port):
        context = zmq.Context()
        sub_socket = context.socket(zmq.SUB)
        sub_socket.connect("tcp://" + server_address + ":" + str(port))
        # Only deliver messages whose content starts with this prefix
        sub_socket.setsockopt(zmq.SUBSCRIBE, b"KITH1S2")
        while True:
            msg = sub_socket.recv()  # blocks until a matching message arrives
            print(msg)

    if __name__ == "__main__":
        main("company-intranet", 4000)
In C#, the program simply exits silently. In Python, I at least get the following:
Assertion failed: rc == 0 (....\src\zmq_connecter.cpp:48)
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
I have tried non-blocking and poller-based versions, but the instant termination persists either way. Is there something obvious I should be doing but am not? (Obvious to someone else, that is :))
EDIT:
Found the following: https://zeromq.jira.com/browse/LIBZMQ-207
This seems to be (or to have been) a known issue.
The issue links to GitHub, where the changelog for 2.1.10 contains this note:
- Fixed issue 207, assertion failure in zmq_connecter.cpp:48, when an invalid endpoint string is passed to zmq_connect() or the host name cannot be resolved. The zmq_connect() call now returns -1 in both cases.
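On releases older than 2.1.10, one way to avoid handing libzmq a string that trips this assertion at connect time would be to check the endpoint in application code first. This is a sketch under my own assumptions, not something from the changelog: the helper name `endpoint_is_usable` is hypothetical, it only understands the `tcp://host:port` form, and it presumably cannot help once the network drops after a successful connect, since reconnection happens inside libzmq.

```python
import socket


def endpoint_is_usable(endpoint):
    """Return True if `endpoint` looks like tcp://host:port and the host resolves.

    Hypothetical pre-check for ZeroMQ releases before 2.1.10, where a bad
    endpoint string or an unresolvable host name trips an assertion instead
    of making zmq_connect() return -1.
    """
    prefix = "tcp://"
    if not endpoint.startswith(prefix):
        return False
    # Split on the last colon so the host part can itself contain dots
    host, sep, port_text = endpoint[len(prefix):].rpartition(":")
    if not sep or not host or not port_text.isdigit():
        return False
    port = int(port_text)
    if not 0 < port < 65536:
        return False
    try:
        # Assumption: IPv4 only, like the tcp:// transport of that era
        socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except (socket.gaierror, UnicodeError):
        return False
    return True
```

A caller would check `endpoint_is_usable(...)` before calling `connect()` and retry or report the problem itself instead of letting the process abort.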
Although connect() does raise an "Invalid argument" exception in Python (apparently not in C#, though?), recv() still fails: if the subscriber machine suddenly loses its network connection, the subscriber simply stops working.
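Because the failing assertion aborts the whole process (hence the runtime message above), no try/except inside the subscriber can catch it. One stopgap, which is my own suggestion and not from the bug report, is to run the subscriber as a child process and restart it when it dies. The `supervise` helper and its parameters below are hypothetical.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=5, delay_seconds=1.0):
    """Run `command`, restarting it while it exits with a non-zero code.

    An abort() inside libzmq kills the child process, not this parent, so
    the parent can observe the crash through the exit code and start the
    subscriber again. Returns the total number of times the command ran.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(command)
        # Stop on a clean exit, or once the restart budget is used up
        if exit_code == 0 or starts > max_restarts:
            return starts
        time.sleep(delay_seconds)  # back off before restarting
```

Usage would be something like `supervise([sys.executable, "subscriber.py"])`, where `subscriber.py` is a hypothetical name for the script above. Messages published while the subscriber is down are lost either way, since SUB sockets do not replay missed messages.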
So I'll try using IP addresses instead of host names to see whether that resolves the problem. Not ideal, but better than an insta-crash.
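A minimal sketch of that workaround, assuming the host name resolves at startup: resolve it once and pass libzmq a literal IPv4 address, so that (presumably) no DNS lookup is needed when it later tries to reconnect. The helper name `ip_endpoint` is hypothetical.

```python
import socket


def ip_endpoint(host, port):
    """Resolve `host` once and return a tcp:// endpoint with its IPv4 address.

    Assumption behind the workaround: with a literal IP in the endpoint,
    libzmq has nothing to resolve when it reconnects after a network drop.
    """
    ip_address = socket.gethostbyname(host)  # raises socket.gaierror if unresolvable
    return "tcp://%s:%d" % (ip_address, port)
```

In the subscriber above, the connect line would become `sub_socket.connect(ip_endpoint(server_address, port))`. The trade-off is that if the publisher's IP address changes, the subscriber keeps using the old one until it is restarted.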