UDP Invalid Argument for Docker Swarm EC2

When sending UDP packets between Docker containers on EC2, I sometimes get this strange error (not every sent message raises an exception). It never happens in our cluster using OpenNebula. I allowed all inbound and outbound traffic on every port for all EC2 instances. Here is the exception:

    2017-01-19 10:01:53,170 - ERROR: Exception caught for address: 10.99.0.153
    Traceback (most recent call last):
      File "./server.py", line 56, in <module>
        sock.sendto(bytes('{}'.format(i), "utf-8"), (address, PORT))
    OSError: [Errno 22] Invalid argument

I am running 5 c4.xlarge instances with Ubuntu 16.04 Server and Docker 1.12.6. All of them are in the same Docker swarm.

I am creating a service and a network subnet using an overlay driver. This service has a mount point for receiving logs from each peer. I am running 150 peers, each of which has a 300 MB memory limit.

My Dockerfile:

    FROM debian:jessie

    RUN echo 'deb http://mirror.switch.ch/ftp/mirror/debian/ jessie-backports main' >> /etc/apt/sources.list && \
        apt-get -yqq update && \
        apt-get -yqq dist-upgrade && \
        apt-get -yqq install --no-install-recommends dnsutils wget curl ntp python3 && \
        apt-get -yqq clean

    CMD ["/opt/epto/container-start-script.sh"]

I use the following shell script as my CMD:

    #!/usr/bin/env bash
    MY_IP_ADDR=$(/bin/hostname -i)
    MY_IP_ADDR=($MY_IP_ADDR)
    ./server.py ${MY_IP_ADDR[0]}

And this is the Python script that it launches:

    #!/usr/bin/env python3
    import socketserver
    import sys
    import logging
    import threading
    import urllib.request
    import time
    import socket
    from random import randint

    PORT = 15342


    class MyUDPHandler(socketserver.BaseRequestHandler):
        """
        This class works similar to the TCP handler class, except that
        self.request consists of a pair of data and client socket, and since
        there is no connection the client address must be given explicitly
        when sending data back via sendto().
        """

        def handle(self):
            data = self.request[0].strip().decode("utf-8")
            logging.info("Message received from {} during loop {}".format(self.client_address[0], data))


    class ThreadedUDPServer(socketserver.ThreadingMixIn, socketserver.UDPServer):
        pass


    if __name__ == "__main__":
        HOST = sys.argv[1]
        logging.basicConfig(format='%(asctime)s - %(levelname)s: %(message)s', level=logging.INFO,
                            filename='/data/{}.test'.format(HOST))

        server = ThreadedUDPServer((HOST, PORT), MyUDPHandler)
        server.allow_reuse_address = True
        logging.info("Create server listening on {}:{}".format(HOST, PORT))
        logging.info("Server allow_reuse_address: {}".format(server.allow_reuse_address))
        server_thread = threading.Thread(target=server.serve_forever)
        server_thread.daemon = True
        server_thread.start()

        sleep_delay = randint(10, 180)
        logging.info("Sleeping for {}s".format(sleep_delay))
        time.sleep(sleep_delay)
        logging.info("Finished sleeping")

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        content = urllib.request.urlopen('http://epto-tracker:4321/REST/v1/admin/get_view').read()
        content = content.decode("utf-8")
        addresses = content.split('|')
        logging.info("View size: {}".format(len(addresses)))

        i = 0
        while True:
            logging.info("Loop {}".format(i))
            for address in addresses:
                try:
                    logging.info("Sending to {}".format(address))
                    sock.sendto(bytes('{}'.format(i), "utf-8"), (address, PORT))
                except:
                    logging.exception("Exception caught for address: {}".format(address))
            time.sleep(5)
            i += 1
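Since only some sends fail, it may help to count failures per destination and errno name instead of only logging tracebacks. This is a minimal sketch, not part of the script above: `send_and_classify` and `send_round` are hypothetical helpers, and the sketch assumes errno 22 maps to `EINVAL`, as it does on Linux.

```python
import errno
from collections import Counter


# Hypothetical helper, not part of the original script: sends one datagram
# and returns None on success or the symbolic errno name on failure.
def send_and_classify(sock, payload, address, port):
    try:
        sock.sendto(payload, (address, port))
        return None
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'EINVAL'.
        return errno.errorcode.get(exc.errno, str(exc.errno))


# Hypothetical helper: one pass over the view, counting failures
# per (address, errno name) pair.
def send_round(sock, payload, addresses, port):
    failures = Counter()
    for address in addresses:
        name = send_and_classify(sock, payload, address, port)
        if name is not None:
            failures[(address, name)] += 1
    return failures
```

Calling `send_round(sock, bytes('{}'.format(i), "utf-8"), addresses, PORT)` once per loop and logging the returned counter would show whether the failures cluster on particular peers or hosts.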

I am creating a second service on the same overlay network. It contains a tracker that the nodes contact to obtain a view of the network:

Dockerfile:

    FROM python:3.5.2-alpine
    RUN pip install pydevd
    COPY tracker.py /code/
    WORKDIR /code
    EXPOSE 4321
    CMD [ "python", "./tracker.py" ]

Code file (tracker.py):

    # import pydevd
    import random
    import logging
    import time
    from http.server import HTTPServer, BaseHTTPRequestHandler

    available_peers = {}
    K = 25

    logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO)


    def florida_string(ip):
        available_peers[ip] = int(time.time())
        to_choose = list(available_peers.keys())
        logging.info("View size: {:d}".format(len(to_choose)))
        to_choose.remove(ip)
        if len(to_choose) > K:
            to_send = random.sample(to_choose, K)
        else:
            to_send = to_choose
        return '|'.join(to_choose).encode()


    class FloridaHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == '/REST/v1/admin/get_view':
                self.send_response(200)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(florida_string(self.client_address[0]))
            elif self.path == '/terminate':
                if self.client_address[0] in available_peers:
                    del available_peers[self.client_address[0]]
                    logging.info("Removed {:s}".format(self.client_address[0]))
                    logging.info("View size: {:d}".format(len(available_peers)))
                else:
                    logging.error("IP already removed or was never here")
                self.send_response(200)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(b"Success")
            else:
                self.send_response(404)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(b"Nothing here, content is at /REST/v1/admin/get_view\n")


    class FloridaServer:
        def __init__(self):
            self.server = HTTPServer(('', 4321), FloridaHandler)
            self.server.serve_forever()


    FloridaServer()
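The tracker answers with the known peer addresses joined by `|`, and each peer splits that string to build its view. A minimal sketch of the peer-side parsing, assuming a hypothetical helper `parse_view` (not in the code above) that also drops empty entries, since splitting an empty response yields a list containing a single empty string:

```python
# Hypothetical helper, not part of the original scripts.
def parse_view(body):
    """Decode the tracker's response and return the list of peer addresses."""
    text = body.decode("utf-8")
    # ''.split('|') returns [''], so filter out empty entries explicitly.
    return [address for address in text.split("|") if address]
```

With this, the first peer to register, which gets an empty response, ends up with an empty view rather than a view containing one empty address.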

Has anyone experienced the same error on EC2?


Source: https://habr.com/ru/post/1014212/

