Transferring large files with Twisted

I am writing a client-server application structured as follows: client (C#) ↔ server (Twisted; FTP proxy plus additional functionality) ↔ FTP server

The server has two classes: my own protocol class, which inherits from the LineReceiver protocol, and FTPClient from twisted.protocols.ftp.

But when the client sends or receives large files (10 Gb - 20 Gb), the server raises a MemoryError. I do not use any buffers in my own code. It happens because, after a call to transport.write(data), the data is appended to the reactor's internal write buffer (correct me if I am wrong).

What should I use to avoid this problem? Or should I change the approach to the problem?

I found out that for large streams I am supposed to use the IConsumer and IProducer interfaces. But in the end they will still call the transport.write method, so the effect will be the same. Or am I wrong?

UPD:

Here is the logic of file upload / download (from the FTP server, through the Twisted server, to a client on Windows):

The client sends some headers to the Twisted server and then starts sending the file. The Twisted server parses the headers and after that (if necessary) calls setRawMode(), opens an FTP connection, receives / sends bytes from / to the client, and finally closes the connections. Here is the part of the code that downloads files:

FTPManager Class

    def _ftpCWDSuccees(self, protocol, fileName):
        self._ftpClientAsync.retrieveFile(fileName, FileReceiver(protocol))

    class FileReceiver(Protocol):
        def __init__(self, proto):
            self.__proto = proto

        def dataReceived(self, data):
            self.__proto.transport.write(data)

        def connectionLost(self, why=connectionDone):
            self.__proto.connectionLost(why)

The main class of the proxy server:

    class SSDMProtocol(LineReceiver):
        ...

After SSDMProtocol has parsed the headers (call the instance obSSDMProtocol), it calls a method that opens an FTP connection (FTPClient from twisted.protocols.ftp), stores it in the _ftpClientAsync field of the FTPManager object, and calls _ftpCWDSuccees(self, protocol, fileName) with protocol = obSSDMProtocol. When bytes of the file arrive, dataReceived(self, data) is called on the FileReceiver object.

And when self.__proto.transport.write(data) is called, data is added to the internal buffer faster than it is sent on to the client, so memory runs out. Maybe I can stop reading when the buffer reaches a certain size and resume reading once the buffer has been sent to the client? Or something like that?

1 answer

If you pass a 20 gigabyte (gigabit?) string to transport.write, you will need at least 20 gigabytes (gigabits?) of memory, and probably more like 40 or 60 because of the extra copying involved when working with strings in Python.

Even if you never pass a single 20 gigabyte (gigabit?) string to transport.write, if you repeatedly call transport.write with short strings faster than your network can send them, the send buffer will eventually grow too large to fit in memory and you will run into a MemoryError.
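The growth is easy to see in a toy model. The sketch below is plain Python, not Twisted code: `simulate_unthrottled` is a hypothetical helper, and the chunk size and drain rate are made-up numbers chosen only for illustration. Each tick the application writes one chunk while the network accepts fewer bytes than were written, and nothing ever pauses the writer.

```python
def simulate_unthrottled(ticks, chunk_size, drain_per_tick):
    """Return the number of buffered bytes after each tick.

    Toy model only: every tick the application writes one chunk and the
    network drains at most drain_per_tick bytes. The writer is never paused.
    """
    buffered = 0
    history = []
    for _ in range(ticks):
        buffered += chunk_size                     # models transport.write(data)
        buffered -= min(buffered, drain_per_tick)  # bytes the socket accepted
        history.append(buffered)
    return history


# Hypothetical rates: 64 KiB written per tick, 16 KiB drained per tick.
history = simulate_unthrottled(ticks=5, chunk_size=64 * 1024,
                               drain_per_tick=16 * 1024)
print(history)
```

In this model the buffer grows by the difference between the two rates on every tick, so over a 20 Gb transfer it is bounded only by the size of the file.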

The solution to both of these problems is the producer / consumer system. The advantage of using IProducer and IConsumer is that you will never have a 20 gigabyte (gigabit?) string, and you will never fill up the send buffer with too many shorter strings. The network will be throttled so that bytes are not read faster than your application can handle them and forget about them. Your strings will be on the order of 16 KB to 64 KB, which should easily fit in memory.

You just need to adjust your use of FileReceiver so that it registers the incoming connection as a producer for the outgoing connection:

    class FileReceiver(Protocol):
        def __init__(self, outgoing):
            self._outgoing = outgoing

        def connectionMade(self):
            self._outgoing.transport.registerProducer(self.transport, streaming=True)

        def dataReceived(self, data):
            self._outgoing.transport.write(data)

Now whenever the send buffer of self._outgoing.transport fills up, it will tell self.transport to pause. Once the send buffer empties, it will tell self.transport to resume. self.transport knows how to carry out these actions at the TCP level, so the data coming into your server is slowed down as well.
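The pause / resume handshake described above can be sketched with a small stand-alone model. This is plain Python under stated assumptions, not Twisted's implementation: `ToyProducer`, `ToyTransport`, `drain`, `run` and the `HIGH_WATER` threshold are all hypothetical names and values; only the method names `registerProducer`, `pauseProducing` and `resumeProducing` mirror the real interfaces.

```python
class ToyProducer:
    """Stands in for the incoming transport (hypothetical, not a Twisted class)."""

    def __init__(self):
        self.paused = False

    def pauseProducing(self):
        self.paused = True

    def resumeProducing(self):
        self.paused = False


class ToyTransport:
    """Stands in for the outgoing transport; HIGH_WATER is an arbitrary value."""

    HIGH_WATER = 64 * 1024

    def __init__(self):
        self.buffered = 0
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def write(self, data):
        self.buffered += len(data)
        # Buffer is full: ask the producer to stop sending data.
        if self.producer is not None and self.buffered >= self.HIGH_WATER:
            self.producer.pauseProducing()

    def drain(self, nbytes):
        """Model the socket accepting nbytes; resume once the buffer is empty."""
        self.buffered -= min(self.buffered, nbytes)
        if self.producer is not None and self.buffered == 0:
            self.producer.resumeProducing()


def run(ticks, chunk_size, drain_per_tick):
    """Return the largest buffer size seen while relaying data for `ticks` ticks."""
    incoming, outgoing = ToyProducer(), ToyTransport()
    outgoing.registerProducer(incoming, streaming=True)
    peak = 0
    for _ in range(ticks):
        if not incoming.paused:          # a paused producer delivers nothing
            outgoing.write(b"x" * chunk_size)
        peak = max(peak, outgoing.buffered)
        outgoing.drain(drain_per_tick)
    return peak


# Hypothetical rates: data arrives four times faster than it can be sent on.
peak = run(ticks=1000, chunk_size=16 * 1024, drain_per_tick=4 * 1024)
print(peak)
```

However long the model runs, the buffer never exceeds the threshold plus one chunk, because the fast side is paused whenever the slow side falls behind.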


Source: https://habr.com/ru/post/1439754/

