Streaming a binary file to Google storage using Tornado

Question

I am trying to transfer a binary file from a client request to Google Cloud Storage through my server.

I use the Tornado framework to stream data from the incoming request, and the Google Cloud Storage API's upload_from_file to stream the file to Google.

I am new to Tornado, and I use the @stream_request_body decorator so that I can receive the request data in chunks and upload each chunk to Google.
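For readers new to Tornado, here is a minimal sketch of how @stream_request_body behaves, assuming Tornado 5 or later: prepare() runs once the headers are read, data_received() is called once per chunk as the body arrives, and put() runs only after the whole body has been received. The handler name ChunkCountingHandler is hypothetical, and it only counts chunks and bytes instead of uploading them; the local server and client exist just to exercise it.

```python
import asyncio

import tornado.escape
import tornado.httpclient
import tornado.httpserver
import tornado.testing
import tornado.web


@tornado.web.stream_request_body
class ChunkCountingHandler(tornado.web.RequestHandler):
    """Hypothetical handler that records how the request body arrives."""

    def prepare(self):
        # Called after the headers are read, before any body data.
        self.chunks = 0
        self.total_bytes = 0

    def data_received(self, chunk):
        # Called for each piece of the body as it is read off the socket.
        self.chunks += 1
        self.total_bytes += len(chunk)

    def put(self):
        # Called once the entire body has gone through data_received.
        self.write({"chunks": self.chunks, "bytes": self.total_bytes})


async def main():
    app = tornado.web.Application([(r"/upload", ChunkCountingHandler)])
    sock, port = tornado.testing.bind_unused_port()
    server = tornado.httpserver.HTTPServer(app)
    server.add_sockets([sock])
    client = tornado.httpclient.AsyncHTTPClient()
    response = await client.fetch(
        f"http://127.0.0.1:{port}/upload", method="PUT", body=b"x" * 1_000_000
    )
    server.stop()
    return tornado.escape.json_decode(response.body)


result = asyncio.run(main())
print(result)
```

The exact number of chunks depends on socket reads, so only the byte total is predictable.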

I tried to open a file-like object that I could write each chunk into while the file itself is being uploaded to Google.

The problem is that I cannot start uploading the “file” to Google before I start writing chunks into it.

Any help would be greatly appreciated.

python google-cloud-storage tornado stream file-upload

Liad Amsalem Jan 2 '18 at 15:43

1 answer

Ben Darnell · Accepted Answer · 2018-01-03T02:40:21+0000

This is tricky to do with Google's HTTP client libraries because they are designed for synchronous use. You need to run the actual upload on a separate thread to avoid blocking the IOLoop. You can use os.pipe to communicate between the Tornado thread and the upload thread: wrap the writing end of the pipe in a PipeIOStream and the reading end in os.fdopen. Here's an untested sketch of a solution:
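The core of this approach, one thread writing into os.pipe() while another thread consumes the read end as an ordinary binary file object, can be tried out with the standard library alone. In this sketch a hypothetical fake_upload() stands in for upload_from_file (it simply reads the file object until EOF), and plain os.write calls stand in for Tornado's PipeIOStream. Closing the write end is what tells the reader that the body is complete.

```python
import os
import threading


def fake_upload(fileobj, sink):
    """Hypothetical stand-in for upload_from_file: reads until EOF."""
    while True:
        block = fileobj.read(8192)
        if not block:  # b"" means the write end of the pipe was closed
            break
        sink.append(block)


def stream_through_pipe(chunks):
    r, w = os.pipe()
    read_file = os.fdopen(r, "rb")  # binary mode: the upload must see raw bytes
    received = []
    uploader = threading.Thread(target=fake_upload, args=(read_file, received))
    uploader.start()  # the "upload" starts before any data has been written
    for chunk in chunks:
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(chunk)
        while view:
            written = os.write(w, view)
            view = view[written:]
    os.close(w)  # signals EOF to the reading thread
    uploader.join()
    read_file.close()
    return b"".join(received)


data = stream_through_pipe([b"part-1;", b"part-2;", bytes(100_000)])
print(len(data))  # 100014
```

The writer blocks whenever the pipe buffer is full, which gives natural back-pressure; this only works because the reader runs on its own thread.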

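import os
import threading

import tornado.ioloop
import tornado.iostream
import tornado.locks
import tornado.web

# google_client is not defined here: it stands for whatever object exposes
# upload_from_file(), e.g. a google.cloud.storage Blob.


@tornado.web.stream_request_body
class UploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        r, w = os.pipe()
        # Tornado writes incoming chunks into the write end of the pipe.
        self.write_pipe = tornado.iostream.PipeIOStream(w)
        # Create our "file-like object" for upload_from_file (binary mode).
        self.read_pipe = os.fdopen(r, "rb")
        # Create an event for the upload thread to communicate back
        # to Tornado when it's done, and save a reference to our IOLoop.
        self.upload_done = tornado.locks.Event()
        self.io_loop = tornado.ioloop.IOLoop.current()
        # Consider using a tornado.locks.Semaphore to limit the number of
        # threads you can create.
        self.thread = threading.Thread(target=self.upload_file)
        self.thread.start()

    def upload_file(self):
        # Runs on the upload thread and blocks while reading from the pipe.
        try:
            google_client.upload_from_file(self.read_pipe)
        finally:
            self.read_pipe.close()
            # Tell the IOLoop thread we're finished. add_callback is the
            # thread-safe way to reach the IOLoop from another thread.
            self.io_loop.add_callback(self.upload_done.set)

    async def data_received(self, chunk):
        await self.write_pipe.write(chunk)

    async def put(self):  # or post()
        # Closing the write end sends EOF to the upload thread.
        self.write_pipe.close()
        await self.upload_done.wait()
        self.thread.join()
        self.render("upload_done.html")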

Alternatively, you can avoid Google's synchronous libraries and do everything with the underlying HTTP APIs and AsyncHTTPClient. Sorting out authentication this way is complicated, but you avoid the threading mismatch. This involves using a body_producer, as in …
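A minimal sketch of the body_producer side, assuming Tornado 5 or later: body_producer is a coroutine that receives a write function and awaits it for each chunk, so chunks can be forwarded as they arrive. To keep it self-contained, the sketch uploads to a hypothetical local ByteCountingHandler rather than Google Cloud Storage; the real upload URL and authentication headers are not shown and would have to come from Google's JSON API documentation.

```python
import asyncio

import tornado.escape
import tornado.httpclient
import tornado.httpserver
import tornado.testing
import tornado.web


@tornado.web.stream_request_body
class ByteCountingHandler(tornado.web.RequestHandler):
    """Hypothetical local endpoint standing in for the storage API."""

    def prepare(self):
        self.total = 0

    def data_received(self, chunk):
        self.total += len(chunk)

    def put(self):
        self.write({"received": self.total})


def make_body_producer(chunks):
    # body_producer is called with a `write` function; awaiting each write
    # waits until that chunk has been flushed before producing the next.
    async def body_producer(write):
        for chunk in chunks:
            await write(chunk)

    return body_producer


async def main():
    app = tornado.web.Application([(r"/upload", ByteCountingHandler)])
    sock, port = tornado.testing.bind_unused_port()
    server = tornado.httpserver.HTTPServer(app)
    server.add_sockets([sock])

    chunks = [b"a" * 65536 for _ in range(10)]
    request = tornado.httpclient.HTTPRequest(
        f"http://127.0.0.1:{port}/upload",
        method="PUT",
        # No Content-Length header is set, so the body is sent chunked.
        body_producer=make_body_producer(chunks),
    )
    response = await tornado.httpclient.AsyncHTTPClient().fetch(request)
    server.stop()
    return tornado.escape.json_decode(response.body)


result = asyncio.run(main())
print(result)  # {'received': 655360}
```

In a real upload the list of chunks would be replaced by data coming from data_received, for example through a tornado.queues.Queue, which is again an assumption rather than part of the answer.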
