Processing lots of images on the fly with Django

Much like a tile server for spatial image data, my Django-based web application needs to serve many images generated on the fly (image merging, color changes, etc.). Since a single client can easily request many (> 100) images in a short time, it is easy to bring down the web server (Apache + mod_wsgi).

Therefore, I am looking for alternatives. Since we already use Celery, it might be a good idea to do the image processing asynchronously and push the generated data to the client. To get started, I switched the WSGI server to gevent, with Apache used as a proxy. However, I have not yet managed to get the push part to work, and I'm not quite sure this is the right approach anyway. Based on this, I have three questions:

  • Do you think this (Celery, gevent, Socket.IO) is a reasonable way to let many clients use the application without bringing down the web server? Do you see alternatives?

  • If I hand the image processing over to Celery and let it push the image data to the browser when it is done, the connection will not go through Apache, right?

  • If some kind of socket to the client is used, would it be better to use one connection for everything or one per image (closing it when it is done)?

Background:

The Django application I'm working on lets the user view very large images. This is done by tiling the large images beforehand and showing the user only the relevant tiles in a grid. As far as I understand, this is the standard way of serving data in the field of mapping and spatial image data (for example, OpenStreetMap). But unlike mapping data, we also have many slices in Z that the user can scroll through (biological images).

All this works great when the tiles are served statically. Now I have added the ability to generate these tiles "on the fly" - different images are merged, colors are adjusted, .... This works, but it puts a heavy load on the web server, since one image takes about 0.1 s to produce. We are currently using Apache with mod_wsgi (WSGIRestrictedEmbedded On), and it is easy to bring the server down. Just browsing through the image stack causes the web server to hang. I have already tried tuning MaxClients etc. and disabled KeepAlive. I also tried different thread / process combinations for mod_wsgi. However, nothing helped enough to make the application usable for multiple users. So I thought a Comet / WebSocket approach might help here.
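A back-of-the-envelope estimate of the load described above, as a minimal sketch. It uses the figures from the question (about 100 tiles per view, about 0.1 s per tile); the worker count of 4 is an assumption for illustration, and the model idealizes the workers as perfectly parallel.

```python
def seconds_to_serve(tiles, seconds_per_tile, workers):
    """Wall-clock time to render `tiles` if `workers` render in parallel."""
    batches = -(-tiles // workers)  # ceiling division: rounds of parallel work
    return batches * seconds_per_tile

# One client scrolling through a stack, single worker vs. 4 workers (assumed).
single = seconds_to_serve(100, 0.1, 1)
four = seconds_to_serve(100, 0.1, 4)
print(single, four)
```

Even under this optimistic model a single view keeps every worker busy for seconds, so a second user arriving at the same time has to wait in the queue.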

+4
3 answers

All this works great when the tiles are served statically. Now I have added the ability to generate these tiles "on the fly" - different images are merged, colors are adjusted, .... This works, but it puts a heavy load on the web server, since one image takes about 0.1 s to produce.

You need a load balancer: image requests are sent to a front-end server (for example, nginx), which will multiplex (and cache!) as many requests as necessary, provided that you supply enough back-end servers to do the heavy work.

This looks like a classic case for Amazon's distributed computing: you can store tiles in S3 storage (or perhaps NFS via EBS). All image processing servers retrieve data from a single image repository.

At the beginning, you can run both the web application and one instance of the image manipulation server on a single machine. But essentially you have three processes:

  • a web service that computes the image URLs (you need to encode the manipulations as URL parameters somehow, otherwise you will have to rely on cookies and session storage, which is riskier)
  • an image server that receives an image formula and returns a JPEG tile
  • a file server that gives access to the large images or to the individual original tiles
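A minimal sketch of encoding the manipulations as URL parameters, using only the Python standard library. The parameter names (`zoom`, `x`, `y`, `z`, `formula`) and the `/tiles/` path are assumptions made for illustration; they do not come from the application in the question.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def tile_url(zoom, x, y, z, formula, base="/tiles/"):
    """Build a stateless tile URL; all names here are hypothetical."""
    # Sort the parameters so the same request always yields the same URL,
    # which lets a caching proxy treat it as a single cache entry.
    params = sorted({"zoom": zoom, "x": x, "y": y, "z": z,
                     "formula": formula}.items())
    return base + "?" + urlencode(params)

def parse_tile_url(url):
    """Recover the tile request from a URL produced by tile_url."""
    query = parse_qs(urlsplit(url).query)
    request = {name: int(query[name][0]) for name in ("zoom", "x", "y", "z")}
    request["formula"] = query["formula"][0]
    return request

url = tile_url(2, 157, 195, 0, "red+2*(uv1-green)")
print(url)
print(parse_tile_url(url))
```

Because everything the image server needs is in the URL, any image server instance can answer any request, and no session state has to be shared between them.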

I have worked on several such architectures, in which our image layers were stored in a single image file (for example, five zoom levels, each with fifteen channels from FIR to UV, for a total of 75 "images" of up to 100K pixels), and the client could request, for example, "Zoom level 2, red channel plus twice the difference between the UV-1 channel and green, tiles from X = 157, Y = 195 to X = 167, Y = 205".
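The example request above can be sketched in plain Python. The function names are hypothetical, channels are represented as flat lists of pixel values, and clamping the result to the 8-bit range 0..255 is an assumption, not something stated in the answer.

```python
def tiles_in_range(x0, y0, x1, y1):
    """List (x, y) tile coordinates in an inclusive rectangle, row by row."""
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]

def combine(red, green, uv1):
    """Per-pixel 'red + 2 * (uv1 - green)', clamped to 0..255 (assumed)."""
    return [max(0, min(255, r + 2 * (u - g)))
            for r, g, u in zip(red, green, uv1)]

# Tiles from X = 157, Y = 195 to X = 167, Y = 205: 11 columns x 11 rows.
tiles = tiles_in_range(157, 195, 167, 205)
print(len(tiles))

# Three sample pixels per channel, chosen to show both clamping directions.
print(combine([100, 10, 250], [50, 40, 0], [60, 20, 10]))
```

A single request of this kind already covers 121 tiles, which is in line with the "> 100 images in a short time" mentioned in the question.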

+1

If a single user is all it takes to bring down your web server, the problem is not Apache or mod_wsgi.

First you need to optimize your routines and check that you deliver only the data the user actually sees.
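As a rough sketch of "only the data the user actually sees": given the viewport in image pixels, compute which tiles overlap it and request only those. The function name and the tile size of 256 pixels are assumptions for illustration.

```python
def visible_tiles(left, top, width, height, tile_size=256):
    """Return (column, row) indices of the tiles overlapping a viewport.

    The viewport is given in image pixels; tile_size=256 is an assumption.
    """
    first_col = left // tile_size
    first_row = top // tile_size
    # The last pixel inside the viewport is at offset (size - 1).
    last_col = (left + width - 1) // tile_size
    last_row = (top + height - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

# A 1024 x 768 viewport whose top-left corner is at pixel (100, 100).
print(len(visible_tiles(100, 100, 1024, 768)))
```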

After that, a faster processor, more RAM, an SSD and aggressive caching will give you more performance.
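A minimal sketch of in-process caching with `functools.lru_cache` from the standard library. Here `render_tile` is a hypothetical stand-in for the real rendering routine, its parameters are assumptions, and `maxsize=1024` is an arbitrary value to be tuned to the available RAM.

```python
from functools import lru_cache

render_calls = 0  # counts real renders, only to make the effect visible

@lru_cache(maxsize=1024)  # maxsize is an assumption; tune it to available RAM
def render_tile(zoom, x, y, z, formula):
    """Hypothetical stand-in for the expensive (~0.1 s) tile rendering."""
    global render_calls
    render_calls += 1
    return f"tile:{zoom}:{x}:{y}:{z}:{formula}".encode()

render_tile(2, 157, 195, 0, "red")
render_tile(2, 157, 195, 0, "red")  # served from the cache, no second render
print(render_calls, render_tile.cache_info())
```

Note that a cache like this lives inside one process, so with several worker processes each one keeps its own copy; sharing results between processes needs an external cache.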

Finally, you can gain a little more by switching to a different web server, but do not expect too much from this.

0

I am in a similar situation now, and this is the approach I am implementing: have you considered pushing the image manipulation to the client? I see that you use PIL for image handling, but if the PIL operations are not too involved, could you recreate the functionality in JavaScript? A lot can be done with the canvas element, and in my situation I was able to build the necessary images in JavaScript on a canvas and use toDataURL to load them into the right places.

0

Source: https://habr.com/ru/post/1433789/

