Drawing on ideas from various sources and combining them with my own, I set out to create animated maps that shade countries based on some value in my data.
The basic process is as follows:
- Run a database query to obtain a data set, indicating the country and time
- Use pandas to do some data manipulation (sums, avgs, etc.).
- Initialize the Basemap object, then load an external shapefile
- Using the animation library, color the countries, one frame for each distinct "time" in the data set.
- Save as gif or mp4 or something else
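The pandas step in the list above could look roughly like the sketch below. The column names (`country`, `time`, `value`), the sample rows, and the helper `average_per_interval` are all hypothetical stand-ins, since the real query and schema are not shown.

```python
import pandas as pd

# Hypothetical rows standing in for the database query result.
rows = pd.DataFrame({
    "country": ["US", "US", "FR", "FR", "US"],
    "time": pd.to_datetime([
        "2016-01-01 03:00", "2016-01-01 15:00", "2016-01-01 08:00",
        "2016-01-02 09:00", "2016-01-02 10:00",
    ]),
    "value": [2.0, 4.0, 10.0, 20.0, 6.0],
})

def average_per_interval(df, freq="D"):
    """Average `value` per country within each time interval.

    Returns a table with one row per interval (one animation frame)
    and one column per country.
    """
    grouped = df.groupby([pd.Grouper(key="time", freq=freq), "country"])["value"].mean()
    return grouped.unstack("country")

frames = average_per_interval(rows)
print(frames)
```

Each row of `frames` then holds the values needed to color one frame, so the animation function only has to look up a row rather than recompute anything.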
This works great. The problem is that it is very slow. I have potentially more than 100k time intervals (across several metrics) that I want to animate, and generating each frame takes 15 seconds on average, getting worse as the number of frames grows. At this rate, creating a single animation could take weeks of maxing out the CPU and memory on my computer.
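As a quick sanity check on that estimate, here is the arithmetic for 100k frames at a flat 15 seconds each. The helper name `estimated_days` is just for illustration, and the flat rate is an optimistic assumption, since the per-frame time is reported to grow.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def estimated_days(n_frames, seconds_per_frame):
    """Total render time in days, assuming every frame costs the same."""
    return n_frames * seconds_per_frame / SECONDS_PER_DAY

days = estimated_days(100_000, 15)
print(f"{days:.1f} days")  # 17.4 days
```

That is about two and a half weeks even before accounting for frames getting slower over time.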
I know that matplotlib is not known for being fast (examples: 1 and 2), but I have read accounts of people generating animations at 5+ fps, and I wonder what I am doing wrong.
Some optimizations I made:
- Only repaint the countries in the animation function. This takes ~3 s per frame on average, so although it could be improved, it is not what takes the most time.
- I am using the blit option.
- I tried smaller plot sizes and a less detailed basemap, but the difference was not significant.
A less detailed shapefile might speed up coloring the shapes but, as noted above, that step accounts for only about 3 s per frame.
Here is the code (minus a few identifying details):
    import pandas as pd
    import numpy as np
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import matplotlib.animation as animation
    import time
    from math import pi
    from sqlalchemy import create_engine
    from mpl_toolkits.basemap import Basemap
    from matplotlib.patches import Polygon
    from matplotlib.collections import PatchCollection
    from geonamescache import GeonamesCache
    from datetime import datetime

    def get_dataset(avg_interval, startTime, endTime):
P.S. I know the code is messy and could use some cleanup. I am open to any suggestions, especially if the cleanup improves performance!
Edit 1: Here is a sample output
    2016-01-01 00:00:00: 3.87843298912s
    2016-01-01 00:00:00: 4.08691620827s
    2016-01-02 00:00:00: 3.40868711472s
    2016-01-03 00:00:00: 4.21187019348s
    Total time: 29.0233821869s
    9.67446072896s per frame
The first few lines show the date being processed and the execution time for each frame. I do not know why the first one is repeated. The last line is the total run time of the program divided by the number of frames. Note that this average is 2-3 times the individual frame times, which makes me think something time-consuming is happening "between" frames.
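To make the gap concrete, the numbers from the sample output can be compared directly. This is only arithmetic on the logged values; the variable names are illustrative.

```python
# Per-call times copied from the sample output (the first date is logged twice).
frame_times = [3.87843298912, 4.08691620827, 3.40868711472, 4.21187019348]
total_time = 29.0233821869
n_frames = 3  # three distinct dates

time_in_frames = sum(frame_times)          # time spent inside the timed section
unaccounted = total_time - time_in_frames  # time spent somewhere else
reported_average = total_time / n_frames

print(f"inside frame calls: {time_in_frames:.2f}s")
print(f"unaccounted:        {unaccounted:.2f}s")
print(f"average per frame:  {reported_average:.2f}s")
```

Roughly 13.4 of the 29 seconds fall outside the timed part of the animation function, which is consistent with the suspicion that the expensive work happens elsewhere (for example in drawing or saving).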
Edit 2: I ran several performance tests and found that each additional frame takes longer to generate than the last, by an amount proportional to the number of frames so far, which indicates a quadratic-time process (or would that be exponential?). Either way, I am very confused about why this is not linear. If the data set has already been generated and the maps take constant time to redraw, what makes each additional frame take longer than the previous one?
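On the quadratic-versus-exponential question: if frame k costs roughly a + b·k, the total over N frames is a·N + b·N(N+1)/2, which is quadratic. If instead each frame costs a fixed multiple of the previous one, the total is exponential. A rough way to tell them apart from measured per-frame times is sketched below; the function name, the tolerance, and the sample timings are all hypothetical.

```python
def growth_pattern(per_frame_times, tolerance=0.05):
    """Classify how total render time grows, from a list of per-frame times.

    Constant times -> "linear" total; constant differences -> "quadratic";
    constant ratios -> "exponential"; anything else -> "unclear".
    """
    def roughly_constant(xs):
        mean = sum(xs) / len(xs)
        return all(abs(x - mean) <= tolerance * abs(mean) for x in xs)

    pairs = list(zip(per_frame_times, per_frame_times[1:]))
    diffs = [b - a for a, b in pairs]
    ratios = [b / a for a, b in pairs]

    if roughly_constant(per_frame_times):
        return "linear"
    if roughly_constant(diffs):
        return "quadratic"
    if roughly_constant(ratios):
        return "exponential"
    return "unclear"

# Hypothetical measurements, in seconds per frame.
print(growth_pattern([3.0, 3.0, 3.0, 3.0]))        # linear
print(growth_pattern([3.0, 3.5, 4.0, 4.5, 5.0]))   # quadratic
print(growth_pattern([1.0, 2.0, 4.0, 8.0, 16.0]))  # exponential
```

Per-frame times that rise by a steady amount, as described in this edit, point to the quadratic case.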
Edit 3: I just realized that I don't understand how the animation function works. The (x, y) and point variables came from an example that simply plotted moving points, so they made sense in that context. For a map... not so much. I tried returning something map-related from the animation function and got better performance. Returning the ax object (return ax,) makes the procedure run in linear time... but writes nothing to the gif. Does anyone know what I need to return from the animation function to make this work?
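For reference, with blit=True matplotlib expects the animation function to return an iterable of the artists it modified. One possible pattern, not taken from the original code, is to build a single PatchCollection once and only change its color data each frame. In the sketch below the squares stand in for country polygons, and the `values` array, the names `collection` and `update`, and the colormap are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

# Hypothetical stand-ins for country shapes: three unit squares side by side.
squares = [
    Polygon([(x, 0), (x + 1, 0), (x + 1, 1), (x, 1)], closed=True)
    for x in range(3)
]

# Hypothetical data: one row per time step, one column per shape.
values = np.array([[0.1, 0.5, 0.9],
                   [0.9, 0.1, 0.5],
                   [0.5, 0.9, 0.1]])

fig, ax = plt.subplots()
ax.set_xlim(0, 3)
ax.set_ylim(0, 1)

# Build the collection once; only its color data changes between frames.
collection = PatchCollection(squares, cmap="viridis")
collection.set_clim(0.0, 1.0)
collection.set_array(values[0])
ax.add_collection(collection)

def update(frame_index):
    """Recolor the existing shapes and return the artists that changed."""
    collection.set_array(values[frame_index])
    return (collection,)

anim = FuncAnimation(fig, update, frames=len(values), blit=True)
```

Because no new artists are added, the axes hold the same single collection no matter how many frames are rendered. Saving would then be something like `anim.save("map.gif", writer="pillow")`, which needs Pillow installed.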
Edit 4: Clearing the axis every frame lets frames generate at a constant rate! Now I just have to work on general optimization. I will start with ImportanceOfBeingErnest's suggestion. The previous edits are now obsolete.
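A small experiment illustrates why clearing the axis could have this effect, assuming the original animation function added new patches on every frame (the full code is not shown, so this is a guess). The helpers `draw_frame` and `patch_counts` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon

def draw_frame(ax, clear_first):
    """Add one stand-in 'country' polygon, optionally clearing the axes first."""
    if clear_first:
        ax.clear()
    ax.add_patch(Polygon([(0, 0), (1, 0), (1, 1), (0, 1)], closed=True))
    return len(ax.patches)

def patch_counts(n_frames, clear_first):
    """Number of patches held by the axes after each of n_frames frames."""
    fig, ax = plt.subplots()
    counts = [draw_frame(ax, clear_first) for _ in range(n_frames)]
    plt.close(fig)
    return counts

print(patch_counts(5, clear_first=False))  # [1, 2, 3, 4, 5]
print(patch_counts(5, clear_first=True))   # [1, 1, 1, 1, 1]
```

Without clearing, every frame has to redraw all patches added so far, so the cost per frame grows with the frame number. With clearing, the number of artists stays fixed.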