How to speed up a choropleth animation with Basemap

Combining ideas from various sources with my own, I set out to create animated maps that shade countries based on some value in my data.

The basic process is as follows:

  • Run a database query to obtain a dataset keyed by country and time
  • Use pandas to do some data manipulation (sums, averages, etc.).
  • Initialize the Basemap object, then load an external shapefile
  • Using the matplotlib animation module, color the countries, one frame for each distinct "time" in the dataset.
  • Save as a GIF, MP4, or some other format
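The aggregation in the second step (the question's get_dataset does it in SQL, grouping unixtime by an arbitrary avg_interval) could also be done in pandas. A minimal sketch, assuming long-format rows with `country`, `unixtime`, and `metric1` columns; the helper name `aggregate_by_interval` and the sample rows are hypothetical:

```python
import pandas as pd


def aggregate_by_interval(df: pd.DataFrame, interval: int) -> pd.DataFrame:
    """Hypothetical helper: floor each unixtime to the start of its interval,
    then sum and average metric1 per (country, interval)."""
    bucketed = df.assign(unixtime=df["unixtime"] // interval * interval)
    return (
        bucketed.groupby(["country", "unixtime"])["metric1"]
        .agg(["sum", "mean"])
        .reset_index()
    )


# Hypothetical sample rows: 2016-01-01 00:00, 01:00 and 2016-01-02 00:00 UTC
raw = pd.DataFrame({
    "country": ["US", "US", "FR", "US"],
    "unixtime": [1451606400, 1451610000, 1451606400, 1451692800],
    "metric1": [1.0, 3.0, 5.0, 7.0],
})
daily = aggregate_by_interval(raw, 60 * 60 * 24)
print(daily)
```

Each row of the result is one (country, time) pair, which is the shape the animation loop iterates over.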

This works, but it is very slow. I potentially have more than 100k time intervals (across several metrics) that I want to animate, generating each frame takes 15 seconds on average, and it gets worse the more frames there are. At this rate, a single animation could take several weeks while maxing out my computer's CPU and memory.
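As a sanity check on that estimate, the quoted figures (roughly 100k frames at roughly 15 s each, both approximate) come to about 17 days for one animation, before counting the per-frame slowdown or rendering several metrics:

```python
# Back-of-envelope estimate from the figures quoted above (both approximate).
frames = 100_000          # "more than 100k time intervals"
seconds_per_frame = 15    # average observed time per frame

total_seconds = frames * seconds_per_frame
total_days = total_seconds / (60 * 60 * 24)
print(f"{total_seconds:,} s = about {total_days:.1f} days per animation")
```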

I know that matplotlib is not known for being fast (examples: 1 and 2), but I have read of people generating animations at 5+ fps, and I wonder what I am doing wrong.

Some optimizations I made:

  • Only redraw the countries in the animation function. This takes ~3 s per frame on average, so although it could be improved, it is not what takes the most time.
  • I am using the blit option.
  • I tried using smaller plot sizes and a less detailed basemap, but the gains were not significant.
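A pattern often used to make the per-frame repaint cheap is to build the country polygons once, add them to the axes as a single `PatchCollection`, and in each frame only change their face colors. The sketch below shows that pattern with plain matplotlib; the two squares are hypothetical stand-ins for the shapefile outlines in `m.units`, and `recolor` is a hypothetical name:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

# Hypothetical stand-ins for the shapefile outlines: two unit squares.
shapes = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(1, 0), (2, 0), (2, 1), (1, 1)],
]

fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.set_ylim(0, 1)

# Build every polygon once and add a single collection to the axes.
patches = [Polygon(np.array(s), closed=True) for s in shapes]
pc = PatchCollection(patches, edgecolor="#888888")
ax.add_collection(pc)


def recolor(colors):
    """Per-frame work: only swap the face colors of the existing collection."""
    pc.set_facecolor(colors)
    return (pc,)


# Simulate a few frames; nothing new is added to the axes.
for frame_colors in (["#dddddd", "#ff0000"],
                     ["#00ff00", "#dddddd"],
                     ["#dddddd", "#ff0000"]):
    recolor(frame_colors)
    fig.canvas.draw()

print(len(ax.collections))  # one collection, however many frames were drawn
```

Because nothing is added per frame, the number of artists the renderer has to walk stays fixed.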

A less detailed shapefile might speed up coloring the shapes, but, as I said above, that step only accounts for about 3 s per frame.

Here is the code (minus a few identifying functions):

```python
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import time
from math import pi
from sqlalchemy import create_engine
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
from geonamescache import GeonamesCache
from datetime import datetime


def get_dataset(avg_interval, startTime, endTime):
    ### SQL query
    # Returns a dataframe with fields [country, unixtime, metric1, metric2, metric3, metric4, metric5]
    # I use unixtime so I can group by any arbitrary interval to get sums and avgs
    # of the metrics (hence the param avg_interval)
    return df


# Initialize plot figure
fig = plt.figure(figsize=(11, 6))
ax = fig.add_subplot(111, facecolor='w', frame_on=False)  # facecolor replaces the old axisbg argument

# Initialize map with Robinson projection
m = Basemap(projection='robin', lon_0=0, resolution='c')

# Load and read shapefile
shapefile = 'countries/ne_10m_admin_0_countries'
m.readshapefile(shapefile, 'units', color='#dddddd', linewidth=0.005)

# Get valid country code list
gc = GeonamesCache()
iso2_codes = list(gc.get_dataset_by_key(gc.get_countries(), 'fips').keys())

# Get dataset and remove invalid countries
# This one will get daily aggregates for the first week of the year
df = get_dataset(60*60*24, '2016-01-01', '2016-01-08')
df.set_index(["country"], inplace=True)
df = df[df.index.isin(iso2_codes)].dropna()  # .ix was removed from pandas

num_colors = 20

# Get list of distinct times to iterate over in the animation
period = df["unixtime"].sort_values(ascending=True).unique()

# Assign bins to each value in the df
values = df["metric1"]
cm = plt.get_cmap('afmhot_r')
scheme = cm(1.*np.arange(num_colors)/num_colors)
bins = np.linspace(values.min(), values.max(), num_colors)
df["bin"] = np.digitize(values, bins) - 1

# Initialize animation return object
x, y = m([], [])
point = m.plot(x, y,)[0]

# Pre-zip country details and shape objects
# (list() so the pairs can be iterated on every frame under Python 3)
zipped = list(zip(m.units_info, m.units))

tbegin = time.time()


# Animate! This is the part that takes a long time.
# Most of the time taken seems to happen between frames...
def animate(i):
    # Clear the axis object so it doesn't draw over the old one
    ax.clear()

    # Dynamic title
    fig.suptitle('Num: {}'.format(datetime.utcfromtimestamp(int(i)).strftime('%Y-%m-%d %H:%M:%S')),
                 fontsize=30, y=.95)

    tstart = time.time()

    # Get current frame dataset
    frame = df[df["unixtime"] == i]

    # Loop through every country
    for info, shape in zipped:
        iso2 = info['ISO_A2']
        if iso2 not in frame.index:
            # Gray if not in dataset
            color = '#dddddd'
        else:
            # Colored if in dataset
            color = scheme[int(frame.loc[iso2, "bin"])]

        # Get shape info for country, then color on the ax subplot
        patches = [Polygon(np.array(shape), closed=True)]
        pc = PatchCollection(patches)
        pc.set_facecolor(color)
        ax.add_collection(pc)

    tend = time.time()

    # print("{}%: {} of {} took {}s".format(str(ind/tot*100), str(ind), str(tot), str(tend-tstart)))
    print("{}: {}s".format(datetime.utcfromtimestamp(int(i)).strftime('%Y-%m-%d %H:%M:%S'), str(tend-tstart)))
    return None


# Initialize animation object
output = animation.FuncAnimation(fig, animate, period, interval=150, repeat=False, blit=False)

filestring = time.strftime("%Y%m%d%H%M%S")

# Save animation object as mp4
# output.save(filestring + '.mp4', fps=1, codec='ffmpeg', extra_args=['-vcodec', 'libx264'])

# Save animation object as gif
output.save(filestring + '.gif', writer='imagemagick')

tfinish = time.time()

print("Total time: {}s".format(str(tfinish-tbegin)))
print("{}s per frame".format(str((tfinish-tbegin)/len(df["unixtime"].unique()))))
```
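In animate, `df[df["unixtime"] == i]` scans the whole DataFrame on every frame, and the loop then looks each country up individually. One alternative is to pivot the bins into a country-by-time table once, before the animation starts, so each frame becomes a single column lookup. This is a sketch assuming the country-indexed `df` with `unixtime` and `bin` columns built above and at most one row per (country, time) pair; the sample values, `country_order` (standing in for the `ISO_A2` values in shapefile order), and the `-1` "no data" sentinel are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for the question's df: indexed by country,
# with the frame time and the color bin assigned by np.digitize.
df = pd.DataFrame(
    {"unixtime": [100, 100, 200], "bin": [3, 7, 5]},
    index=pd.Index(["US", "FR", "US"], name="country"),
)

# Hypothetical order of the shapes in the shapefile (info['ISO_A2'] values).
country_order = ["FR", "US", "DE"]
NO_DATA = -1  # hypothetical sentinel for countries with no data in a frame

# Built once: rows are countries in shapefile order, columns are frame times.
bins_by_frame = (
    df.reset_index()
    .pivot(index="country", columns="unixtime", values="bin")
    .reindex(country_order)
    .fillna(NO_DATA)
    .astype(int)
)


def frame_bins(t):
    """Per-frame work: one column lookup instead of a filter plus a loop."""
    return bins_by_frame[t].to_numpy()


print(frame_bins(100))  # FR and US have data, DE gets the sentinel
print(frame_bins(200))
```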

PS: I know the code is messy and could use some cleanup. I am open to any suggestions, especially if the cleanup improves performance!

Edit 1: Here is some sample output:

```
2016-01-01 00:00:00: 3.87843298912s
2016-01-01 00:00:00: 4.08691620827s
2016-01-02 00:00:00: 3.40868711472s
2016-01-03 00:00:00: 4.21187019348s
Total time: 29.0233821869s
9.67446072896s per frame
```

The first few lines show the date being processed and the execution time of each frame. I do not know why the first one is repeated. The last two lines are the total running time of the program and that total divided by the number of frames. Note that this average is 2-3 times the individual frame times, which makes me think that something happening between frames takes up a lot of time.
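One thing worth knowing when reading these numbers: the tstart/tend timer inside animate only covers creating the patches. When saving, the writer renders the figure after animate returns, so the drawing (and encoding) cost shows up only in the total. A small sketch that times the two phases separately with `fig.canvas.draw()`; the 200 rectangles are a hypothetical stand-in for the country shapes:

```python
import time

import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()


def animate(i):
    """Stand-in for the question's animate(): only creates artists."""
    ax.clear()
    ax.set_xlim(0, 200)
    for k in range(200):  # hypothetical: one single-patch collection per "country"
        ax.add_collection(PatchCollection([Rectangle((k, 0), 1, 1)]))


timings = []
for i in range(3):
    t0 = time.perf_counter()
    animate(i)             # what the tstart/tend timer measures
    t1 = time.perf_counter()
    fig.canvas.draw()      # rendering, which happens outside animate()
    t2 = time.perf_counter()
    timings.append((t1 - t0, t2 - t1))

for build, draw in timings:
    print(f"build {build:.4f}s, draw {draw:.4f}s")
```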

Edit 2: I ran several performance tests and found that each additional frame takes longer to generate than the previous one, in proportion to the number of frames so far, which suggests that the whole process takes quadratic time (or is it exponential?). Either way, I am confused why it is not linear. If the dataset has already been generated and the maps take constant time to redraw, what makes each additional frame slower than the previous one?
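One mechanism that produces exactly this pattern: if animate adds a new collection for every country on every frame without clearing the axes, frame k has k·N collections to draw for N countries, so the per-frame time grows linearly and the total for F frames grows as N·F(F+1)/2, which is quadratic rather than exponential. The sketch below counts the collections on the axes after each frame, with and without `ax.clear()`; the country and frame counts are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
from matplotlib.patches import Rectangle

N_COUNTRIES = 5  # hypothetical, stands in for the shapefile records
N_FRAMES = 4


def run(clear_each_frame):
    """Return the number of collections on the axes after each frame."""
    fig, ax = plt.subplots()
    per_frame = []
    for _ in range(N_FRAMES):
        if clear_each_frame:
            ax.clear()
        # Same pattern as animate(): one new collection per country per frame.
        for k in range(N_COUNTRIES):
            ax.add_collection(PatchCollection([Rectangle((k, 0), 1, 1)]))
        fig.canvas.draw()  # the renderer walks every collection on the axes
        per_frame.append(len(ax.collections))
    plt.close(fig)
    return per_frame


print(run(clear_each_frame=False))  # grows every frame
print(run(clear_each_frame=True))   # stays constant
```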

Edit 3: I just realized that I do not understand how the animation function works. The (x, y) and point variables came from an example that animated moving points, where they make sense. For a map... not so much. I tried returning something map-related from the animation function and got better performance. Returning the ax object (return ax,) makes the procedure run in linear time... but nothing gets written to the GIF. Does anyone know what I need to return from the animation function to make this work?
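For reference, with `blit=True` FuncAnimation expects the animation function (and `init_func`, if one is given) to return an iterable of the artists it changed, which are then redrawn over a cached background. A sketch of that contract, assuming the colors are updated on one existing collection instead of clearing the axes; the shapes, frame values, and colors are hypothetical. If no init_func is given, FuncAnimation draws the first frame once for initialization, which may explain the repeated first line in the Edit 1 output. As far as I know, blitting only speeds up on-screen playback; `save()` redraws full frames.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

shapes = [[(0, 0), (1, 0), (1, 1), (0, 1)], [(1, 0), (2, 0), (2, 1), (1, 1)]]
frame_colors = {  # hypothetical: frame value -> one color per shape
    0: ["#dddddd", "#ff0000"],
    1: ["#ff0000", "#dddddd"],
}

fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.set_ylim(0, 1.2)
pc = PatchCollection([Polygon(np.array(s), closed=True) for s in shapes])
ax.add_collection(pc)
# Label inside the axes: blitting redraws each axes' bbox, not the figure title.
label = ax.text(0.5, 0.95, "", transform=ax.transAxes, ha="center", va="top")


def init():
    pc.set_facecolor("#dddddd")
    label.set_text("")
    return pc, label  # the artists blitting should manage


def animate(i):
    pc.set_facecolor(frame_colors[i])
    label.set_text(f"frame {i}")
    return pc, label  # every artist this frame changed


anim = animation.FuncAnimation(
    fig, animate, frames=list(frame_colors), init_func=init, blit=True
)
```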

Edit 4: Clearing the axis on each frame lets frames generate at a constant speed! Now I just have to work on general optimization. I'll start with ImportanceOfBeingErnest's suggestion. The previous edits are now outdated.
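For the general optimization, the per-country Python loop that picks colors could be replaced by one NumPy indexing step that maps a whole frame's bins to an RGBA array, which could then be handed to a single collection's `set_facecolor`. A sketch reusing the question's `afmhot_r` scheme; `bins_to_colors` and the `-1` "no data" sentinel are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_rgba

num_colors = 20
cm = plt.get_cmap("afmhot_r")
scheme = cm(1.0 * np.arange(num_colors) / num_colors)  # (20, 4) RGBA rows, as in the question
gray = np.array(to_rgba("#dddddd"))


def bins_to_colors(bins):
    """Map a frame's bin indices (-1 = no data, hypothetical) to RGBA rows."""
    bins = np.asarray(bins)
    colors = scheme[np.clip(bins, 0, None)]  # fancy indexing returns a copy
    colors[bins < 0] = gray                  # countries without data turn gray
    return colors


colors = bins_to_colors([7, 3, -1])
print(colors.shape)
```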

Source: https://habr.com/ru/post/1013961/

