Use an STFT with overlapping windows to compute the spectrogram. To avoid writing it yourself, you can use the specgram function in Matplotlib's mlab module. It is important to use a window short enough that the sound is approximately stationary within it, and the buffer size should be a power of 2 so that the common radix-2 FFT can be used efficiently. 512 samples should be sufficient (about 10.67 ms at a 48 kHz sampling rate, or 93.75 Hz per bin). At 48 kHz, overlap 464 samples to evaluate a sliding window every 1 ms (i.e., a hop of 48 samples).
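To make the numbers above concrete, here is a minimal NumPy-only sketch of an overlapping-window STFT. The function name `stft_magnitude` and the 9 kHz test tone are my own choices for illustration; this is not what mlab.specgram does internally, just the same idea written out.

```python
import numpy as np

fs = 48_000            # sampling rate in Hz
nfft = 512             # window length, a power of 2 for a radix-2 FFT
hop = 48               # window shift in samples (1 ms at 48 kHz)
noverlap = nfft - hop  # 464 samples shared by consecutive windows

window_ms = 1000 * nfft / fs  # window duration, about 10.67 ms
bin_hz = fs / nfft            # frequency resolution, 93.75 Hz per bin


def stft_magnitude(x, nfft=512, hop=48):
    """Magnitude STFT of a real signal using a Hamming window (illustrative helper)."""
    window = np.hamming(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    # Slice the signal into overlapping frames, one row per frame.
    frames = np.stack([x[i * hop:i * hop + nfft] for i in range(n_frames)])
    # Window each frame, then take the FFT of the real-valued data.
    return np.abs(np.fft.rfft(frames * window, axis=1))


# Hypothetical test input: one second of a 9 kHz tone.
t = np.arange(fs) / fs
mag = stft_magnitude(np.sin(2 * np.pi * 9_000 * t), nfft, hop)

print(f"{window_ms:.2f} ms window, {bin_hz} Hz per bin, overlap {noverlap}")
print(mag.shape)  # (number of frames, nfft // 2 + 1)
```

Each row of `mag` is the spectrum of one 512-sample window, and consecutive rows are 1 ms apart.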
Edit:
Here is an example using mlab.specgram on an 8-second signal that has one tone per second from 2 kHz to 16 kHz. Note the response at the transients. I've zoomed in on the image at 4 seconds to show the response in more detail. The frequency shifts at exactly 4 seconds, but it takes one buffer length (512 samples, about +/- 5 ms) for the transient to pass through. This illustrates the kind of spectral/temporal smearing caused by non-stationary transitions as they pass through the buffer. In addition, you can see that even when the signal is stationary, there is spectral leakage caused by windowing the data. A Hamming window was used to minimize the leakage sidelobes, but it also widens the main lobe.

import numpy as np
from matplotlib import mlab, pyplot
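A minimal sketch of how an example like the one described could be set up (this is not the original listing). It assumes the tones step by 2 kHz (2, 4, ..., 16 kHz) and that the Hamming window is passed as an array; `dominant_hz` is a hypothetical helper added only to inspect the result.

```python
import numpy as np
from matplotlib import mlab

fs = 48_000     # sampling rate in Hz
nfft = 512      # window length in samples
noverlap = 464  # leaves a 48-sample (1 ms) hop

# Assumed tone sequence: one second each of 2, 4, ..., 16 kHz (8 seconds total).
tones_hz = np.arange(2_000, 16_001, 2_000)
t = np.arange(fs) / fs
signal = np.concatenate([np.sin(2 * np.pi * f * t) for f in tones_hz])

# spec has one row per frequency bin and one column per window position.
spec, freqs, times = mlab.specgram(
    signal, NFFT=nfft, Fs=fs, window=np.hamming(nfft), noverlap=noverlap
)


def dominant_hz(at_seconds):
    """Frequency of the strongest bin in the window centered nearest to at_seconds."""
    col = np.argmin(np.abs(times - at_seconds))
    return freqs[np.argmax(spec[:, col])]


# On either side of the 4-second transition the peak should sit near 8 kHz and 10 kHz.
print(dominant_hz(3.9), dominant_hz(4.1))
```

To look at the transient, something like `pyplot.pcolormesh(times, freqs, 10 * np.log10(spec))` with the time axis limited to a few tens of milliseconds around 4 s should show the smearing across roughly one window length.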