I may be wrong about this, but as far as I know there are two ways to get the spectrum of the whole song.
1) Run a single FFT over the whole song. This gives very fine frequency resolution, but in practice it is inefficient, and you do not need that much resolution anyway.
2) Divide it into small blocks (for example, blocks of 4096 samples, as you said), compute the FFT of each block and average the spectra. You give up some frequency resolution, but the calculation becomes more manageable (and the variance of the spectrum estimate goes down). The Wilhelmsen link describes how to compute an FFT in C++, and I think there are libraries for this, like FFTW (though, to be fair, I never managed to compile it =)).
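To put rough numbers on that trade-off: the bin spacing of an N-point FFT is the sample rate divided by N. The 44.1 kHz sample rate and the 3-minute song length below are my own assumptions for illustration, they are not from the question.

```latex
\Delta f = \frac{f_s}{N}
\qquad\Rightarrow\qquad
\Delta f_{\text{block}} = \frac{44100\ \text{Hz}}{4096} \approx 10.8\ \text{Hz},
\qquad
\Delta f_{\text{song}} = \frac{44100\ \text{Hz}}{44100 \cdot 180} = \frac{1}{180\ \text{s}} \approx 0.0056\ \text{Hz}
```

So the block approach gives bins about 10.8 Hz wide, which is usually plenty for looking at the overall spectrum of a song.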
To get the amplitude spectrum, average the energy (squared magnitude) over all the blocks for each individual bin. To get the result in dB, just take 10 * log10 of the results. This, of course, assumes that you are not interested in the phase spectrum. I think this is called Bartlett's method.
I would do something like this:
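A minimal C++ sketch of the averaging described above, under a few assumptions of mine: the samples are already mono doubles in memory, the block size is a power of two, blocks do not overlap and no window is applied, and the leftover samples at the end are dropped. The names `fft` and `averagedSpectrumDb` are made up for this example, and the small recursive FFT is only there so the snippet has no dependencies; in real code you would probably swap in a library such as FFTW. The output is not normalized, so the dB values are relative.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// In-place recursive radix-2 FFT. The size must be a power of two.
void fft(std::vector<cd>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    if (n == 1) return;

    // Split into even- and odd-indexed samples and transform each half.
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even);
    fft(odd);

    // Combine the two halves with the twiddle factors.
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const cd t = std::polar(1.0, angle) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Averages the squared magnitude of each bin over consecutive,
// non-overlapping blocks and returns the result in dB.
// Returns blockSize / 2 + 1 bins (DC up to Nyquist).
std::vector<double> averagedSpectrumDb(const std::vector<double>& samples,
                                       std::size_t blockSize) {
    const std::size_t numBlocks = samples.size() / blockSize;  // partial last block is dropped
    const std::size_t numBins = blockSize / 2 + 1;
    std::vector<double> power(numBins, 0.0);
    std::vector<cd> block(blockSize);

    for (std::size_t b = 0; b < numBlocks; ++b) {
        for (std::size_t i = 0; i < blockSize; ++i) {
            block[i] = cd(samples[b * blockSize + i], 0.0);
        }
        fft(block);
        for (std::size_t k = 0; k < numBins; ++k) {
            power[k] += std::norm(block[k]);  // std::norm is the squared magnitude
        }
    }

    std::vector<double> db(numBins);
    for (std::size_t k = 0; k < numBins; ++k) {
        const double mean = numBlocks ? power[k] / static_cast<double>(numBlocks) : 0.0;
        // The tiny offset is my own addition to avoid log10(0) on silent bins.
        db[k] = 10.0 * std::log10(mean + 1e-20);
    }
    return db;
}
```

You would call it as `averagedSpectrumDb(samples, 4096)`; bin `k` then corresponds to the frequency `k * sampleRate / 4096`.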
Hope this answers your question.
Edit: Goz's post will give you a lot of information on this =)