As for the first part of the question, former physics professor Bartosz Milewski has a very nice explanation of what the FFT is and how it works.
In addition, "Mastering the Fourier Transform in One Day" is worth a look.
In plain English:
Say you have sound coming from a speaker.
Then you tune, let's pick a round number here, 1024 harmonic oscillators so that each one resonates in a specific frequency range.
Play the sound for, say, one second.
The oscillators begin to resonate with the sound coming from the speaker. After that second, you read how strongly each oscillator is resonating. The result is a discrete Fourier transform: a chart of how much each frequency range contributed to the sound coming from the speaker.
Instead of visualizing the sound as air pressure varying over time (the waveform), you visualize it as a series of intensities across frequency ranges.
Of course, a real speaker is not a great fit for explaining the DFT, since you have to work with sampled input. In this case, the 1024 digital "oscillators" would actually be read out after about 1/43 of a second, because 1024 samples at a sampling rate of 44 kHz span roughly 23 ms.
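To make the oscillator picture concrete, here is a minimal sketch of a naive DFT in plain Python. It is only an illustration, not an optimized implementation: the 44,100 Hz sample rate, the test tone, and the names `dft`, `SAMPLE_RATE`, and `BIN_WIDTH` are assumptions chosen for this example. Each output bin plays the role of one "oscillator" and measures how strongly the buffer correlates with a sinusoid of that frequency.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, one output bin per 'oscillator'."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        # Correlate the buffer with a complex sinusoid that completes
        # k full cycles over the length of the buffer.
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(acc)
    return spectrum


SAMPLE_RATE = 44100          # assumed sample rate in Hz
N = 1024                     # buffer length, as in the text
BIN_WIDTH = SAMPLE_RATE / N  # frequency range covered by each bin, about 43 Hz

# Test tone placed exactly on bin 10 so its energy lands in a single bin.
tone_hz = 10 * BIN_WIDTH
signal = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE) for t in range(N)]

magnitudes = [abs(c) for c in dft(signal)]

# For real-valued input only the first N/2 bins carry independent information.
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
print(f"strongest bin: {peak_bin}, about {peak_bin * BIN_WIDTH:.1f} Hz")
```

With these assumptions each bin covers about 43 Hz, and the buffer itself lasts about 23 ms. The double loop does roughly N * N multiplications, which is the cost the FFT is designed to avoid.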
The Fast Fourier Transform (FFT) is an algorithm for computing the discrete Fourier transform that is efficient enough for computers to run on an incoming signal. It imposes some restrictions that you have to work with in your implementation (for example, in the common radix-2 form the number of samples must be a power of 2), because it uses some clever tricks to drastically reduce the number of calculations performed on the sample buffer.
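As a rough illustration of those tricks, the following is a minimal sketch of the recursive radix-2 (Cooley-Tukey) FFT in plain Python. It is one common variant rather than the only way to build an FFT, and the function name `fft` is simply chosen for this example. The power-of-2 restriction comes from splitting the buffer in half at every level of the recursion.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; the input length must be a power of 2."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("number of samples must be a power of 2")
    if n == 1:
        return [complex(samples[0])]

    # Transform the even-indexed and odd-indexed samples separately.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])

    out = [0j] * n
    for k in range(n // 2):
        # The "twiddle factor" rotates the odd half before recombining.
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# A single impulse contains every frequency equally.
print(fft([1, 0, 0, 0]))
```

Each level of the recursion does work proportional to N, and there are log2(N) levels, so a 1024-sample buffer needs on the order of ten thousand multiplications instead of the roughly one million a direct DFT would use.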
There is no real need to go deeper here, because the two links I gave provide a fairly clear explanation. Note, however, that you cannot move from theory to implementation without knowing the mathematics behind it.
I hope this introduction makes sense!