Understanding ReplayGain

I am trying to implement ReplayGain. The specification is here.

The existing implementation is here or here (in fact, every implementation I have found, for example mp3gain, r128gain, and flac's replaygain, seems to be derived from this code).

I think I understand the Yulewalk filter and the Butterworth filter, as well as the RMS calculation.

The section on statistical processing remains a bit unclear to me, especially what exactly is meant by:

The value that most closely matches a person’s perception of perceived loudness is 95%

From the above implementation, this becomes a little clearer. But either I am misunderstanding something, or the implementation is buggy. There is the formula val = STEPS_per_dB * 10. * log10 ( (lsum+rsum) / totsamp * 0.5 + 1.e-37 ). The term (lsum+rsum) / totsamp * 0.5 should always be in [0,1], since it is the average of the squares, all of which are in [0,1]. So it seems to me that val is always <= 0. But that does not make sense given the code that follows.


Edit: I was wrong. The samples are not in (float) [-1,1], as I thought, but in (float) [-0x8000,0x7fff], i.e. a plain sint16 -> float conversion without scaling. I got this from the implementation; I really do not see where this is explicitly stated in the specification. Also, while the formulas now give numbers that make more sense, they still seem somewhat made up. At least I don't see where this formula comes from. Can someone explain?

Note that 10. * log10( (lsum+rsum) / totsamp * 0.5 + 1.e-37) is (basically) the same as 20. * log10(rms). This is similar to what you see in the specification, but not quite, because of L_{pp}. In the specification:

L = 20 * log10( 2 * rms / L_{pp} )

What is this L_{pp} (the maximum peak-to-peak range in the audio file)? In the implementation it seems to be 2 (otherwise you would not get 20. * log10(rms)). Initially I thought it was the width of the range the samples are in (2 for [-1,1]), but as noted above the samples are not in that range, so that cannot be it.
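For reference, the relation between the two expressions can be written out as follows. The symbol m for the mean square is introduced here only for illustration and does not appear in the specification or the code.

```latex
\begin{align*}
m &= \frac{\mathrm{lsum} + \mathrm{rsum}}{2\,\mathrm{totsamp}},
  \qquad \mathrm{rms} = \sqrt{m} \\
10 \log_{10}(m) &= 20 \log_{10}(\mathrm{rms}) \\
L &= 20 \log_{10}\!\left(\frac{2\,\mathrm{rms}}{L_{pp}}\right)
   = 20 \log_{10}(\mathrm{rms}) \quad \text{if } L_{pp} = 2
\end{align*}
```

If L_{pp} were instead taken as the full 16-bit range of 65536, the formula would give 20 * log10(rms / 32768), which differs from the code's result by a constant of about 90.3 dB.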

Then, also in the code, there is a constant

 #define PINK_REF 64.82 /* 298640883795 */ /* calibration value for 89dB */ 

Where does this value come from?
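For context, here is a minimal sketch of how this constant appears to be used, assuming the final gain is the reference level minus the measured loudness, with the loudness given as a histogram bin index. The function name gain_from_bin and the value 100 for STEPS_PER_DB are assumptions for illustration. The sketch does not explain how 64.82 itself was derived.

```c
#define PINK_REF 64.82     /* calibration value for 89dB, from the code above */
#define STEPS_PER_DB 100.0 /* assumed histogram resolution, illustration only */

/* Gain in dB that would bring a file whose loudness falls into the given
 * histogram bin to the reference level (hypothetical helper). */
static double gain_from_bin(long bin)
{
    return PINK_REF - (double)bin / STEPS_PER_DB;
}
```

Under this assumption a file measured at exactly 64.82 dB gets a gain of 0 dB, and a file measured 10 dB lower gets a gain of +10 dB.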


Edit: my current implementation is here.


Source: https://habr.com/ru/post/1437110/
