There were other questions and answers on this site suggesting that to create an echo or delay effect, you only need to add each incoming sample to a stored sample from a fixed time in the past. So I have the following Java class:
    public class DelayAMod extends AudioMod {
        private int delay = 500;
        private float decay = 0.1f;
        private boolean feedback = false;
        private int delaySamples;
        private short[] samples;
        private int rrPointer;

        @Override
        public void init() {
            this.setDelay(this.delay);
            this.samples = new short[44100]; // one second of buffer at 44.1 kHz
            this.rrPointer = 0;
        }

        public void setDecay(final float decay) {
            this.decay = Math.max(0.0f, Math.min(decay, 0.99f));
        }

        public void setDelay(final int msDelay) {
            this.delay = msDelay;
            // number of samples in the delay period at 44.1 kHz
            this.delaySamples = 44100 * this.delay / 1000;
            System.out.println("Delay samples:" + this.delaySamples);
        }

        @Override
        public short process(short sample) {
            System.out.println("Got:" + sample);
            if (this.feedback) {
                // mix the delayed sample in and store the wet result,
                // so each echo feeds the next and fades by the decay factor
                sample += (short) (this.samples[this.rrPointer] * this.decay);
                this.samples[this.rrPointer] = sample;
            } else {
                // store the dry input; only a single echo is produced
                short echo = (short) (this.samples[this.rrPointer] * this.decay);
                this.samples[this.rrPointer] = sample;
                sample += echo;
            }
            this.rrPointer = (this.rrPointer + 1) % this.delaySamples;
            return sample;
        }
    }
It takes one 16-bit sample at a time from the input stream, looks up the corresponding earlier sample, and combines the two. However, the output is nothing but awful, noisy static, especially once the decay is raised to a level that would actually produce a noticeable echo. Lowering the decay to 0.01 barely lets the original sound through, but at that point there is no echo at all.
Key troubleshooting facts:
- The audio stream sounds great if this processing step is skipped.
- The audio stream sounds normal if the decay is 0 (nothing is added).
- Stored samples are indeed stored and retrieved in the correct order and at the correct positions.
- Stored samples are attenuated by the decay factor and correctly added to the incoming samples.
- All the numbers, from the call to process() to the returned sample, are exactly what I expect from this algorithm, and remain so even outside this class.
The problem seems to come down to simply adding signed shorts together, and the resulting waveform is an absolute disaster. I have seen this exact method implemented in various places (C#, C++, even on microcontrollers), so why does it fall apart here?
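For what it's worth, the usual hazard in summing signed shorts is overflow: the sum of a sample and its decayed echo can exceed Short.MAX_VALUE and wrap around to a large negative value, which does sound like static. A minimal sketch of overflow-safe mixing (the mix method and its signature are mine, not part of the class above):

    // Widen to int before adding so the sum itself cannot wrap,
    // then clamp to the 16-bit range instead of overflowing.
    public static short mix(short dry, short delayed, float decay) {
        int mixed = dry + Math.round(delayed * decay);
        if (mixed > Short.MAX_VALUE) mixed = Short.MAX_VALUE;
        if (mixed < Short.MIN_VALUE) mixed = Short.MIN_VALUE;
        return (short) mixed;
    }

Clamping distorts only on genuinely over-range peaks instead of flipping sign, so if wrap-around were the cause, the result would at worst sound like mild clipping rather than constant static.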
EDIT: It turns out I was approaching this all wrong. I do not know whether FFmpeg/avconv or some other factor is responsible, but I am not working with a normal PCM stream. After plotting the waveform, running a failed tone-generator test, and analyzing the result, I concluded that this is some variant of differential pulse-code modulation: each value encodes the change from one sample to the next, and halving the supposed "volume" factor on a pure sine wave actually reduces the step size while leaving the volume the same. (Applying a volume multiplier to a non-sine signal produces the same static as this echo algorithm.) Since this and other DSP algorithms are designed for linear pulse-code modulation, I will need some way to get a proper LPCM stream.
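If the stream really is a plain delta encoding, the fix would be to integrate the steps back into linear PCM before any DSP runs. A hypothetical sketch, assuming each 16-bit value is simply the difference from the previous sample (the real stream may use a more elaborate scheme such as ADPCM, in which case this will not be enough):

    public class DpcmDecoder {
        private int accumulator = 0; // running reconstruction of the LPCM value

        // Convert one delta into the next linear PCM sample.
        public short decode(short delta) {
            accumulator += delta; // integrate the step
            // clamp so a corrupt delta cannot wrap to the opposite sign
            if (accumulator > Short.MAX_VALUE) accumulator = Short.MAX_VALUE;
            if (accumulator < Short.MIN_VALUE) accumulator = Short.MIN_VALUE;
            return (short) accumulator;
        }
    }

Decoded this way, the samples are ordinary amplitudes again, and an echo class like the one above should be able to add them without turning the waveform into noise.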