How to recognize if a sound sample has been compressed and then unpacked?

A few years ago I recorded some music, and I can’t find the original WAV files; I only have compressed MP3 files. Now I have found an audio CD, but I don’t know whether it was burned from the original uncompressed WAV files or from compressed MP3 or OGG files.

Is there a way to determine whether an audio sample has been through lossy compression and decompression (MP3, OGG, ...) without having the original to compare with?

Update:

After trying out @MisterHenson's suggestion, I plotted the spectra of two samples; the graphs show obvious differences:

Sample from CD:

[image: spectrum of the CD sample]

Sample from MP3:

[image: spectrum of the MP3 sample]


This practically solves my current problem, but I still have these open questions:

  • If the spectra were visually indistinguishable, I would not know whether there is really no difference or whether I simply cannot see it (for example, because the compression was of higher quality). What else could I try?
  • Similarly, what would I do if I didn't have an MP3 file for comparison, just a single audio sample?
  • Is there an automated method that would answer the question with reasonable probability?
+6
5 answers

I made an example to highlight the telltale signature of all MP3 encodes; the source material is a Chopin nocturne. MP3 on top, lossless at the bottom. All recordings have a noise floor of some amplitude, and this noise is faintly visible here. What the MP3 transcode (LAME V2 preset in this case) does is create a hard limit at ~16 kHz. In a 320 kbps MP3 with a 44.1 kHz sampling rate, this hard limit appears at about 20 kHz, but it is still noticeable in the image.

Piano music example

You can pick out this shelf without having a lossless source file for comparison. I would say that all music has some amplitude at frequencies above 19 kHz. Here is an example for which I do not have a lossless source file, just a 320 kbps MP3. You can see a very hard limit at 20 kHz, as well as a softer cutoff at 19 kHz. If it were lossless, the red region in the middle would extend up to 22 kHz, since the sampling frequency is 44.1 kHz.

인피니트 - Back

I would say that this process could probably be automated, but I do not know of any attempts to automate it. If it were automated, I would say that it could tell lossy from lossless with much greater accuracy than you or I, because it could analyze the entire spectrum, and not just the high cutoff frequencies.


+3

The approaches above sound very promising, though perhaps a little complicated. You could try something easy first, for example checking the distribution of the least significant bit (LSB). In a natural sample, the LSB should be split almost exactly 50/50 between zeros and ones (in practice there will be some variance, following the binomial distribution, but with millions or billions of bits it will be ridiculously close to 50/50 in any given sample). In a sample that has been through lossy compression, you would find an improbable distribution of the LSB.

Something like that:

1 - extract LSB from each data point

2 - apply a chi-square test to judge unusual distribution
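The two steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `read_pcm16` and `lsb_chi_square` are made up for this example, the input is assumed to be 16-bit PCM, and the p-value uses the closed form for a chi-square distribution with one degree of freedom.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Read all samples (channels interleaved) from a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples


def lsb_chi_square(samples):
    """Return (chi2, p_value) for the hypothesis that the LSBs are split 50/50."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    ones = sum(s & 1 for s in samples)   # step 1: extract the LSB of each data point
    zeros = n - ones
    expected = n / 2
    # Step 2: chi-square statistic against the expected 50/50 split.
    chi2 = ((ones - expected) ** 2 + (zeros - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A very small p-value would mean the bits are unlikely to be a fair 50/50 split. Whether that actually indicates lossy processing is the premise of this answer, not something the test itself proves.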

+2

Here is the deal.

The raw sample (or raw sound) is encoded with a certain quality. Some sound cards can go further with 64-bit sampling.

But suppose we have sound files with a certain KNOWN quality.

CD quality is good for the human ear.

A studio, however, would use higher-resolution samples, such as the 24-bit standard.

So, you have a waveform file, filename.wav, which really has a sampling rate of 44100 Hz.

What does it mean?

This means that the computer takes 44,100 samples per second, enough to represent the sound almost exactly.

Is the sound original? Depends on how it was done. If this was done by your computer and software using a 16-bit standard sound card, yes, it is.

If it came from an analogue recording, then it lost some of its quality when digitized at 44100 Hz, though fortunately not enough to matter to the human ear. NOTE that recording straight to MP3 is a bad idea for professional work. But since MP3 recordings do exist ... this adds complexity to your question. :P

So some sound quality is already lost when digitizing with a 16-bit sound card. The same thing happens when you encode something to MP3.

Check your picture. Above 17,000 Hz there is no sound. This was done in order to make the sound file significantly smaller without causing significant damage to the sound quality. Is it the same sound? No. It seems like the same thing. But a sound engineer LOVES original, high-quality samples because of the information that is NOT cut.

Imagine that I made an original sound so balanced and compressed that even after converting it to MP3 it is hard to say whether it is the original sound or not. Imagine using equalizers to cut any sharp edges and gate effects to normalize it. In addition, my sound generators are some 8-bit oscillators passing through some effects and filters.

If I convert it back to a waveform, there may be no difference.

For instance:

          [UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform: =================================
mp3:      =======================
Waveform: =======================

          [UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform: =================
mp3:      =================
Waveform: =================

The following seems impossible to me (unless the converter has bugs, which could be heard):

          [UNCHANGED FREQUENCIES][CUT FREQUENCIES]
Waveform: =========================
mp3:      =======================
Waveform: =============================

So the answer to your question depends on the original source that you used in the first place.

The good news is that a sample is RARELY that limited and compressed. Therefore, it seems to me that the CD you have probably matches the original waveform, while, as you can see, the MP3 has had frequencies cut out.

Of course, you need a spectrum analyzer, as MischaNix has already shown.

There are many MP3 encodings. Some of them use a constant bitrate, some a variable one; some discard more audio information, and some less. For this reason, some of them are also larger than others.

Now there are lossless formats. And then there is ogg, which is small enough and also has great quality.

Thus, this issue can become a huge topic for no reason. I will not talk about all these things.

As for the problem at hand, your pictures show me significant differences between the two samples. I mean, a waveform rebuilt from an MP3 that has had frequencies cut should look altered like this. You cannot get information back from nothing.

Burn the MP3 to a CD, then rip it back to a WAV and compare the new waveform with the old one and with the MP3 signal. It will probably not be the same thing, so you may have hit the jackpot here. Perhaps you have a backup of the original at your fingertips.

From now on, try to archive your source material on a CD or DVD before discarding it. Or at least keep good uncompressed samples in a backup.

Open questions:

If the spectra were visually indistinguishable, I would not know if there is a real difference or that I simply cannot distinguish them.

Right. But this would rarely happen unless it was done on purpose.

Why ask such a question? :) Do you have steganography in mind? If so, be sure to keep in mind the nature of the sound you are going to use. Raw samples are not a good fit; "finished songs" are!

Similarly, what would I do if I didn't have an MP3 file for comparison, just one sample audio?

Since there are many MP3 encoding settings of varying quality, you can check whether the lowest-quality ones were used. If not, there is uncertainty, because of what the compression is capable of. If dynamic compression was applied to the entire sample, you would have to establish that it was really needed. That is why you cannot be sure about a song. For one thing, you do not record with THAT much hard compression. I think this is another reason why you need a natural sound, so with a raw recording you may be in luck. With a finished, mastered song, things get rough again. It comes down to the nature and type of the sound. With a recording, it is easier to understand what happened if you know it was recorded as a waveform. Of course, recording straight to MP3 is a waste of time. A finished song, on the other hand, is these days usually run through compressors, limiters, gates and chained compressors. These techniques are used very heavily in modern production. So ... you really need luck to find out whether the original material was compressed before it became the waveform you have.

Is there an automated method that would answer the question with reasonable probability?

Not that I know of. Sorry. :( But this does not mean that no one can do this.

BUT!

A stereo sample is usually divided into two channels, left and right. Now, if you have a spectrum analyzer in a digital audio workstation and look only at the left channels of two different samples, you can see on the fly whether they are the same or not, I think.

To understand what I mean, see this. Go to 05:00 and just watch the interface.

Phew. Hope this helps you further, as it took some time. :P Greetings.

Edit: fixing some things here and there.

+2

I found a description of the problem, a solution and an implementation in Python by Maurits van der Schee, which works with FLAC.

Only the first 30 seconds of the sample are analyzed. For each second, the frequency spectrum is calculated by applying a Hanning window and the fast Fourier transform. These spectra are summed, so you end up with the sum of 30 spectra, which is divided by 30 to get the average spectrum. Then the spectrum is normalized using log10. After that, a moving average is applied to the spectrum with a window size of 1/100th of the frequency, which is 44100/100 = 441 samples.
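The averaging steps described here might look roughly like the sketch below. It is not the implementation referred to above: the function names are invented for illustration, and it assumes power-of-two frames rather than one-second frames so that a small pure-Python radix-2 FFT is enough (a real tool would more likely call an FFT library).

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def average_log_spectrum(samples, frame_size=1024, max_frames=30):
    """Average Hann-windowed magnitude spectra over frames, then take log10."""
    if frame_size < 2 or frame_size & (frame_size - 1):
        raise ValueError("frame_size must be a power of two")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    half = frame_size // 2  # keep only the bins below the Nyquist frequency
    total = [0.0] * half
    frames = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        if frames == max_frames:
            break
        spectrum = fft([samples[start + i] * window[i] for i in range(frame_size)])
        for k in range(half):
            total[k] += abs(spectrum[k])
        frames += 1
    if frames == 0:
        raise ValueError("not enough samples for one frame")
    # The small offset avoids log10(0) for empty bins.
    return [math.log10(total[k] / frames + 1e-12) for k in range(half)]


def moving_average(values, window):
    """Centered moving average; the window shrinks at both ends of the list."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    half = window // 2
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out
```

With one-second frames at 44.1 kHz the bins would be 1 Hz wide, as in the description; in this sketch bin k corresponds to k * sample_rate / frame_size Hz, so the smoothing window would need to be scaled accordingly.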

If there is an unnatural cutoff in the frequency spectrum, this cutoff is what we need to find. We scan the spectrum from the 44100th frequency back to the 1st, with f as the frequency variable. As soon as the value at f-220 exceeds the value at f by more than 1.25, and the value at f is not greater than 1.1x the value at 44100, we have found the cutoff. The cutoff point is multiplied by 100 and divided by the frequency to get the percentage of the spectrum that is not cut off.
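One possible reading of this scan is sketched below. Several points are assumptions on my part: the function name `find_cutoff_percent` is hypothetical, the "1.1x" test is taken relative to the topmost bin, the log-spectrum values are assumed to be positive, and the position f - step (the last bin before the drop) is reported as the cutoff. Only the numbers 220, 1.25 and 1.1 come from the description.

```python
def find_cutoff_percent(log_spectrum, step=220, drop=1.25, ceiling=1.1):
    """Scan from the top bin downwards and return the share (in %) of the
    spectrum that lies below the detected cutoff, or 100.0 if none is found."""
    n = len(log_spectrum)
    if n == 0:
        raise ValueError("empty spectrum")
    top = log_spectrum[-1]
    for f in range(n - 1, step - 1, -1):
        steep_drop = log_spectrum[f - step] - log_spectrum[f] > drop
        near_floor = log_spectrum[f] <= ceiling * top
        if steep_drop and near_floor:
            # The drop lies between f - step and f (assumption: report f - step).
            return 100.0 * (f - step) / n
    return 100.0


# Synthetic example: a shelf that ends at bin 700 out of 1000.
shelf = [5.0] * 700 + [1.0] * 300
percent = find_cutoff_percent(shelf)
```

A result close to 100 would suggest the spectrum runs all the way up, while a clearly lower value would point to a shelf like the ones shown in the spectrograms.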

+2

What to look for:

  • Changes in the cutoff frequency at frame boundaries (it will not be a 100% hard rule, but look for switches from "audible" to "inaudible" and vice versa)
  • Frequencies that disappear or appear at frame boundaries (again, not 100%)
  • Noise levels changing at frame boundaries (actually a pretty reliable sign of lossy codecs)

For MP3, frame boundaries are exactly every 1152 samples, although you can “see” granules every 576 samples.

For Vorbis, the frame boundary is usually 128 or 1024 samples, depending on the transients that the encoder has "seen." You can probably get away with checking every 128 samples ...

You will need to look up other formats to find out their frame sizes (I don't know them offhand).
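As a starting point for this kind of inspection, the sketch below splits the decoded signal into blocks of one MP3 granule (576 samples) and computes a crude high-frequency measure per block. Everything here is an assumption for illustration: the name `block_hf_energy` is made up, the mean squared first difference is only a rough stand-in for a proper per-frame spectrum, and the true frame grid is usually shifted by an unknown encoder delay, hence the `offset` parameter.

```python
def block_hf_energy(samples, block=576, offset=0):
    """Mean squared first difference per block: a crude proxy for
    high-frequency energy, one value per block of `block` samples."""
    energies = []
    for start in range(offset, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        diff_sq = sum((chunk[i] - chunk[i - 1]) ** 2 for i in range(1, block))
        energies.append(diff_sq / (block - 1))
    return energies


# Synthetic example: one block of rapid alternation, then one block of silence.
demo = block_hf_energy([1, -1] * 288 + [0] * 576)
```

Abrupt jumps between neighbouring values that line up with one particular offset would hint at frame-wise processing; for Vorbis the block size could be set to 128 instead.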

+1

Source: https://habr.com/ru/post/973606/

