Interpreting a .WAV File [Python]

I am trying to process an audio file in Python and apply a low-pass filter to remove some background noise. Currently, I can load the file and build an array from its data values:

    import struct
    import wave

    class AudioModule:
        def __init__(self, fname=""):
            self.stream = wave.open(fname, 'r')
            self.frames = []

        def build(self):
            self.stream.rewind()
            # Read one frame at a time and unpack it as an unsigned byte
            # (this file has sampwidth == 1).
            for x in range(self.stream.getnframes()):
                self.frames.append(struct.unpack('B', self.stream.readframes(1)))
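
A minimal usage sketch of this class (the file name is just a placeholder):

    audio = AudioModule("noisy.wav")   # placeholder file name
    audio.build()
    print(len(audio.frames))           # number of frames read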

I used struct.unpack('B', ...) for this particular file. The audio file I downloaded has the following specifications:

    nchannels: 1
    sampwidth: 1
    framerate: 6000

I know that sampwidth is the width in bytes of each sample returned by readframes(1). After loading, the array contains values like these (ranging from about 128 to 180):

    >>> r.frames[6000:6025]
    [(127,), (127,), (127,), (127,), (128,), (128,), (128,), (128,), (128,),
     (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,),
     (128,), (128,), (128,), (128,), (128,), (128,), (128,)]

Question: What are these numbers? Other audio files with a larger sample width give completely different numbers. My goal is to trim certain frequencies from the audio file; unfortunately, I know very little about this and don't know how these values relate to frequency.

What is the best way to remove everything above a certain frequency threshold?

In addition, the values are packed back into another file as follows:

    def store(self, fout=""):
        out = wave.open(fout, 'w')
        nchannels = self.stream.getnchannels()
        sampwidth = self.stream.getsampwidth()
        framerate = self.stream.getframerate()
        nframes = len(self.frames)
        comptype = "NONE"
        compname = "not compressed"
        out.setparams((nchannels, sampwidth, framerate,
                       nframes, comptype, compname))
        # Pack each stored tuple back into bytes, one frame at a time.
        if nchannels == 1:
            for f in self.frames:
                data = struct.pack('B', f[0])
                out.writeframes(data)
        elif nchannels == 2:
            for f in self.frames:
                data = struct.pack('BB', f[0], f[1])
                out.writeframes(data)
        out.close()
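
(As an aside, calling writeframes once per sample is slow; a sketch of the mono case packed and written in a single call, assuming the frames list built above:)

    # Sketch only: bytes() accepts an iterable of ints in 0..255,
    # which matches the unsigned 8-bit samples stored in self.frames.
    data = bytes(f[0] for f in self.frames)
    out.writeframes(data)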
1 answer

I think the numbers describe the displacement of the speaker membrane at each instant, i.e. the instantaneous amplitude (loudness) of the signal. A higher value means a larger vibration of the membrane. In an 8-bit WAV file the samples are stored as unsigned bytes, which is why silence sits around 128 rather than 0.
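
As an illustration (a sketch that assumes the 8-bit file and the r.frames list from the question), centring the samples around zero shows the waveform swinging above and below silence:

    # 8-bit WAV samples are unsigned (0..255) with 128 as the zero level,
    # so subtracting 128 gives a signed amplitude around silence.
    centered = [f[0] - 128 for f in r.frames]
    print(centered[6000:6010])   # mostly -1 and 0 in the quiet region shown above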

The sample width determines the range of amplitude values a single sample can take, and different files use different sample widths. For example, with a sample width of 1 bit you could only say whether there is sound or not; in general, the larger the sample width, the finer the amplitude resolution and the higher the quality. For more on sample widths, you can read Sampling Rate and Bitrate: Gut Digital Sound.
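
Concretely, for uncompressed WAV data the two common sample widths map to the following struct formats and value ranges (a reference sketch; only the 1- and 2-byte cases are shown):

    # WAV convention: 8-bit samples are unsigned, 16-bit samples are signed little-endian.
    SAMPLE_FORMATS = {
        1: ('B', 0, 255),          # sampwidth 1: unsigned byte, silence at 128
        2: ('<h', -32768, 32767),  # sampwidth 2: signed 16-bit, silence at 0
    }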

The samples stored in the audio file are in the time domain; they do not represent frequency directly. If you want values in the frequency domain, run an FFT on the resulting array.
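
A sketch of that idea as a crude low-pass filter (it assumes the 6000 Hz frame rate and the r.frames list from the question, and an example cutoff; a smoother result would come from a proper filter design, e.g. with scipy.signal, but this shows how the samples relate to frequency):

    import numpy as np

    framerate = 6000      # from the file's header
    cutoff_hz = 1000      # example threshold; choose one that suits your noise

    # Centre the 8-bit samples around zero before transforming.
    samples = np.array([f[0] for f in r.frames], dtype=np.float64) - 128.0

    spectrum = np.fft.rfft(samples)                       # time domain -> frequency domain
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / framerate)
    spectrum[freqs > cutoff_hz] = 0                       # zero every bin above the cutoff
    filtered = np.fft.irfft(spectrum, n=len(samples))     # back to the time domain

    # Convert back to unsigned bytes before repacking with struct/wave.
    out_samples = np.clip(np.round(filtered) + 128, 0, 255).astype(np.uint8)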

I recommend using numpy for the audio processing. For example, to get the array you need, np.fromstring (np.frombuffer in current NumPy) is enough, and related functions such as the FFT are already provided. Plenty of examples and documentation can be found with a quick search.
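
For instance, reading the whole file into a NumPy array in one go might look like this (a sketch; the file name is a placeholder and the dtype assumes the 8-bit file from the question):

    import numpy as np
    import wave

    with wave.open("noisy.wav", 'r') as stream:        # placeholder file name
        raw = stream.readframes(stream.getnframes())   # all frames as raw bytes
        samples = np.frombuffer(raw, dtype=np.uint8)   # sampwidth 1 -> unsigned bytes
        # For a 16-bit file you would use dtype=np.int16 instead.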


Source: https://habr.com/ru/post/1491653/

