Reading *.wav files in Python

I need to analyze the sound recorded in a WAV file. To do this, I need to convert the file into a set of numbers (e.g. arrays). I think I need to use the wave package. However, I do not know exactly how it works. For example, I did the following:

    import wave

    w = wave.open('/usr/share/sounds/ekiga/voicemail.wav', 'r')
    for i in range(w.getnframes()):
        frame = w.readframes(i)
        print frame

From this code I expected to see sound pressure as a function of time. Instead, I see a lot of strange, mysterious characters (which are not hexadecimal numbers). Can anyone please help me with this?

+46
python audio wav wave
Jan 13 '10 at
9 answers

Per the sources, scipy.io.wavfile.read(somefile) returns a tuple of two items: the first is the sampling rate in samples per second, the second is a NumPy array with all the data read from the file. Looks pretty easy to use!

+45
Jan 13 '10 at 23:44

I did some research this evening and figured this out:

    import wave, struct

    waveFile = wave.open('sine.wav', 'r')
    length = waveFile.getnframes()
    for i in range(0, length):
        waveData = waveFile.readframes(1)
        data = struct.unpack("<h", waveData)
        print(int(data[0]))

Hopefully this snippet helps someone. Details: using the struct module, you can take the wave frames (which are in two's complement binary, between -32768 (0x8000) and 32767 (0x7FFF)) and unpack them into integers. This reads a MONO, 16-BIT WAVE file. I found this web page quite useful in formulating this.
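
Reading one frame per call works, but it is slow for long files. Below is a minimal sketch of the same struct-based idea that reads every frame in a single call and also splits interleaved channels. It assumes 16-bit PCM data; the helper name `read_int16_samples` and the temporary file `tone.wav` are made up for illustration and are not part of the original snippet.

```python
import math
import os
import struct
import tempfile
import wave


def read_int16_samples(path):
    """Return (framerate, channels); channels holds one list of ints per channel.

    Assumes 16-bit PCM; any other sample width raises ValueError.
    """
    with wave.open(path, 'rb') as wav:
        if wav.getsampwidth() != 2:
            raise ValueError('expected 16-bit samples')
        n_channels = wav.getnchannels()
        framerate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())  # all frames in one call
    # WAV data is little-endian; 'h' is a signed 16-bit integer.
    n_values = len(raw) // 2
    flat = struct.unpack('<%dh' % n_values, raw)
    # Samples are interleaved: ch0, ch1, ch0, ch1, ...
    channels = [list(flat[c::n_channels]) for c in range(n_channels)]
    return framerate, channels


# Demo: write a short mono 440 Hz tone to a temporary file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]
with wave.open(demo_path, 'wb') as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(struct.pack('<%dh' % len(tone), *tone))

rate, chans = read_int16_samples(demo_path)
print(rate, len(chans), len(chans[0]))
```

The lists in `chans` are the "sound pressure as a function of time" values the question asks for: sample `n` of a channel corresponds to time `n / rate` seconds.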

+42
Mar 12 '11 at 7:21

Different Python modules to read WAV:

At least the following libraries can read WAV audio files: PySoundFile, scipy.io.wavfile, wave, and scikits.audiolab.

The simplest example:

This is a simple example with PySoundFile:

    import soundfile as sf
    data, samplerate = sf.read('existing_file.wav')



Output format:

Warning: the data is not always in the same format; it depends on the library. For example:

    from scikits import audiolab
    from scipy.io import wavfile
    from sys import argv

    for filetest in argv[1:]:
        # read the same file with both libraries to compare the output
        [x, fs, nbBits] = audiolab.wavread(filetest)
        print('\nReading with scikits.audiolab.wavread: ', x)
        [fs, x] = wavfile.read(filetest)
        print('\nReading with scipy.io.wavfile.read: ', x)

    Reading with scikits.audiolab.wavread:  [0. 0. 0. ..., -0.00097656 -0.00079346 -0.00097656]
    Reading with scipy.io.wavfile.read:  [0 0 0 ..., -32 -26 -32]

PySoundFile and Audiolab return floats between -1 and 1 (as MATLAB does; this is the convention for audio signals). Scipy and wave return integers, which can be converted to floats according to the number of bits of encoding.

For example:

    from scipy.io.wavfile import read as wavread

    [samplerate, x] = wavread(audiofilename)  # x is a numpy array of integers, representing the samples

    # scale to -1.0 -- 1.0
    if x.dtype == 'int16':
        nb_bits = 16  # -> 16-bit wav files
    elif x.dtype == 'int32':
        nb_bits = 32  # -> 32-bit wav files
    max_nb_bit = float(2 ** (nb_bits - 1))
    samples = x / (max_nb_bit + 1.0)  # samples is a numpy array of floats representing the samples
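
The same scaling can be sketched without NumPy for integer samples obtained from the wave module. This is only an illustration: the helper name `pcm_to_float` is made up, and it divides by 2 ** (bits - 1) exactly, which is an assumption about the convention but does reproduce the float values printed above (-32 becomes -0.00097656 for 16-bit data).

```python
def pcm_to_float(samples, bits):
    """Scale signed integer PCM samples to floats in [-1.0, 1.0).

    `bits` is the sample width in bits (16 for 16-bit files).
    """
    full_scale = float(2 ** (bits - 1))  # 32768.0 for 16-bit data
    return [s / full_scale for s in samples]


ints = [0, -32, -26, 16384, -32768]
floats = pcm_to_float(ints, 16)
print(floats)
```

The most negative integer maps to exactly -1.0, and the most positive one (32767 for 16-bit data) ends up just below 1.0.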
+11
Nov 03 '14 at 14:13

IMHO, the easiest way to get audio data from a sound file into a NumPy array is PySoundFile:

    import soundfile as sf
    data, fs = sf.read('/usr/share/sounds/ekiga/voicemail.wav')

It also supports 24-bit files out of the box.

There are many sound file libraries available; I wrote an overview where you can see a few pros and cons. It also has a page explaining how to read a 24-bit wav file with the wave module.

+10
Sep 17 '15 at 12:09

You can accomplish this using the scikits.audiolab module. It requires NumPy and SciPy to function, as well as libsndfile.

Note: I was only able to get it to work on Ubuntu, not on OS X.

    from scikits.audiolab import wavread

    filename = "testfile.wav"
    data, sample_frequency, encoding = wavread(filename)

You now have the WAV data.

+8
Jun 17 '11 at 10:10

If you want to process audio block by block, some of these solutions are quite terrible in the sense that they load the whole audio into memory, which causes many cache misses and slows down your program. python-wavefile provides some pythonic constructs for block-by-block NumPy processing, using efficient and transparent block management by means of generators. Other pythonic niceties are a context manager for files and metadata as properties. And if you want the whole-file interface, because you are developing a quick prototype and don't care about efficiency, the whole-file interface is still there.

A simple processing example would be:

    import sys
    from wavefile import WaveReader, WaveWriter

    with WaveReader(sys.argv[1]) as r:
        with WaveWriter(
                'output.wav',
                channels=r.channels,
                samplerate=r.samplerate,
                ) as w:

            # Just to set the metadata
            w.metadata.title = r.metadata.title + " II"
            w.metadata.artist = r.metadata.artist

            # This is the processing loop
            for data in r.read_iter(size=512):
                data[1] *= .8  # lower volume on the second channel
                w.write(data)

The example reuses the same block to read the entire file, even for the last block, which is usually shorter than the requested size. In that case you get a slice of the block. So trust the returned block length instead of relying on a hardcoded size of 512 in any further processing.
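
The point about the short last block can be seen with a small generator built on the standard wave module. This is not python-wavefile's implementation, only a sketch of the same block-by-block idea; the helper name `iter_blocks` is hypothetical and the code assumes 16-bit mono data.

```python
import os
import struct
import tempfile
import wave


def iter_blocks(path, block_frames=512):
    """Yield tuples of int samples, at most block_frames long each.

    The final block is usually shorter than block_frames.
    """
    with wave.open(path, 'rb') as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError('this sketch handles 16-bit mono only')
        while True:
            raw = wav.readframes(block_frames)
            if not raw:
                break  # end of file
            yield struct.unpack('<%dh' % (len(raw) // 2), raw)


# Demo: 1300 frames do not divide evenly into blocks of 512.
demo_path = os.path.join(tempfile.mkdtemp(), 'blocks.wav')
samples = [n % 100 for n in range(1300)]
with wave.open(demo_path, 'wb') as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(struct.pack('<%dh' % len(samples), *samples))

block_lengths = [len(block) for block in iter_blocks(demo_path, 512)]
print(block_lengths)  # [512, 512, 276]
```

Only one block is held in memory at a time, and any code consuming the blocks should use `len(block)` rather than the requested size.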

+2
Sep 16 '14 at 9:54

If you are going to perform transfers on the waveform data, then perhaps you should use SciPy, in particular scipy.io.wavfile.

+1
Jan 13 '10 at 22:11

If you have two files and the sampling rate is sufficiently high, you can simply interleave them.

    from scipy.io import wavfile

    rate1, dat1 = wavfile.read(File1)
    rate2, dat2 = wavfile.read(File2)

    if len(dat2) > len(dat1):  # swap so that dat2 is the shorter one
        dat1, dat2 = dat2, dat1

    output = dat1.copy()
    # overwrite every second sample of the longer signal with the shorter one
    for i in range(len(dat2) // 2):
        output[i * 2] = dat2[i * 2]

    wavfile.write(OUTPUT, rate1, output)
0
Aug 23 '13 at 16:51

I needed to read a 1-channel 24-bit WAV file. The post above by Nak was very helpful. However, as basj mentioned above, 24-bit is not straightforward. I finally got it working with the following snippet:

    import numpy as np
    from scipy.io import wavfile

    TheFile = 'example24bit1channelFile.wav'
    [fs, x] = wavfile.read(TheFile)

    # convert the loaded data into a 24bit signal
    nx = len(x)
    ny = nx // 3 * 4  # four 3-byte samples are contained in three int32 words

    y = np.zeros((ny,), dtype=np.int32)  # initialise array

    # build the data left aligned in order to keep the sign bit operational.
    # result will be factor 256 too high
    y[0:ny:4] = ((x[0:nx:3] & 0x000000FF) << 8) | \
        ((x[0:nx:3] & 0x0000FF00) << 8) | ((x[0:nx:3] & 0x00FF0000) << 8)
    y[1:ny:4] = ((x[0:nx:3] & 0xFF000000) >> 16) | \
        ((x[1:nx:3] & 0x000000FF) << 16) | ((x[1:nx:3] & 0x0000FF00) << 16)
    y[2:ny:4] = ((x[1:nx:3] & 0x00FF0000) >> 8) | \
        ((x[1:nx:3] & 0xFF000000) >> 8) | ((x[2:nx:3] & 0x000000FF) << 24)
    y[3:ny:4] = (x[2:nx:3] & 0x0000FF00) | \
        (x[2:nx:3] & 0x00FF0000) | (x[2:nx:3] & 0xFF000000)

    y = y // 256  # correct for building 24 bit data left aligned in 32bit words

Some extra scaling is required if you need results between -1 and +1. Perhaps some of you may find this useful.
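
For comparison, 24-bit samples can also be decoded with the standard wave module alone, one 3-byte group at a time. This is a different technique from the SciPy repacking above, shown only as a sketch: the helper name `read_int24_samples` and the demo file `demo24.wav` are made up, and the final division by 2 ** 23 is one assumed way to do the extra scaling to the -1 to +1 range.

```python
import os
import tempfile
import wave


def read_int24_samples(path):
    """Return (framerate, samples) for a 24-bit PCM WAV file.

    Channels, if there is more than one, stay interleaved in the flat list.
    """
    with wave.open(path, 'rb') as wav:
        if wav.getsampwidth() != 3:
            raise ValueError('expected 24-bit (3-byte) samples')
        framerate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Each sample is 3 little-endian bytes; signed=True applies two's complement.
    samples = [
        int.from_bytes(raw[i:i + 3], byteorder='little', signed=True)
        for i in range(0, len(raw) - 2, 3)
    ]
    return framerate, samples


# Demo: write a few known 24-bit values to a temporary file and read them back.
values = [0, 1, -1, 8388607, -8388608, -256]
demo_path = os.path.join(tempfile.mkdtemp(), 'demo24.wav')
with wave.open(demo_path, 'wb') as out:
    out.setnchannels(1)
    out.setsampwidth(3)
    out.setframerate(48000)
    out.writeframes(b''.join(v.to_bytes(3, 'little', signed=True) for v in values))

fs, samples = read_int24_samples(demo_path)
scaled = [s / float(2 ** 23) for s in samples]  # floats in [-1.0, 1.0)
print(fs, samples)
```

Decoding in pure Python is slow for long recordings, so this is better suited to checking a file than to heavy processing.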

0
Jun 14 '15 at 17:01


