Python implementation of the MFCC algorithm

I have a database of streaming video. I want to compute LBP features from the image frames and MFCCs from the audio, and there is an annotation for each frame in the video. The annotations are tied to the video clips and their timestamps, so I want to map the annotation times onto the MFCC results. I know that sample_rate = 44100.

from python_speech_features import mfcc
from python_speech_features import logfbank
import scipy.io.wavfile as wav

audio_file = "sample.wav"
(rate, sig) = wav.read(audio_file)
mfcc_feat = mfcc(sig, rate)
print(len(sig))        # 2130912
print(len(mfcc_feat))  # 4831

First, why does the MFCC result have length 4831, and how do I relate that to my annotation, which is given in seconds? The total duration of the video is 48 seconds, and the annotation is 0 everywhere except the 19-29 s window, where it is 1. How can I find the samples in the 19-29 s window from the MFCC results?

1 answer

Run

  mfcc_feat.shape 

You should get (4831, 13). 13 is the length of each MFCC vector (the default numcep is 13), and 4831 is the number of windows. The default winstep is 10 ms, so 4831 windows cover about 48.31 seconds, which matches the length of your sound file. Since each window advances by 0.01 s, the 19-29 second range corresponds to windows 19/0.01 = 1900 through 29/0.01 = 2900; to get them, just slice:

 mfcc_feat[1900:2900,:] 

Remember that you cannot listen to MFCCs. Each MFCC vector only describes a 0.025 s fragment of audio (the default winlen).
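More generally, you can map any annotation time in seconds to an MFCC window index by dividing by winstep. Below is a minimal sketch, assuming the default winstep of 0.01 s and the mfcc_feat array from the question; sec_to_frames is just an illustrative helper name, not part of python_speech_features.

 # Sketch: map an annotation window in seconds to MFCC frame indices,
 # assuming the default winstep of 0.01 s (10 ms per frame).
 WINSTEP = 0.01  # seconds between successive MFCC frames

 def sec_to_frames(start_sec, end_sec, winstep=WINSTEP):
     """Return (start_frame, end_frame) MFCC indices for a window in seconds."""
     return int(round(start_sec / winstep)), int(round(end_sec / winstep))

 start_frame, end_frame = sec_to_frames(19, 29)     # -> (1900, 2900)
 window_feat = mfcc_feat[start_frame:end_frame, :]  # MFCC frames for 19-29 s
 print(window_feat.shape)                           # (1000, 13)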

If you want to get at the raw audio itself, slice the signal directly:

 sig[time_beg_in_sec*rate:time_end_in_sec*rate] 
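For your 19-29 s annotation window, that slice would look like the following sketch, assuming rate == 44100 as stated in the question:

 # Sketch: raw audio samples for the 19-29 s annotation window.
 time_beg_in_sec, time_end_in_sec = 19, 29
 window_sig = sig[time_beg_in_sec * rate : time_end_in_sec * rate]
 print(len(window_sig))  # 10 s * 44100 = 441000 samples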

Source: https://habr.com/ru/post/1273684/

