I have a database of streaming video. I want to compute LBP features from the image frames and MFCC features from the audio, and there is an annotation for each frame of the video. The annotations are given per clip in video time (seconds), so I need to map an annotation's time to the corresponding MFCC output. I know sample_rate = 44100.
from python_speech_features import mfcc
from python_speech_features import logfbank
import scipy.io.wavfile as wav

audio_file = "sample.wav"
(rate, sig) = wav.read(audio_file)
mfcc_feat = mfcc(sig, rate)

print(len(sig))        # 2130912
print(len(mfcc_feat))  # 4831
Firstly, why does the MFCC result have length 4831, and how do I express those frames in seconds so they line up with my annotation? The total duration of the video is 48 seconds, and the annotation is 0 everywhere except the 19-29 s window, where it is 1. How can I find the MFCC frames that fall inside the 19-29 s window?
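If I understand correctly, mfcc() uses its default 0.025 s analysis window with a 0.01 s hop, so one frame is produced every 10 ms: 2130912 samples / 44100 Hz is about 48.3 s, and 48.3 s / 0.01 s is roughly 4831 frames (the exact count depends on how the last partial window is handled). A minimal sketch of the mapping I am after, assuming those defaults:

import scipy.io.wavfile as wav
from python_speech_features import mfcc

(rate, sig) = wav.read("sample.wav")
mfcc_feat = mfcc(sig, rate)          # defaults: winlen=0.025, winstep=0.01

winstep = 0.01                       # hop size in seconds -> one frame per 10 ms

# frame i covers audio starting at i * winstep seconds, so the
# annotated 19-29 s window maps to frame indices 1900..2900
start_frame = int(19 / winstep)      # 1900
end_frame   = int(29 / winstep)      # 2900
window_feats = mfcc_feat[start_frame:end_frame]

print(window_feats.shape)            # ~(1000, 13): 1000 frames, 13 coefficients

Is this index arithmetic the right way to align the annotation with the MFCC frames?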