I am trying to learn how HMM-GMM models are implemented, and I have created a simple model for detecting specific sounds (animal calls, etc.).
I am trying to build a network of HMMs (hidden Markov models) with GMM (Gaussian mixture model) emissions in MATLAB.
I have a few questions that I could not find answers to.
1) Should mhmm_em() be called in a loop, once for each HMM state, or does it handle all states automatically?
For instance:

    for each state
        Initialize GMMs and get parameters (use mixgauss_init.m)
    end
    Train HMM with EM (use mhmm_em.m)

2) In the call

    [LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
        mhmm_em(MFCCs, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', M);

is the last parameter (M) the number of Gaussians, or is it the number of states minus 1?
3) If we seek maximum likelihood, then where does Viterbi come into play?
Say I want to detect a certain type of animal or human call after training my model on the acoustic feature vectors I extracted. Do I still need the Viterbi algorithm in test mode?
This is a bit confusing to me, and I would really appreciate an explanation of this part.
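To make question 3 concrete, here is a minimal, self-contained sketch (plain MATLAB, no toolbox) of the two quantities I am comparing. The toy discrete HMM and all of its numbers are made up for illustration and are not part of my actual setup. As far as I understand, the forward pass sums over all state paths and gives the log-likelihood (which is what I assume mhmm_logprob reports), while Viterbi keeps only the single best path:

```matlab
% Toy 2-state HMM with discrete observations (all values are hypothetical).
prior    = [0.6; 0.4];            % initial state probabilities
transmat = [0.7 0.3; 0.4 0.6];    % transmat(i,j) = P(state j at t+1 | state i at t)
obsmat   = [0.9 0.1; 0.2 0.8];    % obsmat(i,k)   = P(symbol k | state i)
obs      = [1 1 2 2];             % observed symbol sequence
T = numel(obs);
Q = numel(prior);

% Forward algorithm with scaling: sums over ALL state paths,
% so loglik = log P(obs | model).
alpha  = prior .* obsmat(:, obs(1));
loglik = 0;
for t = 1:T
    if t > 1
        alpha = (transmat' * alpha) .* obsmat(:, obs(t));
    end
    c      = sum(alpha);          % scaling factor, avoids underflow
    alpha  = alpha / c;
    loglik = loglik + log(c);
end

% Viterbi in the log domain: keeps only the single BEST state path.
delta = log(prior) + log(obsmat(:, obs(1)));
psi   = zeros(Q, T);              % psi(j,t) = best predecessor of state j at time t
for t = 2:T
    scores = bsxfun(@plus, delta, log(transmat));   % scores(i,j): come from i, go to j
    [best, arg] = max(scores, [], 1);
    psi(:, t) = arg(:);
    delta = best(:) + log(obsmat(:, obs(t)));
end
[best_logp, last] = max(delta);

% Backtrack to recover the most likely state sequence.
path = zeros(1, T);
path(T) = last;
for t = T-1:-1:1
    path(t) = psi(path(t+1), t+1);
end

fprintf('forward log-likelihood : %.4f\n', loglik);
fprintf('Viterbi best-path logp : %.4f\n', best_logp);
disp(path);
```

The best-path score can never exceed the forward log-likelihood, because the forward pass adds up the probabilities of every path, including the best one. If ranking test files by likelihood is all I need, it looks like the forward score alone would be enough and Viterbi would only matter when I want the state sequence itself, but please correct me if that is wrong.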
Any comments on the code in terms of the GMM-HMM logic would also be appreciated.
Thanks.
Here is my MATLAB routine:

    O = 21;            % Number of coefficients in a feature vector
    M = 10;            % Number of Gaussian mixtures
    Q = 3;             % Number of states (left to right)

    % MFCC parameters
    Tw = 128;          % analysis frame duration (ms)
    Ts = 64;           % analysis frame shift (ms)
    alpha = 0.95;      % preemphasis coefficient
    R = [ 1 1000 ];    % frequency range to consider
    f_bank = 20;       % number of filterbank channels
    C = 21;            % number of cepstral coefficients
    L = 22;            % cepstral sine lifter parameter(?)

    % Training
    [speech, fs, nbits] = wavread('Train.wav');
    % The analysis window is passed as a function handle (@hamming)
    [MFCCs, FBEs, frames] = mfcc(speech, fs, Tw, Ts, alpha, @hamming, R, f_bank, C, L);
    cov_type = 'full'; % covariance type chosen for the Gaussians
    prior0 = normalise(rand(Q,1));
    transmat0 = mk_stochastic(rand(Q,Q));
    % Initialise the Q*M Gaussians from the training features
    [mu0, Sigma0] = mixgauss_init(Q*M, MFCCs, cov_type, 'kmeans');
    mu0 = reshape(mu0, [O Q M]);
    Sigma0 = reshape(Sigma0, [O O Q M]);
    mixmat0 = mk_stochastic(rand(Q,M));
    [LL, prior1, transmat1, mu1, Sigma1, mixmat1] = ...
        mhmm_em(MFCCs, prior0, transmat0, mu0, Sigma0, mixmat0, 'max_iter', M);

    % Testing: score every test file under the trained model
    for i = 1:length(filelist)
        fprintf('Processing %s\n', filelist(i).name);
        [speech_tst, fs, nbits] = wavread(filelist(i).name);
        [MFCCs, FBEs, frames] = ...
            mfcc(speech_tst, fs, Tw, Ts, alpha, @hamming, R, f_bank, C, L);
        loglik(i) = mhmm_logprob(MFCCs, prior1, transmat1, mu1, Sigma1, mixmat1);
    end
    % Pick the file with the highest log-likelihood
    [Winner, Winner_idx] = max(loglik);