Determining the matrix norm for comparing two MFCC matrices

I would like to give a clear picture of the problem I am facing.

Scenario:

I have an MFCC generator block that receives speech samples from the user and produces a rectangular matrix, say A of order m x n, whose elements are mel-frequency cepstral coefficients (MFCCs). I also maintain a database of previously saved recordings of the user's voice. Using an LPC filter, I generate a predicted speech signal from the database, with the restriction that the filter is given only part of a stored speech sample rather than all of it. This predicted speech signal is then routed to the same MFCC generator block, which yields the predicted cepstral coefficients as another rectangular matrix, say B, of the same order m x n. I then use a matrix norm together with a heuristically chosen threshold to compare the two matrices (that is, to measure the error between them) and authenticate the user. If authentication fails, the number of speech samples given to the predictor is increased linearly and the check is repeated.
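To make the comparison step concrete, here is a minimal sketch of what I mean. It assumes the Frobenius norm of A - B as the matrix norm, which is only one possible choice; the names `frobenius_distance` and `is_authenticated`, the threshold value, and the numbers in A and B are made up for illustration.

```python
import math

def frobenius_distance(a, b):
    """Frobenius norm of A - B for two equally sized matrices (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("A and B must have the same m x n shape")
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))

def is_authenticated(a, b, threshold):
    """Accept the user when the error between A and B is within the threshold."""
    return frobenius_distance(a, b) <= threshold

# Toy 2 x 3 matrices standing in for real MFCC matrices (illustrative values only).
A = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
B = [[1.0, 2.5, 3.0], [0.5, 0.0, -0.5]]

print(frobenius_distance(A, B))               # error between measured and predicted MFCCs
print(is_authenticated(A, B, threshold=1.0))  # threshold chosen heuristically
```

With these toy values the error is about 0.707, so the check passes for a threshold of 1.0 and would fail for a threshold of 0.5.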

Understanding the matrices A and B defined above:

The rows of each matrix correspond to the coefficients generated for a speech frame, so m is the number of coefficients per frame. The columns correspond to the frames, so the matrix is the concatenation of the coefficient vectors of all frames in the entire speech sample. A and B are generated with the same settings. (During MFCC generation a fixed-size window is used: the samples under the window yield the coefficients for that frame, and the window is then shifted by a step smaller than the window size, so successive windows overlap.)
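The windowing described above can be sketched as follows. This is a simplified illustration, assuming no padding at the end of the signal; `frame_bounds`, `window_size` and `hop_size` are hypothetical names, and the sizes are toy values rather than my real settings.

```python
def frame_bounds(num_samples, window_size, hop_size):
    """Return (start, end) sample indices of each analysis frame.

    The hop is required to be smaller than the window, so consecutive
    frames overlap. Trailing samples that do not fill a window are dropped.
    """
    if not 0 < hop_size < window_size:
        raise ValueError("hop_size must be positive and smaller than window_size")
    if num_samples < window_size:
        return []
    last_start = num_samples - window_size
    return [(start, start + window_size)
            for start in range(0, last_start + 1, hop_size)]

bounds = frame_bounds(num_samples=10, window_size=4, hop_size=2)
print(bounds)       # one (start, end) pair per frame
print(len(bounds))  # number of frames, i.e. the number of columns n
```

Each frame contributes one column of m coefficients, so under these assumptions n = 1 + (num_samples - window_size) // hop_size.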

Question:

I saw this link on comparing two series of MFCC coefficients and found it somewhat useful. However, I still have several concerns about the scenario described above. Even when the genuine user speaks (pronouncing exactly the word that is stored in the database), the MFCC matrix will not necessarily match the predicted one element by element, position for position. If both rectangular matrices are flattened into vectors, there may still be a time delay between the two sequences. If so, the approach suggested in the link would fail even for a genuine user. How can I fix this? Are there other ways to approach this problem?
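To illustrate the concern, here is a small sketch with made-up numbers. B contains the same coefficient pattern as A, only delayed by one frame, yet an element-wise norm (the Frobenius norm is assumed here) reports a large error. The helper `delay_columns` is hypothetical and simply repeats the first column as padding.

```python
import math

def frobenius_distance(a, b):
    """Frobenius norm of A - B for two equally sized matrices (lists of rows)."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))

def delay_columns(matrix, frames):
    """Shift every row right by `frames` columns, repeating the first column."""
    return [[row[0]] * frames + row[:len(row) - frames] for row in matrix]

# Toy 2 x 6 "MFCC" matrix: 2 coefficients per frame, 6 frames.
A = [[0.0, 1.0, 4.0, 1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, -2.0, 0.0, 0.0]]
B = delay_columns(A, 1)  # same utterance, one frame late

print(frobenius_distance(A, A))  # 0.0 for a perfectly aligned match
print(frobenius_distance(A, B))  # 6.0 even though only the timing differs
```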

Thanks.


Source: https://habr.com/ru/post/1385359/

