How to use MFCC vectors to classify a single audio file?

This is probably a very stupid question, but I could not find details anywhere.

I have a 3-second sound recording (a WAV file). This is my sample, and it should be classified as [class_A] or [class_B].

Following some MFCC tutorials, I divided the sample into frames (291 frames, to be exact) and computed the MFCCs for each frame.

Now I have 291 feature vectors, each of length 13.

My question is: how exactly do you use these vectors with a classifier (e.g. k-NN)? I have 291 vectors that represent 1 sample. I know how to work with 1 vector per sample, but I do not know what to do when I have 291 of them. I could not find an explanation.

1 answer

Each of your vectors represents the spectral characteristics of the audio within one frame, so together they describe how the sound changes over time. Depending on the length of your frames, you can group some of them (for example, by averaging each coefficient across the frames in a group) to match the time resolution you want the classifier to work at. As an example, consider a sound whose envelope has an attack time of 2 ms: that may be as fine-grained as you want your time quantization to be, so you can either a) group and average the MFCC vectors that together span 2 ms; or b) recompute the MFCCs with the required time resolution.
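The grouping-and-averaging idea can be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper name `average_frames`, the group size of 10, and the random placeholder array standing in for real MFCCs are all assumptions. Averaging over every frame is the extreme case, and it gives one 13-dimensional vector per file.

```python
import numpy as np

def average_frames(mfcc, group_size):
    """Average consecutive MFCC frames in groups of `group_size`.

    mfcc has shape (n_frames, n_coeffs). Trailing frames that do not
    fill a complete group are dropped.
    """
    n_frames, n_coeffs = mfcc.shape
    n_groups = n_frames // group_size
    trimmed = mfcc[: n_groups * group_size]
    # Reshape to (groups, frames per group, coeffs), then average each group.
    return trimmed.reshape(n_groups, group_size, n_coeffs).mean(axis=1)

# Placeholder for the real 291 x 13 MFCC matrix (assumption: random data).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(291, 13))

coarse = average_frames(mfcc, group_size=10)  # 29 vectors of length 13
whole_file = mfcc.mean(axis=0)                # 1 vector of length 13
```

With `whole_file`, each recording becomes a single feature vector, which is the "1 vector for 1 sample" case a k-NN classifier can use directly, at the cost of discarding all timing information.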

If you really want to keep the fine time resolution, you can concatenate the 291 vectors and treat the result as a single vector (of 291 × 13 = 3783 dimensions), which will likely require a huge dataset for training.
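A minimal sketch of the concatenation approach, assuming every recording yields exactly 291 frames of 13 coefficients. The training data here is synthetic (two Gaussian clouds standing in for class_A and class_B), and `knn_predict` is a hypothetical plain-NumPy helper written for illustration, not a library function.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

rng = np.random.default_rng(0)
n_frames, n_coeffs = 291, 13

# Synthetic placeholder data (assumption): 20 recordings per class,
# with class_B shifted by +1 so the two classes are separable.
class_a = rng.normal(0.0, 1.0, size=(20, n_frames, n_coeffs))
class_b = rng.normal(1.0, 1.0, size=(20, n_frames, n_coeffs))

# Flatten each 291 x 13 matrix into one 3783-dimensional vector.
train_X = np.concatenate([class_a, class_b]).reshape(40, -1)
train_y = np.array(["class_A"] * 20 + ["class_B"] * 20)

# A new recording, flattened the same way before classification.
new_recording = rng.normal(1.0, 1.0, size=(n_frames, n_coeffs))
prediction = knn_predict(train_X, train_y, new_recording.reshape(-1), k=3)
```

Concatenation only works when all recordings have the same number of frames, so real recordings of different lengths would first need to be trimmed or padded to a common length.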


Source: https://habr.com/ru/post/1480743/

