How to prepare a data set for speech recognition

I need to train a bidirectional LSTM model for isolated-word speech recognition (individual digits from 0 to 9). I have recorded speech from 100 speakers. What should I do next? (Suppose I have split the recordings into separate .wav files containing one digit per file.) I will use MFCCs as the input features for the network.

In addition, I would like to know how the data set should differ if I use a library that supports CTC (Connectionist Temporal Classification).

1 answer

You can follow the answer / guide provided here.

Then choose a library that supports LSTM (pybrain, theano, keras).

For example: Theano (Binary LSTM), Keras (Tutorial).

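To make the data set side more concrete, here is a rough sketch of how the per-file features could be arranged, both for plain per-utterance classification and for a CTC-style setup. It assumes the MFCCs have already been extracted by some other tool, so each utterance is just a list of frames. The helper names (`pad_batch`, `one_hot`, `ctc_targets`), the 13 coefficients, and the padding values are my own illustrative choices, not part of any particular library.

```python
# Sketch only: arranges already-extracted MFCC frames into padded batches.
# Assumption: each utterance is a list of frames, each frame a list of floats.

def pad_batch(utterances, pad_value=0.0):
    """Pad variable-length utterances to the longest one.

    Returns the padded batch and the true frame count of every utterance,
    which the model (or a masking layer) needs in order to ignore padding.
    """
    n_feats = len(utterances[0][0])
    lengths = [len(u) for u in utterances]
    max_len = max(lengths)
    batch = [
        [list(frame) for frame in u]
        + [[pad_value] * n_feats for _ in range(max_len - len(u))]
        for u in utterances
    ]
    return batch, lengths


def one_hot(digit, n_classes=10):
    """Plain classification target: one class vector per file."""
    vec = [0] * n_classes
    vec[digit] = 1
    return vec


def ctc_targets(label_seqs, pad_value=-1):
    """CTC-style targets: a label sequence per utterance plus its length.

    No frame-level alignment is stored; the CTC loss handles the alignment
    and adds its own blank symbol on top of the 10 digit classes.
    """
    lengths = [len(s) for s in label_seqs]
    max_len = max(lengths)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in label_seqs]
    return padded, lengths


# Toy data: three "utterances" of 4, 2 and 3 frames, 13 coefficients each.
toy = [[[0.1] * 13 for _ in range(n)] for n in (4, 2, 3)]
digits = [7, 0, 3]

batch, frame_lengths = pad_batch(toy)
class_labels = [one_hot(d) for d in digits]

# With one digit per file, every CTC label sequence has length one.
ctc_labels, label_lengths = ctc_targets([[d] for d in digits])
```

With one digit per file the two setups differ little: the CTC targets are sequences of length one. CTC starts to matter if you keep several digits in one recording, because then you only need the digit sequence for each file and not the position of each digit in time.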


Source: https://habr.com/ru/post/1621796/

