How does speech length affect the neural network in speaker recognition?

I am studying neural networks and trying to build a speaker recognition system using TensorFlow. I would like to know how speech length affects the neural network. For example, I have 1000 different sound recordings of the same length and 1000 different sound recordings of varying lengths. How, in theory, will a neural network handle each of these types of data? Will a neural network trained on recordings of the same length work better or worse? Why?

2 answers

It depends on the type of neural network. With a conventional (feed-forward) network, you usually specify a fixed number of input neurons, so you cannot feed it data of arbitrary length. For longer sequences, you have to either crop your data or use a sliding window, as in the sketch below.
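Here is a minimal sketch of the sliding-window idea in Python/NumPy (not from the original answer; the window and hop sizes are arbitrary assumptions):

```python
import numpy as np

def frame_signal(signal, window_size=16000, hop_size=8000):
    """Split a 1-D audio signal into overlapping fixed-length windows.

    Shorter signals are zero-padded to one full window; longer signals
    yield several windows (the sliding-window approach described above).
    """
    if len(signal) < window_size:
        signal = np.pad(signal, (0, window_size - len(signal)))
    frames = [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, hop_size)
    ]
    return np.stack(frames)  # shape: (num_windows, window_size)
```

Each window can then be fed to a network with a fixed-size input layer, regardless of how long the original recording was.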

However, some neural networks can process input sequences of arbitrary length, for example a recurrent neural network (RNN). The latter seems like a very good candidate for your problem. There is a good article describing the implementation of a particular type of RNN called Long Short-Term Memory (LSTM), which works well for speech recognition.
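As an illustration, here is a minimal sketch in TensorFlow/Keras (which the question mentions) of an LSTM model whose time dimension is left unspecified, so each utterance can supply a different number of feature frames. The layer sizes, feature dimension, and number of speakers are illustrative assumptions, not values from the answer:

```python
import tensorflow as tf

num_features = 13     # e.g. MFCC coefficients per frame (assumption)
num_speakers = 1000   # matches the 1000 recordings in the question

model = tf.keras.Sequential([
    # None in the time axis lets sequences of any length through.
    tf.keras.Input(shape=(None, num_features)),
    tf.keras.layers.Masking(mask_value=0.0),   # ignore zero-padded frames
    tf.keras.layers.LSTM(128),                 # summarizes the whole sequence
    tf.keras.layers.Dense(num_speakers, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

The Masking layer matters when batching: sequences in one batch still have to be padded to the length of the longest one, and masking keeps the padded frames from influencing the LSTM state.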


I assume that your question can be reformulated as: How can a neural network process audio of different lengths?

The trick is that a signal of arbitrary length is converted into a sequence of fixed-size feature vectors. See my answers here and here.
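A minimal sketch of what this could look like in practice, assuming librosa for feature extraction (the library and parameter values are my assumptions, not part of the answer): fixed-size MFCC vectors are computed frame by frame, so a longer recording simply yields more frames.

```python
import librosa

def extract_features(path, n_mfcc=13):
    """Return a (num_frames, n_mfcc) array: num_frames depends on the
    recording length, but every frame vector has the same fixed size."""
    signal, sample_rate = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return mfcc.T  # transpose so that time is the first axis
```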


Source: https://habr.com/ru/post/1262240/

