You can do this for audio or video, although I do not think it will be particularly effective. Neural networks are great at pattern recognition, but they need a well-defined pattern to learn. What model would the network itself build? What is the sound of “happiness” or “anger”?