Voice segmentation

I am helping a farm cluster its roosters into groups according to their crow, so that cocks with similar crows live together. The farmer wants to know whether chickens pick up behavior from one another. If they do, then whenever he gets a new chicken he will put it in a group of good chickens, hoping they will have a good influence on it. My job is to record the crows of each group, then compare the recordings again after a few weeks to see whether the crows within each group have grown more similar.

My idea is to write a program that computes a similarity score for two input WAV files. Each rooster can then be paired with its closest match, similar pairs can be merged into larger groups, and so on until only a few groups remain.
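
A minimal sketch of the "score pairs, then merge similar groups" idea is below. It is not the asker's program and rests on assumptions: each rooster's crow is already reduced to a sequence of per-frame pitch values in Hz, and dynamic time warping (DTW, a technique swapped in here, not mentioned in the question) is used as the distance because crows differ in length. The grouping step is average-linkage agglomerative clustering. The function names and the toy pitch values are hypothetical.

```python
import math
from itertools import combinations


def dtw_distance(a, b):
    """DTW distance between two 1-D feature sequences (e.g. per-frame pitch
    in Hz), with |x - y| as the local cost. Lower means more similar."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def group_roosters(features, n_groups):
    """Average-linkage agglomerative clustering (hypothetical helper):
    start with one group per rooster and repeatedly merge the two groups
    whose members are closest on average, until n_groups remain."""
    names = list(features)
    dist = {frozenset(p): dtw_distance(features[p[0]], features[p[1]])
            for p in combinations(names, 2)}
    groups = [[name] for name in names]
    while len(groups) > n_groups:
        best = None
        for g1, g2 in combinations(range(len(groups)), 2):
            pairs = [(x, y) for x in groups[g1] for y in groups[g2]]
            avg = sum(dist[frozenset(p)] for p in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, g1, g2)
        _, g1, g2 = best
        groups[g1] = groups[g1] + groups[g2]  # g1 < g2, so deleting g2 is safe
        del groups[g2]
    return groups


# Toy pitch contours (hypothetical values, not measurements)
features = {"A": [400, 420, 450, 430], "B": [405, 425, 455, 425], "C": [300, 280, 260, 250]}
print(group_roosters(features, 2))
```

With these toy contours A and B end up in one group and C on its own. With real recordings it may be worth averaging the distance over several crows per rooster.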

I have recorded several crows from 3 roosters and analyzed them with spectrograms (each rooster crowed twice):

cock A: [spectrograms: first and second crowing of cock A]

cock B: [spectrograms: first and second crowing of cock B]

cock C: [spectrograms: first and second crowing of cock C]

Before computing the similarity, I would like to split each crow into segments, each with a roughly constant frequency (these per-segment frequencies will later be used to compute the similarity). My current approach:

Step 1: wherever the intensity curve breaks off, treat the silence as a gap and split the sound there.
Step 2: wherever the frequency changes sharply, take that moment as a segment boundary.
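
The two steps can be sketched as follows, as a rough illustration rather than a tuned implementation. All of these are assumptions: the WAV file is already loaded as a mono float array `x` with sample rate `sr`; "intensity" is the per-frame RMS level in dB relative to the loudest frame; "frequency" is a crude per-frame pitch from the autocorrelation peak, searched between 200 and 2000 Hz; a gap is any frame more than 30 dB below the peak; and a "critical change" is a jump of more than 2 semitones between consecutive frames. The function names are hypothetical, and all thresholds would need tuning on real crows.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len;
    the ragged tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def frame_pitch(frame, sr, fmin=200.0, fmax=2000.0):
    """Crude pitch: the autocorrelation peak between lags sr/fmax and sr/fmin.
    The 200-2000 Hz range is an assumed range for crows."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag


def segment_crow(x, sr, frame_ms=20.0, silence_db=-30.0, jump_semitones=2.0):
    """Return a list of (start_s, end_s, median_pitch_hz), one per segment.
    Default thresholds are hypothetical starting points."""
    frame_len = int(sr * frame_ms / 1000)
    hop = frame_len // 2
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    level_db = 20 * np.log10(rms / rms.max())  # 0 dB = loudest frame

    segments, current = [], []  # current holds (frame_index, pitch) pairs

    def close():
        if current:
            first, last = current[0][0], current[-1][0]
            segments.append((first * hop / sr, (last * hop + frame_len) / sr,
                             float(np.median([p for _, p in current]))))
            current.clear()

    for i, frame in enumerate(frames):
        if level_db[i] < silence_db:
            close()  # Step 1: a gap in intensity ends the current segment
            continue
        p = frame_pitch(frame, sr)
        if current and abs(12 * np.log2(p / current[-1][1])) > jump_semitones:
            close()  # Step 2: a sharp pitch change starts a new segment
        current.append((i, p))
    close()
    return segments
```

Using the median pitch per segment keeps a single odd frame from dominating the segment's frequency, which matters if these values feed the similarity score later.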

I am not sure whether these steps are sufficient. I hope someone has a better suggestion for how I can improve the segmentation. Are there any methods or algorithms suited to my situation? Thanks!

2 answers

The best approach is to use speech-recognition methods. I used them in a project to recognize bird songs: I used HTK (the Hidden Markov Model Toolkit) to build HMMs that recognize bird songs. You can warp the Mel scale so that it fits your case better. The Mel scale (used in MFCCs) is modeled on human hearing. If you search Google, you will find a few bird-related articles that adapt the Mel or Bark (PLP) scale to the animal's vocal tract.
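
For reference, the standard Mel and Bark conversions are shown below, together with filter center frequencies spaced evenly on a Mel scale. The `break_hz` parameter (700 Hz in the standard formula) is a hypothetical knob for warping the scale toward an animal's vocal range. The answer does not say how the warping was done in the bird-song project, so treat any non-default value as a placeholder.

```python
import math


def hz_to_mel(f, break_hz=700.0):
    """Mel scale, 2595 * log10(1 + f / 700). break_hz=700 is the standard
    human value; other values are a hypothetical way to warp the scale."""
    return 2595.0 * math.log10(1.0 + f / break_hz)


def mel_to_hz(m, break_hz=700.0):
    """Inverse of hz_to_mel for the same break_hz."""
    return break_hz * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_bark(f):
    """Bark scale, Traunmueller's analytic approximation."""
    return 26.81 * f / (1960.0 + f) - 0.53


def band_centers(f_lo, f_hi, n_bands, break_hz=700.0):
    """Center frequencies of n_bands filters equally spaced on the (possibly
    warped) Mel scale strictly between f_lo and f_hi."""
    m_lo, m_hi = hz_to_mel(f_lo, break_hz), hz_to_mel(f_hi, break_hz)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (k + 1), break_hz) for k in range(n_bands)]


# Example: 10 bands over an assumed 500-8000 Hz crow range
print([round(c) for c in band_centers(500.0, 8000.0, 10)])
```

Raising `break_hz` makes the band spacing more uniform in Hz (closer to linear); lowering it packs more bands into the low frequencies.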

You will need many samples to train the HMM parameters reliably, and you should analyze how many states work best. I suggest at least 100 samples of each of the three songs and HMMs with 3 emitting states. I can help you set up the HMM system; please contact me.
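
To make the "3 emitting states" suggestion concrete, here is a minimal sketch of how a left-to-right HMM scores one crow with the forward algorithm. It is not HTK code. It assumes the per-frame emission log-likelihoods `log_b` have already been computed (in HTK these would come from the trained output distributions), and `p_stay` is a hypothetical placeholder that training would normally estimate. Note that HTK model definitions also include non-emitting entry and exit states, so a model with 3 emitting states is written there with 5 states.

```python
import numpy as np


def left_to_right_transitions(n_states=3, p_stay=0.6):
    """Left-to-right topology: each state either repeats or moves on to the
    next one; the last state loops. p_stay is a hypothetical value."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states):
        if s + 1 < n_states:
            A[s, s], A[s, s + 1] = p_stay, 1.0 - p_stay
        else:
            A[s, s] = 1.0
    return A


def forward_log_likelihood(log_b, A):
    """log P(observations | model) for a left-to-right HMM that starts in the
    first state and must end in the last one.
    log_b[t, s] = log-likelihood of frame t under emitting state s."""
    T, S = log_b.shape
    with np.errstate(divide="ignore"):
        log_A = np.log(A)  # zero transitions become -inf
    alpha = np.full(S, -np.inf)
    alpha[0] = log_b[0, 0]
    for t in range(1, T):
        # alpha_new[j] = logsumexp_i(alpha[i] + log_A[i, j]) + log_b[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return alpha[-1]
```

To classify a crow, one would compute this score under each rooster's (or each group's) model and pick the model with the highest value.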

Louis Ubel Labs ASR www.asrlabs.com.br


Last year we had several voice-related projects, which may be somewhat similar to yours. As I remember, machine-learning tools and libraries were very useful, for example Weka, quickminer and Encog. It is worth evaluating your examples with cross-validation. Features you could try: MFCCs and YIN (a pitch estimator). I think any material on voice and speech processing may be useful for you :)
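
Since YIN is mentioned, here is a compact sketch of the YIN pitch estimator (de Cheveigné and Kawahara, 2002) for a single frame: difference function, cumulative mean normalized difference, absolute threshold, then parabolic refinement. The 200 to 2000 Hz search range and the 0.1 threshold are assumptions to tune on real crows. A per-frame pitch track like this could feed either pitch-based segmentation or a similarity score.

```python
import numpy as np


def yin_pitch(frame, sr, fmin=200.0, fmax=2000.0, threshold=0.1):
    """Pitch (Hz) of one frame with the YIN method, or None if no lag passes
    the threshold. Apply an energy gate first: an all-zero frame is not
    rejected here. fmin/fmax/threshold are assumed defaults."""
    frame = np.asarray(frame, dtype=float)
    tau_min, tau_max = int(sr / fmax), int(sr / fmin)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for fmin")
    # Step 1: difference function d(tau)
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd = np.ones_like(d)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Step 3: first lag below the threshold, then walk down to its local minimum
    for tau in range(max(tau_min, 1), tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            # Step 4: parabolic interpolation of d' around the minimum
            a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
            denom = a - 2 * b + c
            shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
            return sr / (tau + shift)
    return None
```

The normalization in step 2 is what lets a fixed threshold work across loud and quiet frames; for noise-like frames d'(tau) stays near 1 and the function returns None.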


Source: https://habr.com/ru/post/1381768/

