How to separate audio file based on different speakers

Question

I have a bunch of audio files of phone conversations. I want to split each audio file into two, each containing only one speaker's speech. I think I need speaker diarization, but how can I do this? Can anyone give me some pointers? Thanks. P.S. I am on Linux, using C/C++.

+4

c++ c linux audio speech

Bo liu Oct 18 '12 at 18:40

2 answers

Yes, diarization is what you want.

There are a couple of tools you might want to look at, both GPL-licensed. One is LIUM spkdiarization (Java), the other is the SHoUT toolkit (C++). LIUM is well documented and comes with a ready-to-use script; SHoUT is a bit more cryptic, so you should follow the author's instructions posted here.

Although I may be too late. ;)

+3

hruske Jun 09 '13 at 9:18

Kelly christoffersen · Accepted Answer · 2012-10-18T21:42:07+0000

While separating individual speakers is a rather complex problem, you can automatically split the audio wherever there are pauses. This produces a series of shorter files that are likely to be easier to manage, since speakers often take turns at pauses.

This approach requires the open-source Julius speech recognition decoder package, which is available in many Linux package repositories. I use the Ubuntu multiverse repository.

Here is the site: http://julius.sourceforge.jp/en_index.php

Step 0: Install Julius

sudo apt-get install julius

Step 1: Segment the audio

 adintool -in file -out file -filename myRecording.wav -startid 0 -freq 44100 -lv 2048 -zc 30 -headmargin 600 -tailmargin 600

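-in file and -out file select file input and file output (per the adintool documentation, with -in file the input file name is asked for after startup)
-filename is the base name of the output segment files
-startid is the number of the first segment, which is appended to the file name
-freq is the sample rate of the original audio file, in Hz
-lv is the sound level above which speech detection is active
-zc is the number of zero crossings per second above which speech detection is active
-headmargin and -tailmargin are the amounts of silence, in milliseconds, kept before and after each audio segment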

Note that -lv and -zc should be tuned to the characteristics of your recording (volume and noise level), while -headmargin and -tailmargin should be tuned to your speakers' speaking styles. The values above have worked well for my voice recordings in the past.
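To make the roles of -lv, -zc and the margins more concrete, here is a minimal Python sketch of level and zero-crossing based segmentation. It is not Julius's actual implementation; the function names, the 100 ms frame size and the synthetic test tone are all assumptions made for illustration. The logic is simple enough to port to C or C++.

```python
import math


def count_level_crossings(frame, level):
    """Count sign flips between samples whose magnitude reaches `level`."""
    crossings = 0
    last_sign = 0
    for sample in frame:
        if sample >= level:
            sign = 1
        elif sample <= -level:
            sign = -1
        else:
            continue  # too quiet to count (the role of -lv)
        if last_sign and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings


def find_segments(samples, rate, level=2048, zc_per_sec=30,
                  margin_ms=600, frame_ms=100):
    """Return (start, end) sample ranges that look like speech.

    Hypothetical helper: a frame is "active" when it has enough level
    crossings (the role of -zc); each active frame is padded with a
    margin (the role of -headmargin/-tailmargin) and merged with its
    neighbours when the padded ranges touch.
    """
    frame_len = rate * frame_ms // 1000
    min_crossings = zc_per_sec * frame_ms / 1000
    margin = rate * margin_ms // 1000
    segments = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if count_level_crossings(frame, level) < min_crossings:
            continue
        seg_start = max(0, start - margin)
        seg_end = min(len(samples), start + len(frame) + margin)
        if segments and seg_start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], seg_end)  # extend last segment
        else:
            segments.append((seg_start, seg_end))
    return segments


def tone(seconds, rate, freq=200, amplitude=8000):
    """Synthetic stand-in for speech: a plain sine wave of 16-bit samples."""
    return [int(amplitude * math.sin(2 * math.pi * freq * n / rate))
            for n in range(int(seconds * rate))]


if __name__ == "__main__":
    rate = 8000
    silence = [0] * rate
    # 1 s silence, 1 s tone, 2 s silence, 1 s tone, 1 s silence
    signal = silence + tone(1, rate) + silence * 2 + tone(1, rate) + silence
    for start, end in find_segments(signal, rate):
        print(f"{start / rate:.2f}s - {end / rate:.2f}s")
```

On the synthetic signal this reports two segments, each padded by the margin on both sides. Raising the level or the zero-crossing threshold makes the detector stricter, and a pause shorter than roughly twice the margin no longer splits a segment.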

Here is the documentation: http://julius.sourceforge.jp/juliusbook/en/adintool.html

In my experience, pre-processing the audio with compression and normalization gives better results and requires less tuning of the Julius arguments. These initial steps are recommended but not required.

The pre-processing requires the open-source SoX toolkit. It is also available in many Linux package repositories. I use the Ubuntu universe repository.

Here is the site: http://sox.sourceforge.net

Step -2: Install SoX

 sudo apt-get install sox

Step -1: Pre-process the audio

 sox myOriginalRecording.wav myRecording.wav gain -b -n -8 compand 0.2,0.6 4:-48,-32,-24 0 -64 0.2 gain -b -n -2

gain -b -n balances and normalizes the audio to a given level (here -8 dB before compression and -2 dB after it)
compand compresses (in this case) the dynamic range of the audio according to its parameters

Please note that it may take some time to fully understand the options for compand. The values above have worked well for my voice recordings in the past.
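As an illustration of what normalizing to a given level means, here is a minimal Python sketch of peak normalization for 16-bit samples. It only mirrors the idea behind gain -n with a dB argument; it does not reproduce the balancing done by -b or anything compand does, and the function name normalize_peak is an assumption made for this example.

```python
def normalize_peak(samples, target_dbfs, full_scale=32767):
    """Scale samples so the loudest one sits at `target_dbfs` dB
    relative to full scale (0 = full scale, -8 = 8 dB below it)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    # Convert the dB target to a linear amplitude, then to a scale factor.
    target_peak = full_scale * 10 ** (target_dbfs / 20)
    scale = target_peak / peak
    return [round(s * scale) for s in samples]


if __name__ == "__main__":
    quiet = [0, 1000, -2000, 500]
    print(normalize_peak(quiet, -8))
```

Bringing every recording to the same peak level before segmentation is what lets a single -lv threshold work across files recorded at different volumes.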

Here is the documentation: http://sox.sourceforge.net/sox.html

Although this will not identify each speaker for you, it greatly simplifies the task of doing so by ear, which may be the only option for a while. I hope you find a practical automatic solution if one is already available.
