Detecting pauses in a spoken-word audio file using pymad, PCM, VAD, etc.

Question

First I will describe in general terms what I am trying to do and ask for advice. Then I will explain my current approach and ask about the specific problems I have run into.

Problem

I have an MP3 file of a person speaking. I would like to split it into segments that each correspond roughly to a sentence or a phrase. (I would do it manually, but we are talking about hours of data.)

If you have tips on how to do this programmatically, or know of existing utilities that do it, I'd love to hear them. (I know about voice activity detection (VAD) and have looked into it a bit, but I did not find any available utilities.)

Current approach

I thought the easiest way would be to scan the MP3 at regular intervals and identify the places where the average volume falls below some threshold. Then I would use an existing utility to cut the MP3 at those places.
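A minimal sketch of the thresholding step described above, assuming you already have one loudness value (in dB) per fixed-length window. The function name `find_pauses`, the -40 dB threshold, and the 0.3 second minimum pause are illustrative assumptions, not values from any particular tool; they would need tuning against real recordings.

```python
def find_pauses(levels_db, window_seconds, silence_db=-40.0, min_pause_seconds=0.3):
    """Return (start, end) times in seconds for runs of quiet windows.

    levels_db: one loudness value per window, in dB (hypothetical input).
    silence_db and min_pause_seconds are assumed defaults to be tuned.
    """
    pauses = []
    run_start = None
    # A loud sentinel at the end closes a quiet run that reaches the end of the file.
    for i, level in enumerate(list(levels_db) + [float("inf")]):
        if level < silence_db:
            if run_start is None:
                run_start = i  # first quiet window of a new run
        elif run_start is not None:
            # The run ended; keep it only if it lasted long enough.
            if (i - run_start) * window_seconds >= min_pause_seconds:
                pauses.append((run_start * window_seconds, i * window_seconds))
            run_start = None
    return pauses
```

Requiring a minimum duration keeps the short dips between syllables from being treated as phrase boundaries. The midpoint of each returned span is a reasonable place to cut.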

I have been experimenting with pymad and I believe I have successfully extracted the PCM (pulse-code modulation) data for each frame of the MP3. Now I am stuck because I can't work out how the PCM data translates to relative volume. I am also aware of other complicating factors, such as multiple channels, big-endian versus little-endian byte order, etc.

Advice on how to map a group of PCM samples to a relative volume would be key.
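One common way to turn a group of PCM samples into a relative volume is the root-mean-square (RMS) level, expressed in decibels relative to full scale. The sketch below assumes 16-bit signed, interleaved samples; the function name `rms_dbfs` and its parameters are hypothetical, and the sample width, channel count, and byte order should be checked against what your decoder actually produces.

```python
import array
import math
import sys

def rms_dbfs(pcm_bytes, channels=2, big_endian=False):
    """Return the RMS level of 16-bit signed PCM in dB relative to full scale.

    Assumes interleaved 16-bit samples; all channels are averaged together.
    Returns -inf for empty or completely silent input.
    """
    # Drop any trailing partial frame so every channel has a whole sample.
    usable = len(pcm_bytes) - len(pcm_bytes) % (2 * channels)
    samples = array.array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(pcm_bytes[:usable])
    # Swap bytes if the data's byte order differs from this machine's.
    if big_endian != (sys.byteorder == "big"):
        samples.byteswap()
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 32768 is the largest magnitude a 16-bit sample can take (full scale).
    return 10 * math.log10(mean_square / 32768.0 ** 2)
```

The sign of each sample only says which side of zero the waveform is on, so squaring before averaging is what makes the result a measure of loudness. Values near 0 dB are loud and increasingly negative values are quieter; calling this on each fixed-length chunk of decoded audio gives one level per window to compare against a silence threshold.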

Thanks!

pcm mp3

james Apr 13 '10 at 0:51

3 answers

Treb · Answer 1 · 2010-06-24T20:02:17+0000

PCM - . . ( : , .) , PCM 8- . > 0, , < 0 . ( , ), .

: . , . Audacity Silence Finder, silence level . Minimum silence duration, , , ( , ).

, :

. 1/10, 1/20 1/100 .
(silence level Audacity). - , , (, ..). , .
. . ( = * ). Minimum silence duration, , .

Audacity - , , . , , , , (.. 0). , , . , ...

james · Answer 2 · 2010-04-13T03:39:24+0000

. Audacity Silence Finder. , . .

Whirlwind · Answer 3 · 2010-04-13T00:57:31+0000

PCM - . , (1, ) , 0 . , 1 0.

To estimate the amplitude, plot the sine curve and normalize it about the x-axis. You should then be able to estimate the amplitude of the wave at different points. Once you have done that, you can pick out the places where the amplitude is lower.

You can also try to use the Fourier transform to estimate where the signals are most different.
