Google Speech API + Go - transcribing an audio stream of unknown length

I have an rtmp stream of a video call and I want to transcribe it. I created 2 services in Go and I get results, but they are not very accurate, and a lot of data seems to get lost.

Let me explain.

I have a transcode service, where I use ffmpeg to transcode the video to Linear16 audio and put the output bytes on a PubSub queue for the transcribe service to process. Obviously, there is a limit on the size of a PubSub message, and I want to start transcribing before the end of the video call. So I chunk the transcoded data into 3-second clips (not a fixed length, it just seems about right) and put them on the queue.

Data is transcoded quite simply:

var stdout bytes.Buffer

cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout

if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

ticker := time.NewTicker(3 * time.Second)

for {
    select {
    case <-ticker.C:
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: stdout.Bytes(),
        })

        stdout.Reset()
    }
}

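The transcribe service pulls messages from the queue at a rate of 1 every 3 seconds, which helps it process the audio at about the same rate as it is created. The Speech API limits the length of a stream to 60 seconds, so I close the old stream and start a new one every 30 seconds, so that the limit is never reached, no matter how long the video call lasts.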

This is how I transcribe it:

stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)

cctx, cancel := context.WithCancel(context.TODO())
err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {

    select {
    case <-clipLengthTicker.C:
        log.Infof("Clip length reached.")
        log.Infof("Closing stream and starting over")

        err := stream.CloseSend()
        if err != nil {
            log.Fatalf("Could not close stream: %v", err)
        }

        go getResult(stream)
        stream = prepareNewStream()

    case <-chunkLengthTicker.C:
        log.Infof("Chunk length reached.")

        bytesConverted := len(msg.Data)

        log.Infof("Received %d bytes\n", bytesConverted)

        if bytesConverted > 0 {
            if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                    AudioContent: msg.Data,
                },
            }); err != nil {
                resp, _ := stream.Recv()
                log.Errorf("Could not send audio: %v", resp.GetError())
            }
        }

        msg.Ack()
    }
})

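I think the problem is that my 3-second chunks do not necessarily line up with the starts and ends of phrases or sentences, and I suspect that the Speech API is a recurrent neural network trained on full sentences rather than on individual words, so a clip that starts in the middle of a sentence loses some data, because the first few words cannot be worked out. Also, I lose some data when switching from the old stream to the new one. Some context is lost there. I guess overlapping clips might help with this.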

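I have a couple of questions:

1) Does this architecture seem appropriate for my constraints (unknown length of the audio stream, etc.)?

2) What can I do to improve accuracy and minimise lost data?

(Note: I have simplified the examples for readability. Point out if anything does not make sense, because I have been heavy-handed in cutting the examples down.)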


I think you are right that splitting the audio into chunks causes many words to be cut off.

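I see another problem in the publishing. Some time passes between the calls to topic.Publish and stdout.Reset(), and ffmpeg has probably written some bytes to stdout in the meantime that were never published and are then discarded by the reset.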
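One way to close that gap is to stop snapshotting and resetting a shared buffer and instead read fixed-size chunks directly from the stream, so every byte is handed over exactly once. Below is a minimal sketch under some assumptions: readChunks, silence and the emit callback are names I made up for illustration, and an in-memory reader stands in for ffmpeg's stdout. In the real service the reader would presumably come from cmd.StdoutPipe(), and emit would call topic.Publish.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// bytesPerSecond is the data rate of 16 kHz, 16-bit, mono PCM (s16le),
// matching the ffmpeg flags used in the question.
const bytesPerSecond = 16000 * 2

// silence is a stand-in for ffmpeg's stdout: the given number of seconds
// of zeroed PCM data. (Hypothetical helper, only here for the demo.)
func silence(seconds int) io.Reader {
	return bytes.NewReader(make([]byte, seconds*bytesPerSecond))
}

// readChunks reads r until EOF and passes every byte to emit exactly once,
// in chunks of at most chunkBytes. Each chunk is read out of the stream
// instead of being copied from a buffer that is reset afterwards, so
// nothing written between two reads can be dropped.
func readChunks(r io.Reader, chunkBytes int, emit func([]byte) error) error {
	for {
		buf := make([]byte, chunkBytes)
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			if emitErr := emit(buf[:n]); emitErr != nil {
				return emitErr
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return nil // end of stream; a short final chunk was already emitted
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// 7 seconds of audio cut into 3-second chunks.
	err := readChunks(silence(7), 3*bytesPerSecond, func(chunk []byte) error {
		fmt.Printf("chunk of %d bytes\n", len(chunk))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

A side effect of this approach is that chunk boundaries depend on the amount of audio rather than on wall-clock time, so no empty messages are sent when ffmpeg stalls.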

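I am afraid the architecture does not fit your problem. The limit on the message size causes many problems. The idea of a PubSub system is that a publisher notifies subscribers of events, not necessarily that it carries a large payload.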

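Do you really need two services? You could use goroutines that coordinate through channels. That would eliminate the pub sub system.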
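If transcoding and transcribing run in the same process, a channel between two goroutines can take the place of the PubSub queue. Here is a rough sketch of that shape, not a drop-in replacement: pipeline and handle are hypothetical names, the producer loop stands in for the ffmpeg reader, and handle stands in for whatever sends audio to the Speech API stream.

```go
package main

import (
	"fmt"
	"sync"
)

// pipeline connects a producer (standing in for the transcoder) to a
// consumer (standing in for the transcriber) through a buffered channel.
// It returns the total number of bytes the consumer received.
func pipeline(chunks [][]byte, handle func([]byte)) int {
	ch := make(chan []byte, 4) // small buffer absorbs short stalls in the consumer

	var wg sync.WaitGroup
	received := 0

	wg.Add(1)
	go func() { // consumer goroutine
		defer wg.Done()
		for chunk := range ch {
			handle(chunk)
			received += len(chunk)
		}
	}()

	for _, c := range chunks { // producer: blocks when the buffer is full
		ch <- c
	}
	close(ch) // tells the consumer that no more audio is coming
	wg.Wait() // after this, reading received is safe

	return received
}

func main() {
	chunks := [][]byte{make([]byte, 96000), make([]byte, 96000), make([]byte, 32000)}
	total := pipeline(chunks, func(chunk []byte) {
		fmt.Printf("transcribing %d bytes\n", len(chunk))
	})
	fmt.Println("total bytes:", total)
}
```

A channel has no message size limit, and a full buffer makes the producer wait rather than lose data, so the chunk length can be chosen for the Speech API instead of for the queue.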

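As for accuracy, one strategy is to make the chunks as large as possible. A possible solution:

  • Make the chunks as large as possible (nearly 60 seconds)
  • Make the chunks overlap each other by a short time (e.g., 5 seconds)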
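The overlap can be sketched as a sliding window over the PCM bytes. This is only an illustration under assumptions: overlappingClips is a name I made up, it works on audio that is already in memory rather than on a live stream, and the sizes are given in bytes (16 kHz, 16-bit mono is 32000 bytes per second, so a 55-second clip is 1760000 bytes).

```go
package main

import "fmt"

// overlappingClips cuts audio into clips of clipBytes each, where every clip
// starts overlapBytes before the end of the previous one. The last clip may
// be shorter. It returns nil when the sizes make no sense.
func overlappingClips(audio []byte, clipBytes, overlapBytes int) [][]byte {
	if clipBytes <= 0 || overlapBytes < 0 || overlapBytes >= clipBytes {
		return nil
	}
	step := clipBytes - overlapBytes // how far each new clip advances

	var clips [][]byte
	for start := 0; start < len(audio); start += step {
		end := start + clipBytes
		if end > len(audio) {
			end = len(audio)
		}
		clips = append(clips, audio[start:end])
		if end == len(audio) {
			break // the whole input is covered
		}
	}
	return clips
}

func main() {
	const bytesPerSecond = 16000 * 2 // 16 kHz, 16-bit, mono

	audio := make([]byte, 120*bytesPerSecond) // two minutes of silence
	clips := overlappingClips(audio, 55*bytesPerSecond, 5*bytesPerSecond)
	for i, c := range clips {
		fmt.Printf("clip %d: %d seconds\n", i, len(c)/bytesPerSecond)
	}
}
```

Keep both sizes even so that a 16-bit sample is never split across clips. The overlapping region gets transcribed twice, so the duplicated words still have to be detected and removed when the results are joined.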

Source: https://habr.com/ru/post/1693270/

