Workflow: audio to caption, caption text editing, text to speech to generate a new audio track


I have to produce video clips for IT training, starting from video recordings whose spoken text I want to edit and then dub with a synthetic voice.

The workflow should then be:

  1. extract caption files from the video,
  2. edit the text in the caption file (e.g. to correct inaccurate words),
  3. apply a text to speech function to the caption file (to have a homogeneous, standard voice on all clips),
  4. generate a new audio track,
  5. apply simple video editing functions (cut and paste clips, leveraging the standardized voice) to produce the final clip.

Steps 1, 2 and 5 are available in Stream. How could I implement and integrate the remaining steps 3 and 4 to have an integrated video content management solution for training?

1 Reply
Microsoft acquired Clipchamp, a company that builds a web-based video editor. It has a cool feature that generates text to speech audio tracks.

https://clipchamp.com/en/features/ai-voice-over-generator/

Clipchamp isn't part of M365 enterprise yet, but we are working on rebuilding it into M365. For the time being, you could use their current product. I think text to speech is a paid feature.
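
If you would rather script steps 3 and 4 yourself, here is a rough sketch of the idea, not a finished tool. It assumes the edited captions are in WebVTT format and that you supply your own text to speech function. The `synthesize` callable below is hypothetical: it stands in for whatever engine you choose and is assumed to return 16-bit mono PCM bytes at the given sample rate. The helper names (`parse_vtt`, `build_track`, `write_wav`) are also made up for illustration.

```python
import re
import wave
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float
    text: str


_TIME = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def _seconds(stamp: str) -> float:
    """Convert a WebVTT timestamp (hh:mm:ss.ttt or mm:ss.ttt) to seconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    h, mins, secs, ms = m.groups()
    return int(h or 0) * 3600 + int(mins) * 60 + int(secs) + int(ms) / 1000


def parse_vtt(text: str) -> List[Cue]:
    """Extract cues from WebVTT text, skipping header and NOTE blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if idx is None:
            continue  # no timing line: not a cue
        start, rest = lines[idx].split("-->", 1)
        end = rest.split()[0]  # ignore cue settings after the end time
        body = " ".join(ln.strip() for ln in lines[idx + 1:] if ln.strip())
        if body:
            cues.append(Cue(_seconds(start), _seconds(end), body))
    return cues


def build_track(cues: List[Cue], synthesize: Callable[[str], bytes],
                rate: int = 16000) -> bytes:
    """Place each synthesized clip at its cue start, padding with silence.

    `synthesize` is a hypothetical callable returning 16-bit mono PCM.
    If a clip runs past the next cue start, the next clip simply follows it.
    """
    track = bytearray()
    for cue in cues:
        offset = round(cue.start * rate) * 2  # 2 bytes per sample
        if len(track) < offset:
            track.extend(b"\x00" * (offset - len(track)))
        track.extend(synthesize(cue.text))
    return bytes(track)


def write_wav(path: str, pcm: bytes, rate: int = 16000) -> None:
    """Write 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(pcm)
```

You would call `parse_vtt` on the edited caption text, pass the cues to `build_track` together with your own speech function, and save the result with `write_wav`. The resulting WAV file can then be imported into the video editor as the new audio track for step 5. The sketch does not stretch or trim speech to fit the cue duration, so long sentences may drift later than the original timing.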