Workflow: audio to caption, caption text editing, text to speech to generate a new audio track

I need to produce video clips for IT training. I start from video recordings in which I want to correct some of the spoken words and then dub the result with a synthetic voice.

The workflow should then be:

  1. extract the caption files from the video,
  2. edit the text in the caption file (e.g. to correct inaccurate words),
  3. apply a text-to-speech function to the caption file (to get a homogeneous, standard voice across all clips),
  4. generate a new audio track,
  5. apply simple video editing functions (cut and paste clips, leveraging the standardized voice) to produce the final clip.

Steps 1, 2 and 5 are available in Stream. How could I implement and integrate the remaining steps 3 and 4 to get an integrated video content management solution for training?

1 Reply
Microsoft acquired Clipchamp, a company that builds a web-based video editor. It has a cool feature that generates text-to-speech audio tracks.

Clipchamp isn't part of M365 enterprise yet, but we are working on rebuilding it into M365. For the time being you could use the current product. I think text to speech is a paid feature.
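
If you would rather script steps 3 and 4 yourself, here is a rough Python sketch of how they could fit together. It is not a Stream or Clipchamp feature. It assumes the edited captions are available as a WebVTT (.vtt) file and that you have some text-to-speech engine of your own choosing. The `synthesize` callable is a hypothetical stand-in for that engine (it is assumed to return 16-bit mono PCM bytes), and `parse_vtt`, `build_track` and `write_wav` are names made up for this example.

```python
import re
import wave
from dataclasses import dataclass

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.500".
# The hours field is optional; cue settings after the end time are ignored.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    text: str     # caption text, joined onto one line


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(content):
    """Return the cues of a WebVTT document, skipping the header and NOTE blocks."""
    cues = []
    # Blocks in a WebVTT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                g = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def build_track(cues, synthesize, sample_rate=16000):
    """Step 3 and 4: synthesize each cue and place it at the cue's start time.

    `synthesize` is a hypothetical callable (text -> 16-bit mono PCM bytes at
    `sample_rate`). Gaps between cues are filled with silence. If a spoken
    clip runs past the next cue's start, the next clip simply follows it.
    """
    track = bytearray()
    for cue in cues:
        offset = int(cue.start * sample_rate) * 2  # 2 bytes per 16-bit sample
        if len(track) < offset:
            track.extend(b"\x00" * (offset - len(track)))
        track.extend(synthesize(cue.text))
    return bytes(track)


def write_wav(path, pcm, sample_rate=16000):
    """Write 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(pcm)
```

You would call it roughly as `write_wav("dub.wav", build_track(parse_vtt(vtt_text), my_tts))`, where `my_tts` wraps whichever speech engine you pick, and then import the resulting WAV file as the audio track in your video editor. Keeping the cue start times means the new voice stays roughly aligned with the picture, though a synthetic voice will rarely match the original speaker's pace exactly.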