Turn text into subs/video/audio in Edge


In the era of information explosion, expression in text is being shelved! People tend to watch videos instead of reading books. Why not turn articles and classic books into videos for our children to watch, with the help of Edge? It would also help columnists influence public opinion and lobby the government for their policies by attracting a larger audience to "watch" their articles! Sometimes people are lazy and just want to watch a long article from a website, and sometimes the article is written in a totally unfamiliar foreign language. Turning text into video (subs, really) automatically will help in these situations! And Edge is the best tool for this job! Why?

  1. Edge can use the world-class Azure TTS;
  2. Edge has one of the world's best automatic translators;
  3. Edge is a browser, which helps when we want to turn an article into a video after we publish it;
  4. Edge can offer the formatting and font options of MS Office;
  5. Exported videos will benefit MS video and the Bing search engine by generating more traffic income to rival YouTube and Google.


How to realize the text-to-video function in Edge? It takes 5 steps:

STEP ONE: Text editor in Edge

STEP TWO: Import translation layer and phonetic layer

STEP THREE: Edit each sound track

STEP FOUR: Set the format of each sub track's text box and the background color

STEP FIVE: Export into the following 4 formats to any link


STEP ONE: Text editor in Edge:

Turn a webpage into plain text, as the TTS Immersive Reader does, and make it possible to edit the content of the text.

Once you finish editing, the system will split the article into lines at its punctuation! For example:


This has been a devastating year.

More than 1.6 million people have died in the COVID-19 pandemic,

with more than 75 million cases and tens of trillions of dollars in economic damages.

Millions of people are out of work and struggling to pay their bills,

and more than a billion children are missing out on crucial time in school.

In the U.S.,

this year also saw the horrifying killings of George Floyd and Breonna Taylor,

ruinous wildfires,

and a presidential election unlike any other in modern times.

In this process an article turns into a series of sublines! There might be some rules in it, like how many characters a line of sub may have. 13 or so? In Chinese, for example, no punctuation is allowed in a sub. Some sublines are related in context because they are actually one sentence cut into different sublines by ","; others are not, because they do not belong to the same sentence.
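The splitting described above could be sketched roughly like this. It is only a minimal sketch under my own assumptions: the names `Subline`, `split_into_sublines` and `sentence_id` are made up for illustration, Western punctuation only splits when a space follows it (so "1.6" and "U.S.," stay in one piece), and the character limit per line is not handled.

```python
import re
from dataclasses import dataclass

SENTENCE_END = ".!?。！？"   # marks that close a whole sentence
CLAUSE_END = ",;:，；：、"    # marks that only close a clause

# Western marks split only when whitespace follows them; CJK marks split
# immediately because no space follows them in running text.
BOUNDARY = re.compile(r"(?<=[.!?,;:])\s+|(?<=[。！？，；：、])\s*")


@dataclass
class Subline:
    text: str         # the text shown as one line of sub
    sentence_id: int  # sublines with the same id were cut from one sentence


def split_into_sublines(article: str, strip_punctuation: bool = False) -> list[Subline]:
    """Cut an article into sublines and remember which sentence each came from."""
    sublines = []
    sentence_id = 0
    for piece in BOUNDARY.split(article):
        piece = piece.strip()
        if not piece:
            continue
        ends_sentence = piece[-1] in SENTENCE_END
        # Optionally drop trailing punctuation, as Chinese subs usually do.
        text = piece.rstrip(SENTENCE_END + CLAUSE_END) if strip_punctuation else piece
        if text:
            sublines.append(Subline(text, sentence_id))
        if ends_sentence:
            sentence_id += 1
    return sublines
```

Two sublines that share a `sentence_id` are the ones with "context relations"; a translation layer could use that to translate the whole sentence together instead of clause by clause.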


STEP TWO: Import translation layer and phonetic layer

The translation layer might include:

①The translation of the complete sentence, or maybe of the clause (each subline), being read now;

②The translation of the word or phrase being read now.

There will be a pre-step of dividing the text into words for writing systems without spaces between words, like Chinese and Japanese.

The phonetic layer is the phonemic transcription of the subline being read now. Some languages have a standard phonemic transcription, for instance: Kenyon and Knott for American English; Pinyin for Mandarin Chinese; and Rōmaji for Japanese.
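To make the word-dividing pre-step and the phonetic layer a bit more concrete, here is a toy sketch. It uses forward maximum matching against a tiny vocabulary, which is just one simple technique I picked for illustration; a real product would presumably rely on a proper segmenter and a full pronunciation lexicon. The function names and the three-entry `TOY_PINYIN` table are my own assumptions.

```python
def segment_words(text: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Divide unspaced text into words by forward maximum matching."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon for illustration only; unknown words are left untranscribed.
TOY_PINYIN = {"我们": "wǒmen", "去": "qù", "公园": "gōngyuán"}


def phonetic_layer(words: list[str], lexicon: dict[str, str]) -> str:
    """Build the phonetic line shown under the subline being read."""
    return " ".join(lexicon.get(word, word) for word in words)


words = segment_words("我们去公园", set(TOY_PINYIN))
print(words)
print(phonetic_layer(words, TOY_PINYIN))
```

The same word list could feed the second translation layer, since each divided word is a unit that can be looked up while it is being read.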


STEP THREE: Edit each sound track:

1.Divide your sublines among different characters. In some scripts for dramas and interviews, the name of each actor or speaker appears in front of a natural paragraph. The text-to-video function will figure out a way to collect all the text that appears after each person's name at the beginning of a paragraph and assign it to a specific character.

2.Set up each character's sound track with the following:

①Choosing a specific Azure TTS reader (voice);

②Setting the tone of speech: sad, excited, or happy;

③Setting the speed of speech (0.2~5 times, in steps of 0.1 times);

④Setting the pitch of the speech without changing the speed;

⑤Setting the volume of speech.

Sometimes the sound tracks are not in the language of the original text. They apply to the translation layer! I call them dubbing tracks. A dubbing track could also have different layers.

3.Set up the non-speech sound tracks, like background music/video tracks, with the following:

Volume; speed; pitch.
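The character and sound track setup above might look something like the sketch below. Everything here is an assumption for illustration: the "NAME: text" paragraph format, the `VoiceTrack` fields, the "Narrator" fallback and the pitch unit are all made up, and nothing in it calls the real Azure TTS service.

```python
import re
from dataclasses import dataclass

# Assumed script format: a speaker name, a colon, then the spoken paragraph.
SPEAKER_LINE = re.compile(r"^([^:：]{1,30})[:：]\s*(.+)$")


@dataclass
class VoiceTrack:
    voice: str = "default"   # placeholder for a chosen TTS reader
    tone: str = "neutral"    # e.g. "sad", "excited", "happy"
    speed: float = 1.0       # 0.2 to 5 times, in steps of 0.1
    pitch: float = 0.0       # hypothetical unit: semitones of shift
    volume: float = 1.0

    def __post_init__(self) -> None:
        # Clamp the speed to the allowed range and snap it to 0.1 steps.
        self.speed = round(min(5.0, max(0.2, self.speed)), 1)


def group_by_speaker(script: str, default_speaker: str = "Narrator") -> dict[str, list[str]]:
    """Collect the paragraphs that follow each speaker's name."""
    groups: dict[str, list[str]] = {}
    for paragraph in script.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        match = SPEAKER_LINE.match(paragraph)
        if match:
            speaker, speech = match.group(1).strip(), match.group(2)
        else:
            speaker, speech = default_speaker, paragraph
        groups.setdefault(speaker, []).append(speech)
    return groups


script = "HAMLET: To be, or not to be.\nOPHELIA: Good my lord.\nHAMLET: I humbly thank you."
tracks = {name: VoiceTrack() for name in group_by_speaker(script)}
tracks["HAMLET"].tone = "sad"
```

A plain paragraph that happens to contain a colon would be mistaken for a speaker label here, so the real function would need a smarter rule or let the user confirm the character list.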


STEP FOUR: Set the format of each sub track's text box and the background color

How many sub tracks might a video have? At least 7!

1.Subline being read now;

2.Translation of the subline being read now;

3.Translation of the word or phrase being read now;

4.Phonetic transcription of the subline being read now;

5.Sublines that have been read;

6.Sublines that are going to be read;

7.Title and subtitle.

Different modes employ different layers. The modes could be customized by users in advance so that they can be applied immediately when converting text to video. I made some models (taking an English speaker and a Chinese speaker as examples; they could also be adapted to any other two languages):

For an English speaker:

1.General mode (reading an English article)


2.General mode when reading an article in a foreign language (dubbed in English; taking Chinese as an example)



3.Language learning mode (dubbed in the target language; taking Chinese as an example)



For a Chinese speaker:

1.General mode (reading an article in Chinese)


2.General mode when reading an article in a foreign language (dubbed in Chinese; taking English as an example)


3.Language learning mode (dubbed in the target language; taking English as an example)


4.Children mode (with phonetic assistance)



5.Chinese classic mode (Reading Chinese classics)



This vertical mode could be adopted in the Edge mobile app, like TikTok.

Chinese, Japanese and Korean scripts, in their traditional vertical layout, are written from top to bottom, then right to left;

Mongolian script is written from top to bottom, then left to right;

Uyghur script is written from right to left, then top to bottom.

This should also be taken into consideration in text to subs.
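The seven sub tracks, the modes and the writing directions above could be held in a small configuration like the sketch below. All names are hypothetical, and the single preset in `PRESETS` is only a guess at what a language learning mode might show, not the exact layout of my models.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The seven sub tracks listed in this step."""
    CURRENT = "subline being read now"
    TRANSLATION = "translation of the current subline"
    WORD_TRANSLATION = "translation of the current word or phrase"
    PHONETIC = "phonetic transcription"
    PAST = "sublines already read"
    UPCOMING = "sublines still to be read"
    TITLE = "title and subtitle"


@dataclass(frozen=True)
class WritingDirection:
    characters: str  # direction in which characters advance within a line
    lines: str       # direction in which successive lines advance


DIRECTIONS = {
    "cjk-vertical": WritingDirection("top-to-bottom", "right-to-left"),
    "mongolian": WritingDirection("top-to-bottom", "left-to-right"),
    "uyghur": WritingDirection("right-to-left", "top-to-bottom"),
}

# Illustrative preset only; users would define and save their own modes.
PRESETS = {
    "language learning": frozenset(
        {Layer.CURRENT, Layer.TRANSLATION, Layer.WORD_TRANSLATION, Layer.PHONETIC}
    ),
}


def visible_layers(mode: str) -> list[Layer]:
    """Return the layers a mode shows, in the fixed order of the Layer enum."""
    selected = PRESETS[mode]
    return [layer for layer in Layer if layer in selected]


print([layer.name for layer in visible_layers("language learning")])
print(DIRECTIONS["mongolian"])
```

Keeping the direction separate from the layer list means the same mode could be reused for a horizontal article and for a vertical Chinese classic.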


Each sub track is actually a text box. So which formats of the 7 sub tracks above could users customize:










4.Position of each sub track

5.Animations of each sub track:


6.Sound effects of each sub track:


And the background should be set as either a pure color, a picture, or a video.


STEP FIVE: Export into the following 4 formats to any link

The text will finally be exported into 4 different formats:

1.Fixed CC sub in the picture, with the background fill color and sound tracks:

First, it is a CC text sub with text and a timeline that can be edited after being published;

Second, in addition to the text and timeline, it contains all the formats above, which a normal CC sub doesn't have;

Third, there is more than one sub track, and each sub track has its own position and writing direction;

Finally, it's a sub fixed to the picture, which means the sub always follows the size and position of the video. And the sub will remain fixed no matter whether the mouse hovers over, clicks or drags any of the sub tracks, which could make a dictionary extension accessible when we click the sub.

2.Full screen CC sub with sound tracks:

The full screen CC sub has the first three features above, but it can be dragged around and closed.

It could also be used in the background when we work in another tab or when the screen is locked.


3.Video, for those who would like to export into a video;


4.Audio, for those who don't like video and would rather export into audio.
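The "text and timeline" part of the CC sub could be written out in an existing subtitle format. Below is a minimal sketch that uses SubRip (SRT) as an example, which is my own choice and not something Edge is known to do. The timing is a crude estimate from the character count (`seconds_per_char` is an invented parameter); in a real implementation the timestamps would presumably come from the TTS engine itself. SRT also cannot carry the extra formats and multiple sub tracks, so those would need a richer container.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_cues(sublines: list[str], seconds_per_char: float = 0.06,
               min_duration: float = 1.0) -> list[Cue]:
    """Lay sublines back to back, estimating each duration from its length."""
    cues = []
    clock = 0.0
    for text in sublines:
        duration = max(min_duration, len(text) * seconds_per_char)
        cues.append(Cue(clock, clock + duration, text))
        clock += duration
    return cues


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{index}\n{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n{cue.text}"
        for index, cue in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"


print(to_srt(build_cues(["In the U.S.,", "ruinous wildfires,"])))
```

Because the cues stay plain text with a timeline, they can still be edited after publishing, which is the first feature listed for the fixed CC sub.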

