Forum Discussion
Editing transcripts: Removing extra lines of data from export
dhthompson, I found a workaround:
1. Download the transcript from Stream.
2. Select all, then copy and paste into Excel.
3. Do a find and replace on "NOTE*" and replace with nothing (blank). Then do the same for "*-*". That should get rid of everything but the text.
4. To get rid of the blank rows, press Ctrl+G to open the "Go To" popup, click "Special", and select "Blanks".
5. In the Home menu of Excel, go to the "Cells" section, click the "Delete" drop-down, and select "Delete Sheet Rows".
Then I copied the text to Word and read through it. Still not great, but a lot better than with all the data between the transcript text. Hope that helps.
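For anyone who would rather script the Excel find-and-replace workaround, here is a rough Python sketch of the same idea. It only approximates Excel's wildcard matching: `drop_matching` is a made-up helper name, and it treats each transcript line like one spreadsheet cell, dropping the cell if it is blank or matches one of the patterns.

```python
from fnmatch import fnmatchcase

def drop_matching(lines, patterns=("NOTE*", "*-*")):
    """Keep non-blank lines that match none of the wildcard patterns.

    Assumption: one transcript line per Excel cell, and the wildcard
    patterns from the workaround are applied to the whole line.
    """
    return [
        ln for ln in lines
        if ln.strip() and not any(fnmatchcase(ln.strip(), p) for p in patterns)
    ]

lines = [
    "NOTE Confidence: 0.91",
    "",
    "00:00:01.000 --> 00:00:04.000",
    "Hello!",
    "a follow-up question",
]
print(drop_matching(lines))
print(drop_matching(lines, patterns=("NOTE*", "*-->*")))
```

One thing the sketch makes visible: "*-*" matches any line containing a hyphen, so spoken text such as "a follow-up question" is dropped along with the time codes. A narrower pattern like "*-->*" keeps that text, although it may leave other hyphenated metadata behind.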
- Janerama - Jun 13, 2022 - Copper Contributor
Agentjh This is amazing! For anyone else who pasted the text from Excel into Word and got lots of pages where each line is only a handful of words long: I found one more step was necessary. I ended up with 55 pages because of a zillion hard returns, so I needed to delete the hard returns in Word.
1. Navigate to Find and Replace
2. Click on the gear icon to go to advanced Find and Replace
3. In the Replace section click on Special, and select Paragraph Mark
4. Enter the paragraph mark symbol "^p" in the Find What field.
5. Replace with: enter a blank (hit the space bar).
This reduced the text to 14 pages, and now some fun editing ensues.
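If the text is already in a plain-text file, the same "replace every hard return with a space" step can be sketched in Python. This is only an illustration of the Word steps above; `join_lines` is a made-up name, and it also collapses runs of spaces, which Word's replace does not do.

```python
def join_lines(text: str) -> str:
    """Replace every line break with a single space.

    str.split() with no arguments splits on any whitespace, including
    line breaks, so rejoining with " " yields one continuous paragraph.
    """
    return " ".join(text.split())

short_lines = "So, we will here be looking after\nhow we need to log in\n\ninto the supplier account."
print(join_lines(short_lines))
```

Like the Word approach, this flattens everything into one paragraph, so paragraph breaks still have to be added back by hand while editing.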
- Marc Mroz - Jan 03, 2020 - Microsoft
Agentjh dhthompson mdlau - I just created a short web utility to clean up the Stream transcript VTT files for when you just want to get the text from the file without the metadata, time codes, and blank lines.
I linked the utility from the bottom of this help doc page: https://aka.ms/StreamVTTCleaner
Give it a try and see if this is useful for you.
The web utility I created is just a quick workaround; ideally this would be built into Stream itself. Please add your comments and votes to this idea in our ideas forum: https://techcommunity.microsoft.com/t5/microsoft-stream-ideas/allow-export-of-transcript/idi-p/205468
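For readers who want to do the same cleanup offline, here is a minimal Python sketch of the general idea: keep only the caption text from a VTT file and drop the header, NOTE metadata, time codes, and blank lines. This is not the code behind the web utility. The function name `vtt_to_text` and the sample transcript are made up for illustration, and the sketch assumes blocks are separated by blank lines as in standard WebVTT.

```python
import re

# A cue timing line looks like "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"\d{2}:\d{2}\.\d{3}\s+-->\s+")

def vtt_to_text(vtt: str) -> list[str]:
    """Return only the caption text lines of a WebVTT transcript."""
    kept = []
    # Blocks in a WebVTT file are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if not lines or lines[0].startswith(("WEBVTT", "NOTE")):
            continue  # header or metadata block
        for i, ln in enumerate(lines):
            if TIMING.search(ln):
                # Everything after the timing line is caption text;
                # anything before it (a cue identifier) is dropped.
                kept.extend(lines[i + 1:])
                break
    return kept

sample = """WEBVTT

NOTE Confidence: 0.91

cue-1
00:00:01.000 --> 00:00:04.000
Hello!

00:00:04.000 --> 00:00:08.000
So, first we need
to open a browser.
"""
print("\n".join(vtt_to_text(sample)))
```

Working block by block rather than line by line means a cue identifier above a timing line is dropped without having to guess what identifiers look like.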
- MightyMedha - Dec 08, 2023 - Copper Contributor
Marc Mroz Thanks!! It's a wonderful tool and works nicely, but there is one problem: my transcript also has line numbers along with the timeline. How can I remove those numbers while keeping the line breaks?
Example:
1Hello! 2So, we will here be looking after 3how we need to log in into the supplier account. 4That is, how as a supplier we can log in into our account. 5So, first we need to open a browser. 6Then, after opening the browser,
Desired output:
Hello!
So, we will here be looking after
how we need to log in into the supplier account.
- Clare888 - Dec 08, 2023 - Copper Contributor
MS Word has an option to find and replace any digit: go to Edit > Find > Replace, open the pull-down menu next to the Find input box, and choose "Any Digit". You can then replace the digits with nothing.
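Replacing every digit would also remove numbers that are part of the spoken text, and it does not bring back the line breaks. As an alternative, here is a small Python sketch that treats a number as a caption number only when it sits at the start of the text or after a space and is glued directly to the next word, as in the example above. That rule is an assumption based on that one example (it would misfire on spoken text like "2nd"), and `split_numbered_cues` is a made-up name.

```python
import re

# A caption number: at the very start or after whitespace, and
# immediately followed by a letter (e.g. "1Hello!" or " 3how").
CUE_NUMBER = re.compile(r"(?:^|\s+)\d+(?=[A-Za-z])")

def split_numbered_cues(text: str) -> str:
    """Drop caption numbers and put each caption on its own line."""
    parts = CUE_NUMBER.split(text)
    return "\n".join(p.strip() for p in parts if p.strip())

example = ("1Hello! 2So, we will here be looking after "
           "3how we need to log in into the supplier account.")
print(split_numbered_cues(example))
```

A standalone number followed by a space, such as "step 4 of the process", is left alone because it is not directly followed by a letter.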
- PauletteP - May 12, 2023 - Copper Contributor
This worked perfectly! Thank you for creating and sharing!