Forum Discussion
Editing transcripts: Removing extra lines of data from export
Jan 03, 2020
Agentjh, dhthompson, mdlau: I just created a short web utility that cleans up Stream transcript VTT files, for when you just want the text from the file without the metadata, time codes, and blank lines.
I linked the utility from the bottom of this help doc page: https://aka.ms/StreamVTTCleaner
Give it a try and see if this is useful for you.
The web utility I created is just a quick workaround; ideally, this would be built directly into Stream itself. You should add your comments and votes to this idea in our ideas forum: https://techcommunity.microsoft.com/t5/microsoft-stream-ideas/allow-export-of-transcript/idi-p/205468
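For anyone curious what a cleaner like this has to do, here is a minimal Python sketch. It is not the code behind the utility linked above. It assumes a standard WebVTT layout: a `WEBVTT` header, optional `NOTE` blocks, and cues made of an optional identifier line, a timing line containing `-->`, and one or more text lines, with blocks separated by blank lines. The `vtt_to_text` name and the sample file are hypothetical.

```python
import re


def vtt_to_text(vtt: str) -> str:
    """Return only the spoken text of a WebVTT transcript, one cue per line."""
    # Drop a leading byte-order mark and normalise Windows line endings.
    vtt = vtt.lstrip("\ufeff").replace("\r\n", "\n")
    # WebVTT separates the header, NOTE blocks and cues with blank lines.
    blocks = re.split(r"\n\s*\n", vtt.strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # Skip the file header and NOTE/STYLE/REGION blocks: metadata, not speech.
        if lines[0].strip().startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue
        # Everything after the timing line ("00:00:01.000 --> 00:00:04.000") is
        # cue text; an optional cue identifier may sit above it and is ignored.
        for i, line in enumerate(lines):
            if "-->" in line:
                text = " ".join(t.strip() for t in lines[i + 1:] if t.strip())
                if text:
                    cues.append(text)
                break
    return "\n".join(cues)


# Hypothetical sample file; real Stream exports may use different cue identifiers.
sample = """WEBVTT

NOTE duration:"00:00:08"

cue-0
00:00:00.000 --> 00:00:04.000
Welcome to the meeting.

cue-1
00:00:04.000 --> 00:00:08.000
Let's get
started.
"""

print(vtt_to_text(sample))
```

Because it keys on the `-->` timing line rather than on hyphens, transcript lines that happen to contain a hyphen are kept.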
dhthompson, I found a workaround:
1. Download the transcript from Stream.
2. Select all, then copy and paste it into Excel.
3. Do a Find and Replace on "NOTE*" and replace it with nothing (blank). Then do the same for "*-*". That should get rid of everything but the text.
4. To get rid of the blank rows, press Ctrl+G to open the "Go To" dialog. Click "Special", select "Blanks", and click OK.
5. On the Home tab of Excel, go to the "Cells" group, click the "Delete" drop-down, and select "Delete Sheet Rows".
Then I copied the text to Word and read through it. It's still not great, but it's a lot better than having all that data between the lines of transcript text. Hope that helps.
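The Excel steps above boil down to a few line-by-line rules. The Python sketch below approximates them, assuming Excel's default Find and Replace settings (Match case and Match entire cell contents both unchecked) and one pasted line per row; the `excel_style_cleanup` name and the sample file are hypothetical. It makes two side effects of the recipe visible: the `WEBVTT` header line is not removed, and any transcript line containing a hyphen is emptied along with the time codes.

```python
import re


def excel_style_cleanup(vtt: str) -> list[str]:
    """Approximate the Excel recipe: replace "NOTE*" and "*-*" with nothing, then delete blank rows."""
    kept = []
    for row in vtt.splitlines():  # assumption: one pasted line lands in one Excel row
        # "NOTE*" -> nothing: Excel matches case-insensitively by default,
        # and the * wildcard runs to the end of the cell.
        row = re.sub(r"note.*", "", row, flags=re.IGNORECASE)
        # "*-*" -> nothing: the leading and trailing * cover the whole cell, so any
        # cell containing a hyphen is emptied (timing lines contain "-->").
        if "-" in row:
            row = ""
        # Go To > Special > Blanks > Delete Sheet Rows: drop the empty rows.
        if row:
            kept.append(row)
    return kept


# Hypothetical sample file; real Stream exports may use different cue identifiers.
sample = """WEBVTT

NOTE duration:"00:00:08"

cue-0
00:00:00.000 --> 00:00:04.000
Welcome to the meeting.

cue-1
00:00:04.000 --> 00:00:08.000
Let's schedule a follow-up.
"""

print(excel_style_cleanup(sample))
```

On this sample the result keeps `WEBVTT` and "Welcome to the meeting." but loses "Let's schedule a follow-up." because of its hyphen, so the cleaned text is worth a quick read against the original.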
Agentjh This is amazing! For anyone else who found that pasting from Excel into Word produced lots of pages where each line was only a handful of words long: one more step was necessary. I ended up with 55 pages because of a zillion hard returns, so I needed to delete the hard returns in Word:
1. Navigate to Find and Replace
2. Click on the gear icon to go to advanced Find and Replace
3. In the Replace section click on Special, and select Paragraph Mark
4. Enter the paragraph mark symbol "^p" in the Find What field.
5. In Replace With, enter a blank (press the space bar), then click Replace All.
This reduced the text to 14 pages, and now some fun editing ensues.
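The Word step (Find "^p", Replace with a space) simply joins every line into one run of text. A minimal Python equivalent is below; the `join_lines` name is hypothetical. Unlike a plain Word replace, it skips empty lines, so leftover blank rows do not turn into double spaces.

```python
def join_lines(text: str) -> str:
    """Replace line breaks with single spaces, like Find "^p" / Replace with " " in Word."""
    # splitlines() handles \n, \r\n and \r; empty lines are skipped so they
    # do not become double spaces the way repeated "^p" marks would in Word.
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


print(join_lines("Welcome to the\nmeeting.\n\nLet's get started.\n"))
```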