Forum Discussion
Editing transcripts: Removing extra lines of data from export
- Jan 03, 2020
Agentjh dhthompson mdlau - I just created a short web utility to clean up the Stream transcript VTT files for when you just want to get the text from the file without the metadata, time codes, and blank lines.
I linked the utility from the bottom of this help doc page: https://aka.ms/StreamVTTCleaner
Give it a try and see if this is useful for you.
The web utility I created is just a quick workaround. Ideally this would be built directly into Stream itself. Please add your comments and votes to this idea in our ideas forum: https://techcommunity.microsoft.com/t5/microsoft-stream-ideas/allow-export-of-transcript/idi-p/205468
dhthompson, I found a workaround. Download the transcript from Stream. Select all, then copy and paste it into Excel. Do a find and replace on "NOTE*" and replace with nothing (blank). Then do the same for "*-*". That should get rid of everything but the text. Then, to get rid of the blank rows, press Ctrl+G to open the "Go To" popup. Click "Special" and select "Blanks". In the Home menu of Excel, go to the "Cells" section, click the "Delete" drop-down, and select "Delete Sheet Rows". Then I copied the text to Word and read through it. Still not great, but a lot better than having all the data between the lines of transcript text. Hope that helps.
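For anyone who would rather script the same cleanup than do it by hand in Excel, here is a minimal Python sketch. It assumes the export looks like the made-up `SAMPLE` below (a `WEBVTT` header, `NOTE` metadata lines, a GUID-style cue ID, then a time code line before each piece of text); the real Stream export may differ, so treat the patterns and the `clean_vtt` name as illustrative rather than as how any existing tool works.

```python
import re

# Assumption: time code lines look like "00:00:01.000 --> 00:00:04.000"
# (the hours part is optional in WebVTT).
TIMECODE = re.compile(r"(?:\d+:)?\d{2}:\d{2}\.\d{3,}\s+-->\s+")
# Assumption: cue IDs are plain GUIDs on their own line.
CUE_ID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def clean_vtt(vtt_text: str) -> str:
    """Return only the spoken text, one cue per line."""
    kept = []
    for raw in vtt_text.splitlines():
        line = raw.strip().lstrip("\ufeff")  # drop whitespace and a possible BOM
        if not line:
            continue  # blank line
        if line.startswith("WEBVTT") or line == "NOTE" or line.startswith("NOTE "):
            continue  # file header or metadata
        if TIMECODE.match(line) or CUE_ID.match(line):
            continue  # time code or cue ID
        kept.append(line)
    return "\n".join(kept)


# Hypothetical sample, written for this example only.
SAMPLE = """WEBVTT

NOTE duration:"00:00:08.0000000"

NOTE language:en-us

3f2b8c1e-7a4d-4e9b-9c1f-0a1b2c3d4e5f
00:00:01.000 --> 00:00:04.000
Hello, and welcome to the meeting.

8d7e6f5a-1b2c-4d3e-8f9a-0b1c2d3e4f5a
00:00:04.500 --> 00:00:08.000
Today we will review the well-known issues.
"""

if __name__ == "__main__":
    print(clean_vtt(SAMPLE))
```

Unlike the "*-*" find and replace, this keeps transcript lines that happen to contain a hyphen (such as "well-known" in the sample), because it only drops lines that match the time code or cue ID patterns. It also keeps one line per cue instead of merging everything into a single block.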
- MightyMedha (Copper Contributor), Dec 08, 2023
Marc Mroz Thanks!! It's a wonderful tool and works nicely, but there is one problem: my transcript has cue numbers as well as the time codes. How can I remove those numbers too, while keeping the line breaks?
example:
1Hello! 2So, we will here be looking after 3how we need to log in into the supplier account. 4That is, how as a supplier we can log in into our account. 5So, first we need to open a browser. 6Then, after opening the browser,
Desired output:
Hello!
So, we will here be looking after
how we need to log in into the supplier account.
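One possible way to get from the numbered text to the desired output is a small regular expression, sketched here in Python. It assumes every cue number sits at the start of the text or right after a space and is immediately followed by a letter, as in the example above; the `split_numbered_cues` name is made up for illustration.

```python
import re

# A cue number: digits at the very start or after whitespace,
# immediately followed by a letter (as in "2So, we will ...").
CUE_NUMBER = re.compile(r"(?:^|\s+)\d+(?=[A-Za-z])")


def split_numbered_cues(text: str) -> str:
    """Drop the cue numbers and put each cue on its own line."""
    pieces = CUE_NUMBER.split(text)
    return "\n".join(p.strip() for p in pieces if p.strip())


if __name__ == "__main__":
    numbered = (
        "1Hello! 2So, we will here be looking after "
        "3how we need to log in into the supplier account."
    )
    print(split_numbered_cues(numbered))
    # Hello!
    # So, we will here be looking after
    # how we need to log in into the supplier account.
```

A caveat on this sketch: spoken text such as "3D" or "2nd" also matches the pattern and would be split, so it is worth reading through the result afterwards. Numbers followed by a space or punctuation (for example "in 2023.") are left alone.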
- Clare888 (Copper Contributor), Dec 08, 2023
MS Word's find and replace can match any digit: go to Edit >> Find >> Replace, open the pull-down menu on the Find input box, and choose "Any Digit". You can then replace the digits with nothing.
- PauletteP (Copper Contributor), May 12, 2023
This worked perfectly! Thank you for creating and sharing!
- Linda_Wallers (Copper Contributor), Oct 20, 2022
Marc Mroz, I just tried to register on the page you suggested so that I could comment on your WEBVTT to text app. I was told I don't have the privileges. I am retired, so I don't have a work email anymore. I guess that invalidates me. Anyway, I have been removing extra spaces and time stamps manually, which is a slog. Your app is a wonderful addition to my tool kit. Thank you so much for your work! Now all I need is a transcript that includes the names of the speakers. But one problem at a time! Thanks again!
- christianharder (Copper Contributor), Apr 27, 2022
This is the best solution I've found to get a clean extract from a .vtt file. Not perfect, because it leaves you with a single massive block of text, but at least all the time stamps are gone.
- SGarriott (Copper Contributor), Jan 30, 2022
Just started using Stream for converting video meetings to transcripts. You certainly saved my bacon (and my time!). Thanks so much.
- KForster (Copper Contributor), Mar 18, 2021
Hi Marc Mroz
Thank you so much for creating the VTT cleaner web utility. I was just wondering whether it is GDPR compliant?
Many thanks
- Marc Mroz (Microsoft), Feb 21, 2022
KForster - Sorry for my super late reply. You can take a look at the JavaScript code on the page; everything is done locally in the browser in JavaScript. It has NO connection back to Microsoft or any server. Thus, none of your text from the VTT leaves the browser.
It just reads the VTT file locally, does find and replace on a few strings and then sticks the cleaned output back to the screen and to the clipboard.
So it should be safe for you to use because it doesn't save anything at all.
- DavidJacksonTKC (Copper Contributor), Nov 04, 2020
This is great (as are some of the other web based solutions that have since appeared) but when you are working in a secure environment it just isn't acceptable to send your data out to some unknown service somewhere. My clients are happy with me uploading their data to a trusted organisation like Microsoft, but not some random website somewhere that I've found through Google.
- hsmith7 (Copper Contributor), Jul 24, 2020
Marc Mroz Hi, this is a really fantastic tool you have developed, and I'm hoping to use it to clean up some transcripts I need to analyse on a project I am working on. For some reason though, the tool will only allow me to upload the transcript and won't produce any output - I wondered whether you might be able to help?