Jul 28 2019 01:03 AM - edited Jul 28 2019 01:23 AM
Been testing the transcription in Stream. For a multinational organization it is not really ready for prime time (too many accents for it to do well in English, and no direct support for languages other than Spanish). However, I do see potential in being able to export the auto-generated transcripts of talks by senior leadership for purely text-related usage. That said, even if one sets out to tidy up the existing transcript, there are many rows of extra data between each piece of text. Not just timecodes, but rows like this:
NOTE Confidence: 0.936690330505371
9eed9142-c299-42ed-96f1-fed2c6617e0c
00:00:21.476 --> 00:00:24.633
unchanged to make homes
cleaner and healthier.
NOTE Confidence: 0.909458994865417
e2af81c5-7559-4a57-8bf7-5f7b2c586c4e
00:00:27.370 --> 00:00:31.400
Delicate wool garments have
always been tricky to care for
The captions are not even on one editable row, and there are three lines (plus blank rows not shown here) to be removed between each piece of text. Over an hour-long meeting or town hall presentation, this is a LOT of editing.
Has anyone come across a way to export and automate removal of the extra material in order to create a clean text document - a pure transcript and not a caption file?
Oct 18 2019 03:16 PM
@dhthompson I've been searching for an easy way to do this as well!
Dec 11 2019 02:23 PM
@dhthompson, I found a workaround. Download the transcript from Stream. Select all, then copy and paste it into Excel. Do a find and replace on "NOTE*" and replace it with nothing (blank). Then do the same for "*-*". That should get rid of everything but the text. Then, to get rid of the blank rows, press Ctrl+G to open the "Go To" popup. Click "Special" and select "Blanks". In the Home menu of Excel, go to the "Cells" section, click the "Delete" drop-down, and select "Delete Sheet Rows". Then I copied the text to Word and read through it. Still not great, but a lot better than with all the data between the transcript text. Hope that helps.
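For anyone who would rather script these steps than click through Excel, here is a minimal Python sketch of the same idea. It is not part of the workaround itself: the function name `strip_by_wildcards` is made up for illustration, and it only approximates Excel's find and replace by dropping whole lines that match the two wildcards. The sample text is taken from the first post.

```python
from fnmatch import fnmatchcase


def strip_by_wildcards(vtt_text, patterns=("NOTE*", "*-*")):
    """Drop lines matching any wildcard pattern, then drop blank lines.

    Hypothetical helper: approximates the two Excel find-and-replace
    passes followed by deleting the blank rows.
    """
    kept = []
    for line in vtt_text.splitlines():
        line = line.strip()
        if not line:
            continue  # same effect as deleting the blank rows
        if any(fnmatchcase(line, p) for p in patterns):
            continue  # NOTE rows, cue IDs and timecodes all match here
        kept.append(line)
    return kept


sample = """NOTE Confidence: 0.936690330505371

9eed9142-c299-42ed-96f1-fed2c6617e0c
00:00:21.476 --> 00:00:24.633
unchanged to make homes
cleaner and healthier.
"""

print("\n".join(strip_by_wildcards(sample)))
```

Two caveats carry over from the Excel version: the "*-*" rule also throws away any caption line that happens to contain a hyphen, and neither pattern removes the WEBVTT header line.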
Jan 03 2020 01:04 PM
Solution: @Agentjh @dhthompson @mdlau - I just created a short web utility to clean up the Stream transcript VTT files for when you just want to get the text from the file without the metadata, time codes, and blank lines.
I linked the utility from the bottom of this help doc page: https://aka.ms/StreamVTTCleaner
Give it a try and see if this is useful for you.
The web utility I created is just a quick workaround; ideally this would be built into Stream itself. You should add your comments and votes to this idea in our ideas forum: https://techcommunity.microsoft.com/t5/microsoft-stream-ideas/allow-export-of-transcript/idi-p/20546...
Jan 31 2020 12:26 AM
Hi Marc.
I am new to Stream and I have a problem with the captions I have uploaded to my videos via a .vtt file.
When I change some of my captions because the sentences are divided wrongly, e.g. deleting a line I no longer need, I cannot delete its timeline entry, so it still shows. Please see the attached image; I hope it shows my problem.
Is there a way to delete these lines? When I try to click "Remove", nothing happens.
Thanks in advance :)
May 03 2020 08:05 AM
Thank you so much! You should make it more obvious on Google; this is a super relevant tool and so useful. Thanks! @Marc Mroz
May 04 2020 10:19 PM
Hi @MarcMroz - any tips on helping this to work? I get the option to browse for a file to upload....then nothing!
May 21 2020 03:25 PM
@Marc Mroz WOW! I just tried your Clean-Up utility and it's amazing! Thank you so much for creating it. I tested it with a meeting that was just over an hour and the transcript as downloaded from Stream was 130 pages. Your utility removed all the unnecessary metadata and blanks in a matter of about 2 seconds!
Thank you for sharing it!
Jun 06 2020 12:36 AM
@Marc Mroz Thanks. I hadn't been in here for ages, but our org is finally making the move to Stream and this has come back to front of mind. Will check it out.
Jun 06 2020 12:49 AM
@Marc Mroz getting a 404 on your link https://aka.ms/StreamVTTCleaner
Jun 08 2020 06:19 AM
Your utility for cleaning up transcripts appears to have been removed. It is giving a 404 - Page not found error. Can you please have someone fix this issue?
Thank you.
Jun 11 2020 01:55 AM
@Marc Mroz I used this excellent utility last week for some research work I am doing. It cleaned up my transcription files brilliantly. I've gone to use it today for my final transcription file and am getting a 404, like others have reported. I'd love to see this utility back as soon as possible.
Jun 17 2020 07:08 PM
Thank you so much! That utility really made my job easier today. @Marc Mroz
Jul 24 2020 02:59 AM
@Marc Mroz Hi, this is a really fantastic tool you have developed, and I'm hoping to use it to clean up some transcripts I need to analyse on a project I am working on. For some reason though, the tool will only allow me to upload the transcript and won't produce any output - I wondered whether you might be able to help?
Nov 04 2020 01:34 AM
I followed the instructions from @Agentjh and put together the following macro. All I do now is open the downloaded VTT file in Notepad, select all, paste into Excel, then run this macro. It works for me and I hope it helps others. This is the first macro I have ever created, so please be gentle if it's not particularly elegant!
Sub TranscriptCleaner()
'
' TranscriptCleaner Macro
' Cleans up a VTT transcript from Microsoft Stream that has been pasted
' into the active sheet, one line per cell.
'
    Dim pattern As Variant
    With ActiveSheet.UsedRange
        ' Blank every cell that is entirely metadata: the WEBVTT header,
        ' NOTE rows, timecode rows, and GUID cue IDs (as in the sample at
        ' the top of this thread). LookAt:=xlWhole and MatchCase:=True keep
        ' caption text that merely contains a hyphen, "00", or the word
        ' "note" from being wiped or truncated.
        For Each pattern In Array("WEBVTT", "NOTE*", "*-->*", _
                                  "????????-????-????-????-????????????")
            .Replace What:=pattern, Replacement:="", LookAt:=xlWhole, _
                SearchOrder:=xlByRows, MatchCase:=True
        Next pattern
        ' Delete the rows left empty. SpecialCells raises an error when
        ' there are no blank cells, so that case is skipped.
        On Error Resume Next
        .SpecialCells(xlCellTypeBlanks).EntireRow.Delete
        On Error GoTo 0
    End With
End Sub
Nov 04 2020 01:36 AM
This is great (as are some of the other web based solutions that have since appeared) but when you are working in a secure environment it just isn't acceptable to send your data out to some unknown service somewhere. My clients are happy with me uploading their data to a trusted organisation like Microsoft, but not some random website somewhere that I've found through Google.
Mar 18 2021 02:52 AM
Hi @Marc Mroz
Thank you so much for creating the VTT cleaner web utility. I was just wondering whether it is GDPR compliant?
Many thanks
Jan 30 2022 11:30 AM
Feb 21 2022 08:03 AM
@KForster - Sorry for my super late reply. You can take a look at the JavaScript code on the page; everything is done locally in the browser in JavaScript. It has NO connection back to Microsoft or any server. Thus, none of your text from the VTT leaves the browser.
It just reads the VTT file locally, does find and replace on a few strings and then sticks the cleaned output back to the screen and to the clipboard.
So it should be safe for you to use because it doesn't save anything at all.
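The utility's source is not reproduced in this thread, so the following is only a rough Python sketch of the kind of local clean-up described here, not Marc's code. The name `clean_vtt` and both regular expressions are assumptions based on the sample in the first post (GUID cue IDs, `hh:mm:ss.mmm --> hh:mm:ss.mmm` timecodes). Everything runs on the local machine, and the result is joined into one block of text rather than one caption fragment per line.

```python
import re

# Cue timing line, e.g. "00:00:21.476 --> 00:00:24.633" (hours optional)
TIMECODE = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)
# Cue identifier in the GUID form seen in the sample from the first post
CUE_ID = re.compile(r"^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
# NOTE comment rows such as "NOTE Confidence: 0.93"
NOTE = re.compile(r"^NOTE(?:\s|$)")


def clean_vtt(vtt_text):
    """Return only the caption text of a Stream-style VTT file as one string."""
    pieces = []
    for raw in vtt_text.lstrip("\ufeff").splitlines():
        line = raw.strip()
        if not line or line.startswith("WEBVTT") or NOTE.match(line):
            continue  # blank rows, file header, NOTE metadata
        if TIMECODE.match(line) or CUE_ID.match(line):
            continue  # timecodes and cue identifiers
        pieces.append(line)
    return " ".join(pieces)


sample = """WEBVTT

NOTE Confidence: 0.936690330505371

9eed9142-c299-42ed-96f1-fed2c6617e0c
00:00:21.476 --> 00:00:24.633
unchanged to make homes
cleaner and healthier.

NOTE Confidence: 0.909458994865417

e2af81c5-7559-4a57-8bf7-5f7b2c586c4e
00:00:27.370 --> 00:00:31.400
Delicate wool garments have
always been tricky to care for
"""

print(clean_vtt(sample))
```

To run it on a downloaded file, the text could be read with something like `Path("transcript.vtt").read_text(encoding="utf-8-sig")` from `pathlib`, where the file name is just a placeholder.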
Apr 27 2022 12:11 PM