Forum Discussion
Matt Cannady
May 03, 2018 | Copper Contributor
How to get custom vtt subtitle or caption file to show up in the transcript window and be searchable
Marc Mroz, we've been looking at using Stream for internal technical videos and were also disappointed with the auto-transcription. We now have a hand-crafted transcription file to use with our video...
Matt Cannady
May 04, 2018 | Copper Contributor
Marc Mroz - I downloaded an autocaption VTT for a different video, then re-uploaded that video with the autogenerated VTT as the English caption, and the transcription box is present immediately.
This may be due to the tagged text in the MS auto VTT? Each time-coded line of text in the autogenerated file is preceded by a line that appears to be an ID, possibly to track changes in your dictionary system? It may be something else though. Is this "pre-line" necessary in custom caption uploads for transcription and search?
MS autogenerated caption line:
ad5b06c8-5bed-4975-b4e9-9683811d8b22
00:00:19.650 --> 00:00:24.380
start another way what we're really talking about today is reconstruction about present-day
Converted smi to vtt:
00:06:49.730 --> 00:06:52.070
of view closed, anyway, I figured I'd just go through
As a side note, when I upload the same problem video with autogenerated transcription instead (the attempt I referenced earlier), I immediately get the option of opening the transcription box, along with a notice that the transcription is not ready yet, as it should.
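For anyone comparing their own files against the two cue shapes above, here is a minimal sketch that lists which cues carry an identifier line. In WebVTT the line above a cue's timing line is an optional cue identifier, so both shapes are valid. The function name `cue_identifiers` is made up for illustration, and the sketch assumes cues are separated by blank lines as in a well-formed file; it says nothing about what Stream itself requires.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:19.650 --> 00:00:24.380".
# The hours field is optional in WebVTT timestamps.
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)

def cue_identifiers(vtt_text):
    """Return one entry per cue: the identifier line above it, or None.

    Hypothetical helper. Assumes cues are separated by blank lines, so a
    non-blank line directly above a timing line is the cue identifier.
    """
    ids = []
    previous = ""
    for line in vtt_text.splitlines():
        if TIMING.match(line.strip()):
            ids.append(previous.strip() or None)
        previous = line
    return ids
```

Run on a file holding the two cues quoted above, this would return the ID string for the autogenerated cue and `None` for the converted one.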
Marc Mroz
Microsoft
May 04, 2018
I'm still trying to narrow down what it's doing through testing. Right now I'm thinking it has something to do with the contents of the file, encoding, or something like that which is causing it not to pick it up and index it for the transcription window/search.
The GUID you see in the autogenerated file seems to be needed, but I've successfully uploaded a VTT file without it, waited a bit, and then it worked. When I downloaded the file again, our system had added the GUIDs above each time code. So I think we are supposed to be parsing the uploaded VTT and adding the GUIDs ourselves. It seems there is something specific about some VTT files that our system doesn't like, and so it doesn't parse them and add those GUIDs.
I have a VTT file that it doesn't like. I'm trying to narrow down what specifically about the encoding or contents of the file our system doesn't like. If you want to send me your problematic VTT file I can include that and my sample when I go talk with a developer to see if they can narrow down where the problem is.
- Matt Cannady | May 04, 2018 | Copper Contributor
Marc Mroz - Sure. The video I'm testing with is a demo deposition file from the court reporter we hired to transcribe our presentations, to work around the fledgling auto-transcription's issues. The demo came with the SMI file, and I used an online conversion site to get our VTT. I also included the autogenerated file.
- Marc Mroz | May 04, 2018 | Microsoft
As a workaround for the time being, it seems if you let Stream generate the VTT file for you, download it, edit it by hand, and re-upload it, then it will work fine.
- Chris_Deignan59 | Sep 18, 2020 | Copper Contributor
Marc Mroz - I appreciate that, but I have found that the accuracy of the autogenerated file leaves a lot to be desired compared with autogenerated captions created in Descript, Otter.ai, or Transcriptive.
- Matt Cannady | May 04, 2018 | Copper Contributor
That's a lot of editing.
- Marc Mroz | May 04, 2018 | Microsoft
Ok... I finally figured out what was going on after lots of trial and error and a closer look at the problematic VTT files.
It turns out Stream will not parse the VTT file uploaded in the caption section and turn it into the transcript if there is a line in the file that has no elapsed duration/gap between the time codes. For example like this:
00:16:24.490 --> 00:16:24.490
keep falling apart.
If you change the time codes so there is a time gap between them, it works fine. Like this:
00:16:24.490 --> 00:16:24.491
keep falling apart.
So for the example VTT file you sent, there are actually 2 lines in the file where there is no gap in the time codes. I updated these 4 lines in your file to ensure there is a gap and that the subsequent line also starts at the new time. Once I did that, your custom VTT file works correctly and shows up as the transcript. (See the attached file below, which is the fixed VTT, to try it yourself.)
00:16:24.490 --> 00:16:24.491
keep falling apart.

00:16:24.491 --> 00:16:26.890
THE WITNESS: 417?

00:41:07.740 --> 00:41:07.741
read it.

00:41:07.741 --> 00:41:21.720
(Requested portion of record read.)
I'll file a bug on our side in Stream to fix this issue so that our parsing logic still works even if the VTT contains lines whose beginning and ending time codes are the same. I don't know how easy or hard this will be to fix on our side.
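Until that is fixed, the manual edit described above could be scripted. The following is a rough sketch, not anything from Stream: the function names and the 1 ms default gap are my own assumptions, chosen to mirror the hand edit (give the zero-duration cue a 1 ms duration, then start the following cue at the new end time if it would otherwise start earlier). Cues that already have a duration are left untouched.

```python
import re

# A cue timing line: "start --> end", optionally followed by cue settings or text.
TIMING = re.compile(
    r"^(?P<start>(?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+"
    r"(?P<end>(?:\d{2,}:)?\d{2}:\d{2}\.\d{3})(?P<rest>.*)$"
)

def to_ms(stamp):
    """Convert "hh:mm:ss.ttt" (hours optional) to milliseconds."""
    *head, last = stamp.split(":")
    seconds, millis = last.split(".")
    minutes = int(head[-1])
    hours = int(head[0]) if len(head) == 2 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis)

def to_stamp(ms):
    """Convert milliseconds back to "hh:mm:ss.ttt"."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"

def fix_zero_duration(vtt_text, gap_ms=1):
    """Hypothetical helper: give zero-duration cues a small gap.

    When a cue's end is moved, the next cue's start is raised to that new
    end if it would otherwise begin earlier, as in the hand-edited file.
    """
    out = []
    min_start = 0  # earliest start allowed for the next cue (0 = no limit)
    for line in vtt_text.splitlines():
        m = TIMING.match(line.strip())
        if not m:
            out.append(line)
            continue
        orig_start, orig_end = to_ms(m["start"]), to_ms(m["end"])
        start = max(orig_start, min_start)
        end = orig_end if orig_end > start else start + gap_ms
        # Only constrain the next cue when this cue's end was actually moved.
        min_start = end if end != orig_end else 0
        if (start, end) != (orig_start, orig_end):
            line = f"{to_stamp(start)} --> {to_stamp(end)}{m['rest']}"
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read the VTT file as text, pass it through `fix_zero_duration`, and write the result to a new file before uploading. It is worth diffing the output against the original first, since the sketch only understands plain cue timing lines.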