Word-by-word time stamps for transcriptions?

I'd like the automatic transcription/captioning feature to produce a VTT file that has a time stamp for each word, rather than time ranges that each contain multiple words. Does Microsoft Stream already have this functionality, or is it planned for a future release, similar to what Google's ASR offers?
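
To illustrate the kind of output I mean, here is a rough Python sketch that splits one multi-word cue into per-word cues by dividing the cue's time range evenly. The even split is only an approximation I'm assuming for illustration, not real word-level alignment from the recognizer. `split_cue` and the other names are hypothetical helpers, not part of Microsoft Stream, and the sketch assumes timing lines in the full `hh:mm:ss.ttt` form.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:03.000".
# Assumption: hours are always present (WebVTT also allows "mm:ss.ttt").
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)


def to_ms(h, m, s, ms):
    """Convert timestamp fields (as strings) to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def to_stamp(total_ms):
    """Format total milliseconds as hh:mm:ss.ttt."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def split_cue(timing_line, text):
    """Return one (start, end, word) tuple per word in the cue.

    The cue's time range is divided evenly between the words, which is
    only a guess at where each word actually falls.
    """
    match = TIMING.match(timing_line)
    if match is None:
        raise ValueError(f"not a cue timing line: {timing_line!r}")
    fields = match.groups()
    start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
    words = text.split()
    if not words:
        return []
    n = len(words)
    # Integer boundaries between words; the last boundary equals the cue end.
    bounds = [start + (end - start) * i // n for i in range(n + 1)]
    return [
        (to_stamp(bounds[i]), to_stamp(bounds[i + 1]), word)
        for i, word in enumerate(words)
    ]


if __name__ == "__main__":
    for start, end, word in split_cue(
        "00:00:01.000 --> 00:00:03.000", "hello there world friend"
    ):
        print(f"{start} --> {end}\n{word}\n")
```

What I'm really after is the same shape of file, but with the per-word times coming from the speech recognizer itself rather than from an even split like this.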


