suflaj t1_ivumz3h wrote
Reply to comment by Snickersman6 in would it be possible to train something that processes a video and outputs a text script like the following? Teacher: That is the topic we will be covering today. Student 1: What about the part of the lesson we didnt go over yesterday. by [deleted]
It hasn't been marketed as such because it's built on top of ASR (automatic speech recognition). So you search for ASR and then check its features. It's the same way you'd search for object detection and, if you need segmentation, check whether a given detector also does segmentation. A layman looking for a solution doesn't search for specialized terms, and marketers know this.
Be that as it may, the answer remains the same: Google offers the most advanced and performant solution. It markets it as ASR or, as they call it, speech-to-text, with this so-called diarization being one of its features.
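To make it concrete, here is a rough sketch of the last step: turning diarized ASR output into the kind of script you described. It assumes the service hands back a list of words, each tagged with a speaker number, which is roughly how diarizing ASR tends to report results. The function name `words_to_script`, the input shape, and the sample data are all made up for illustration, not any vendor's actual API.

```python
from itertools import groupby


def words_to_script(words, speaker_names=None):
    """Collapse (speaker_tag, word) pairs into 'Name: text' lines.

    `words` is a hypothetical input shape: a list of (speaker_tag, word)
    tuples in spoken order. `speaker_names` optionally maps a tag to a
    readable label; unmapped tags fall back to 'Speaker <tag>'.
    """
    speaker_names = speaker_names or {}
    lines = []
    # groupby merges consecutive words that share the same speaker tag,
    # so each group corresponds to one uninterrupted turn.
    for tag, group in groupby(words, key=lambda pair: pair[0]):
        name = speaker_names.get(tag, f"Speaker {tag}")
        text = " ".join(word for _, word in group)
        lines.append(f"{name}: {text}")
    return "\n".join(lines)


# Made-up sample data standing in for diarized ASR output.
sample = [
    (1, "That"), (1, "is"), (1, "the"), (1, "topic."),
    (2, "What"), (2, "about"), (2, "yesterday?"),
]
print(words_to_script(sample, {1: "Teacher", 2: "Student 1"}))
```

Note that diarization only tells you that speaker 1 and speaker 2 are different people. Deciding which tag is the teacher is something you would have to map yourself.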