Submitted by Ok-Air4027 t3_xuog93 in MachineLearning
I am working on a speech-to-text project and want to recognise different voices, so that I know which person said what and can write the conversation out as text with the speakers' names. I have not found any parameter that distinguishes human voices mathematically. Is there a way to do this? There can be any number of people in the conversation.
VectorSpaceModel t1_iqwjnnd wrote
Google open-sourced this (the task is called speaker diarization):
https://arxiv.org/abs/1810.04719
https://ai.googleblog.com/2018/11/accurate-online-speaker-diarization.html?m=1
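To give a rough idea of the "mathematical" side: diarization systems typically turn each short audio segment into a fixed-length speaker embedding vector and then group segments whose vectors are similar. The sketch below is not the method from the linked paper. It is a simple threshold-based online clustering on cosine similarity, shown only to illustrate how an unknown number of speakers can be handled. The embeddings are made-up toy vectors (a real system would get them from a trained speaker encoder), and `assign_speakers` and the `threshold` value are hypothetical names and numbers chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.8):
    """Assign a speaker index to each segment embedding, in order.

    A segment joins the most similar existing speaker if the cosine
    similarity is at least `threshold`; otherwise it starts a new speaker.
    The threshold is an assumed value and would need tuning on real data.
    """
    centroids = []  # running sum of embeddings per speaker
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            # Update the running sum; its direction is what cosine compares.
            centroids[k] = [c + e for c, e in zip(centroids[k], emb)]
        else:
            k = len(centroids)
            centroids.append(list(emb))
        labels.append(k)
    return labels


# Toy 3-dimensional "embeddings" for five consecutive segments (invented data).
segments = [
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.9, 0.2, 0.0],
    [0.1, 0.0, 1.0],
    [0.0, 0.9, 0.2],
]
labels = assign_speakers(segments)
print(labels)
```

Each label is a speaker index, so pairing the labels with the transcribed text of each segment gives the "who said what" view. Mapping an index to an actual name still needs some enrolled reference audio per person.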