Submitted by akshaysri0001 t3_10w79eo in MachineLearning
Hey everyone, I want to build a personal voice assistant that sounds exactly like a real person. I tried a few TTS models, such as Tortoise TTS and Coqui TTS. They did a good job, but they take too long to generate speech. Is there another good, realistic-sounding TTS that I can train on my own voice-cloning dataset? Also, I'm a bit amazed by the TTS that Eleven Labs uses, so can someone explain how I can reach that level of real-time efficiency in a voice assistant?
marcus_hk t1_j7lqpav wrote
I haven't been keeping up with TTS since Tacotron 2, but it seems Eleven Labs works fundamentally the same way.
As for real-time performance, you may need to port your Python code to C++.
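Before or after porting anything, it helps to measure whether a model is actually fast enough. A common yardstick is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced. A value below 1.0 means speech is generated faster than it plays back. The minimal sketch below assumes a hypothetical `synthesize` callable that takes text and returns raw audio samples. It is a stand-in for whatever model you are testing, not the actual Tortoise TTS or Coqui TTS API.

```python
import time
from typing import Callable, Sequence


def rtf_from_timings(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Real-time factor = processing time / duration of the generated audio."""
    audio_seconds = num_samples / sample_rate
    if audio_seconds == 0:
        raise ValueError("no audio was generated")
    return elapsed_seconds / audio_seconds


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS call, not a real library API
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and return its RTF (< 1.0 means faster than real time)."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return rtf_from_timings(elapsed, len(samples), sample_rate)


if __name__ == "__main__":
    # Dummy synthesizer standing in for a real model: returns 1 second of silence at 22,050 Hz.
    def fake_tts(text: str) -> list:
        return [0.0] * 22050

    print(f"RTF: {real_time_factor(fake_tts, 'Hello there', 22050):.4f}")
```

For an interactive assistant you would typically want an RTF comfortably below 1.0 on your target hardware, and it is worth timing the first chunk of audio separately, since that latency is what the user notices.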