pronunciaai
pronunciaai t1_j6mqq5n wrote
Reply to [P] Fine Tuning Whisper in another language by ruizard
Hugging Face just finished a sprint where they fine-tuned Whisper on 100s of languages. Going to their Discord and following the guides is going to be by far the easiest way.
Check under the ML-4-AUDIO channels "sprint-announcement", "discussions", and "whisper-model-playground".
pronunciaai t1_j6l49ij wrote
Reply to comment by blackkettle in [D] What's stopping you from working on speech and voice? by jiamengial
Yeah, I work in the space (mispronunciation detection), and there is no lack of frameworks (speechbrain, NeMo, and thunder-speech being the more useful ones for custom stuff, imo). The barrier to entry is everything you have to learn to do audio ML, plus the pain points around things like CTC. In my opinion, tutorials are needed more than frameworks to get more people actively working on speech and voice.
pronunciaai t1_j6l3unb wrote
Reply to comment by jiamengial in [D] What's stopping you from working on speech and voice? by jiamengial
Have you tried https://github.com/scart97/thunder-speech? It's a smaller repo based on NeMo but meant for more flexible experimentation, and it's compatible with Hugging Face's transformers library.
pronunciaai t1_j6l3lzv wrote
Reply to comment by TheCoconutTree in [D] Simple Questions Thread by AutoModerator
Your suggested approach is the correct one, and it's called "one-hot encoding". Your reasoning about why an embedding (a single learned value) is inappropriate is also accurate.
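For anyone landing here later, this is a minimal sketch of what one-hot encoding looks like in plain Python. The original question isn't quoted in this reply, so the category names and the `one_hot` helper below are hypothetical, made up purely for illustration.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value` over the ordered `categories`.

    Exactly one position is 1 (the index of `value`); all others are 0.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


# Hypothetical categories, not taken from the original question.
terrain_types = ["forest", "desert", "water"]

encoded = one_hot("desert", terrain_types)
print(encoded)  # [0, 1, 0]
```

Each category gets its own input dimension, so the model doesn't read any ordering or distance into the category IDs, which is what happens if you feed in a single integer instead.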
pronunciaai t1_j7vlt9i wrote
Reply to comment by CeFurkan in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
Yes, dns64 is better/larger than their other pretrained model (48, I think).