pronunciaai t1_j7vlt9i wrote on February 9, 2023 at 6:51 PM

Reply to comment by CeFurkan in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan

Yes, dns64 is better and larger than their other pretrained model (dns48, I think).

pronunciaai t1_j6mqq5n wrote on January 31, 2023 at 12:58 PM

Reply to [P] Fine Tuning Whisper in another language by ruizard

Hugging Face just finished a sprint where they fine-tuned Whisper on hundreds of languages. Going to their Discord and following the guides there is going to be by far the easiest way.

Check the "sprint-announcement", "discussions", and "whisper-model-playground" channels under ML-4-AUDIO.

pronunciaai t1_j6l49ij wrote on January 31, 2023 at 2:20 AM

Reply to comment by blackkettle in [D] What's stopping you from working on speech and voice? by jiamengial

Yeah, I work in the space (mispronunciation detection), and there is no lack of frameworks (speechbrain, NeMo, and thunder-speech being the more useful ones for custom stuff, imo). The barrier to entry is all the stuff you have to learn to do audio ML, plus the pain points around things like CTC. In my opinion, tutorials are needed more than frameworks to get more people actively working on speech and voice.

pronunciaai t1_j6l3unb wrote on January 31, 2023 at 2:17 AM

Reply to comment by jiamengial in [D] What's stopping you from working on speech and voice? by jiamengial

Have you tried https://github.com/scart97/thunder-speech? It's a smaller repo based on NeMo, but meant for more flexible experimentation, and it's compatible with Hugging Face's transformers library.

pronunciaai t1_j6l3lzv wrote on January 31, 2023 at 2:15 AM

Reply to comment by TheCoconutTree in [D] Simple Questions Thread by AutoModerator

Your suggested approach is the correct one and is called "one-hot encoding". Your thinking about why an embedding (single learned value) is inappropriate is also accurate.
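
For anyone landing here later, a minimal sketch of one-hot encoding in plain Python. The category names and the `one_hot` helper are made up for illustration and are not from the original question; in practice you would probably reach for whatever your framework provides.

```python
def one_hot(value, categories):
    """Return a list with 1.0 at the index of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]


# Hypothetical categories, chosen only for illustration.
CATEGORIES = ["red", "green", "blue"]

# Each input becomes a vector with exactly one non-zero slot, so no
# artificial ordering or distance between categories is implied.
encoded = [one_hot(v, CATEGORIES) for v in ["green", "blue", "red"]]
```

Each category gets its own input dimension, so the model can learn a separate weight per category rather than being forced to treat the categories as points along a single number line.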