ChocolateFit9026 t1_iqrrbt4 wrote on October 2, 2022 at 4:58 PM

There’s a big misunderstanding that, just because text-to-image is huge right now, everything will be done with text prompts. The only reason it works this way is that good image-text pairs exist everywhere on the internet. That’s not the case for audio and lots of other things.

Ohigetjokes OP t1_iqsn7xn wrote on October 2, 2022 at 8:13 PM

It's a convenient lens for the larger issue.

ChocolateFit9026 t1_iqvh2bk wrote on October 3, 2022 at 12:12 PM

What’s the larger issue? It seems like if large ML models are trained on particular data and work a particular way (usually without text prompts), there isn’t any “speaking to machines” translation issue.