vman512 t1_j25g2ca wrote
Reply to comment by shadowknight094 in [P] We finally got Text-to-PowerPoint working!! (Generative AI for Slides ✨) by Mastersulm
It's a joke: "BCG" is "Boston Consulting Group", but also "Bi-modal Conditional Generation".
vman512 t1_izpdpeh wrote
Reply to comment by abloblololo in [D] A talk about ChatGPT by [deleted]
Perhaps they meant this: https://twitter.com/BryanCa87413332/status/1601010538364162056?t=_0Se-AdTPQKliEMD1RR8Tw&s=19
vman512 t1_itj1y28 wrote
Reply to comment by SSC_08 in [P] is it necessary to convert audio data from analog to digital? by SSC_08
Any software that processes audio uses a digital representation of the audio. You would only deal with analog signals when designing circuits, for example a guitar amp.
You may be confusing analog/digital with the distinction between time-domain (waveforms) and frequency-domain (spectrograms) representations.
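A minimal sketch of that distinction, assuming Python with scipy and a hypothetical `example.wav` on disk: the data is digital either way; the waveform and the spectrogram are just two views of the same samples.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

# Reading the file yields a sample rate plus a sample array:
# the audio is already digital at this point.
rate, samples = wavfile.read("example.wav")  # hypothetical input file
if samples.ndim > 1:
    samples = samples.mean(axis=1)  # mix multi-channel audio down to mono

# Time domain: the raw waveform, amplitude vs. time.
duration_s = len(samples) / rate

# Frequency domain: a spectrogram via the short-time Fourier transform.
freqs, times, Zxx = stft(samples.astype(np.float32), fs=rate, nperseg=1024)
spectrogram = np.abs(Zxx)  # magnitude per (frequency, time) bin
```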
vman512 t1_itiy510 wrote
Reply to comment by blablanonymous in [P] is it necessary to convert audio data from analog to digital? by SSC_08
OP must have built their own neuromorphic chip
vman512 t1_irw4qjv wrote
I think the most straightforward way to solve this is to generate a dataset of text->image pairs with the diffusion model, and then learn the inverse function with a new model. But you'd need a gigantic dataset for this to work.
Diffusion models have quite diverse outputs, even given the same prompt. Maybe what you're asking for is: given an image and a random seed, design a prompt that replicates the image as closely as possible?
In that case, you can treat each image->text inference as an optimization problem and use a deep-dream-style loss to optimize for the best prompt. It may be helpful to first use this method to select the best latent encoding of the text, and then figure out how to learn the inverse function for the text embedding.
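A minimal sketch of that per-image optimization, assuming PyTorch and a toy differentiable generator standing in for the diffusion model (a real sampler needs extra work, e.g. few-step sampling or gradient checkpointing, to be differentiable end to end):

```python
import torch

torch.manual_seed(0)

# Toy stand-in for a differentiable text-to-image generator
# (a real diffusion sampler is not differentiable out of the box).
generator = torch.nn.Sequential(
    torch.nn.Linear(64, 256),
    torch.nn.Tanh(),
    torch.nn.Linear(256, 3 * 16 * 16),
)
for p in generator.parameters():
    p.requires_grad_(False)  # the model is fixed; only the prompt is optimized

target_image = torch.randn(3 * 16 * 16)  # the image we want a prompt for

# Treat the prompt as a continuous embedding and optimize it, deep-dream style.
prompt_emb = torch.zeros(64, requires_grad=True)
opt = torch.optim.Adam([prompt_emb], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(generator(prompt_emb), target_image)
    loss.backward()
    opt.step()

# prompt_emb is now the latent "prompt" whose output best matches the target;
# mapping it back to discrete text tokens is the separate inverse-function step.
```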
vman512 t1_j96s8yu wrote
Reply to comment by Flag_Red in [R] neural cloth simulation by LegendOfHiddnTempl
Maybe for people who play video games all day, this is the most real-life use case.