moschles t1_j8vnlzr wrote
A new paradigm, earth + plastic.
moschles t1_j62iaxi wrote
Reply to [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
GANs produce an image "cut from the whole cloth" at once.
Diffusion models use a trick: between rounds of incremental noise removal, they perform a super-resolution pass.
Technically speaking, you could start from GAN output and then take it through rounds of super-resolution. The result would look a lot like what diffusion models produce. This leaves the question of how the new details would be guided, or, more technically, what the super-resolution features would be conditioned on. If you are going to condition them on text embeddings, you might as well condition the whole process on the same embeddings . . . and now you just have a diffusion model.
A second weakness of GANs is the narrowness of their output variety. When made to produce samples for a category such as "dog", they tend to produce nearly the same dog each time.
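That narrowness (mode collapse) can be caricatured with a toy numeric sketch. This is not real GAN code; `collapsed_generator` and `healthy_generator` are made-up one-dimensional stand-ins for generators that do or don't make use of their latent input:

```python
import random
import statistics

def collapsed_generator(z):
    # A mode-collapsed generator: it nearly ignores the latent code z,
    # so every "dog" it draws is almost the same dog.
    return 5.0 + 0.01 * z

def healthy_generator(z):
    # A generator that actually uses its latent input,
    # so different z values give genuinely different samples.
    return 5.0 + 2.0 * z

rng = random.Random(0)
zs = [rng.gauss(0, 1) for _ in range(1000)]

# Spread of the outputs measures sample variety.
collapsed_spread = statistics.stdev(collapsed_generator(z) for z in zs)
healthy_spread = statistics.stdev(healthy_generator(z) for z in zs)
```

The collapsed generator's output spread is a tiny fraction of the healthy one's, which is what "nearly the same dog each time" looks like numerically.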
moschles t1_j58krio wrote
Reply to comment by wwarnout in TIL The famous "rods from god" concept of a space-based weapons system of orbiting tungsten rods was developed by science fiction writer Jerry Pournelle. by BitterFuture
Can't get rods into position in any reasonable time. Impossible to aim them at targets. Etc.
moschles t1_j4nczb1 wrote
Reply to comment by pm_me_your_pay_slips in [D] Bitter lesson 2.0? by Tea_Pearce
Or worse, is "Foundation Model" just a contemporary buzzword replacement for unsupervised training?
moschles t1_j4nch5w wrote
Reply to [D] Bitter lesson 2.0? by Tea_Pearce
> Seems to be derived by observing that the most promising work in robotics today (where generating data is challenging) is coming from piggy-backing on the success of large language models (think SayCan etc).
There is nothing really magical being claimed here. The LLMs undergo unsupervised training, essentially by creating distortions of the text. (One type of "distortion" is Cloze Deletion, but there are others in the panoply of distorted text.)
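As a minimal sketch of what a Cloze-style distortion means in practice (the function name and mask token here are my own invention, not any particular library's API):

```python
import random

def cloze_distort(tokens, mask_rate=0.15, mask_token="[MASK]", rng=None):
    """Hide a random fraction of tokens. The training target is to
    recover the hidden originals from the surrounding context."""
    rng = rng or random
    distorted, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            distorted.append(mask_token)
            targets.append((i, tok))  # position and original token
        else:
            distorted.append(tok)
    return distorted, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
distorted, targets = cloze_distort(tokens, mask_rate=0.3)
```

The labels come for free from the text itself, which is the whole point: no human pre-labeling is needed.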
Unsupervised training avoids the bottleneck of having to manually pre-label your dataset.
When we translate unsupervised training to the robotics domain, what does that look like? Perhaps "next word prediction" is analogous to "next second prediction" of a physical environment. And Cloze Deletion has an analogy to probabilistic "in-painting" done by existing diffusion models.
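The in-painting analogue of Cloze Deletion can be sketched the same way. This is a toy on a 2-D grid of numbers standing in for a camera frame, with names of my own choosing:

```python
import random

def inpaint_distort(frame, patch=2, rng=None):
    """Zero out a random square patch of a 2-D grid. A model would be
    trained to fill the patch back in from the surrounding context,
    just as a Cloze model fills in masked words."""
    rng = rng or random.Random(0)
    h, w = len(frame), len(frame[0])
    y = rng.randrange(h - patch + 1)
    x = rng.randrange(w - patch + 1)
    distorted = [row[:] for row in frame]  # copy; leave the original intact
    for i in range(y, y + patch):
        for j in range(x, x + patch):
            distorted[i][j] = 0.0
    return distorted, (y, x, patch)
```

As with masked text, the supervision signal is the original frame itself, so no manual labels are required.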
That's the way I see it. I'm not particularly sold on this idea that the pretraining would be a literal LLM trained on text, ported seamlessly to the robotics domain. If I'm wrong, set me straight.
moschles t1_izhydos wrote
Reply to [R] What the DAAM: Interpreting Stable Diffusion and Uncovering Generation Entanglement by tetrisdaemon
> To our knowledge, we are the first to interpret large diffusion models from a visuolinguistic perspective, which enables future lines of research.
One obvious line of research here would be automated photo captioning.
moschles t1_iqtccm9 wrote
Reply to [D] Types of Machine Learning Papers by Lost-Parfait568
Some feelings were hurt by this meme.
moschles t1_jcsuj2u wrote
Reply to Bacteria in recalled eye drops linked to cases of vision loss, surgical removal of eyeballs by iamthyfucker
As a user of eyedrops like this, I'm disappointed in CNN being sketchy about which brands are part of the recall. They only say "call this number".