Submitted by yamakeeen t3_ynq05k in MachineLearning
I'm planning to see how a latent diffusion model would perform on the task of reconstructing images from brain activity. Specifically, the image generation would be conditioned on brain activity instead of text. Has anyone tried conditioning on brain activity or on other information apart from text? I'm having a hard time digesting the code from the LDM repo and was wondering if anyone has tried coding it (or a simpler version) from scratch.
acertainmoment t1_ivaloxt wrote
Most likely you won’t need to code everything from scratch. You’ll probably just need to add an nn.Linear or a 1x1 conv that maps your brain activity data from whatever dimension it has to the dimension of the tensor the model is currently conditioned on (I think it’s 1024- or 2048-dim embeddings at the moment, but I’m not exactly sure).
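A rough sketch of what that projection could look like in PyTorch. Everything here is an assumption for illustration: the class name `BrainToContext`, the voxel count, the context dimension, and the number of conditioning tokens are all made up, and the real conditioning width should be read from the model config rather than taken from this example.

```python
import torch
import torch.nn as nn


class BrainToContext(nn.Module):
    """Hypothetical adapter: flat brain-activity vector -> conditioning tokens."""

    def __init__(self, n_voxels: int, context_dim: int, n_tokens: int = 1):
        super().__init__()
        self.n_tokens = n_tokens
        self.context_dim = context_dim
        # A single linear layer produces all conditioning tokens at once.
        self.proj = nn.Linear(n_voxels, n_tokens * context_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_voxels) -> (batch, n_tokens, context_dim)
        out = self.proj(x)
        return out.view(x.shape[0], self.n_tokens, self.context_dim)


# Assumed sizes, not taken from any real dataset or checkpoint.
N_VOXELS = 4096
CONTEXT_DIM = 1024

encoder = BrainToContext(N_VOXELS, CONTEXT_DIM, n_tokens=4)
fmri = torch.randn(2, N_VOXELS)  # stand-in for a batch of brain recordings
context = encoder(fmri)
print(context.shape)  # torch.Size([2, 4, 1024])
```

The output has the same (batch, tokens, dim) layout that a text encoder would normally produce, so in principle it can be passed wherever the model expects its text conditioning. Whether one token or several works better is something you would have to try.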