Submitted by yamakeeen t3_ynq05k in MachineLearning

I'm planning to see how a latent diffusion model would perform on the task of reconstructing images from brain activity. Specifically, the image generation would be conditioned on brain activity instead of text. Has anyone tried conditioning on brain activity, or on information other than text? I'm having a hard time digesting the code from the LDM repo and was wondering if anyone has tried coding it (or a simpler version) from scratch.

18

Comments


acertainmoment t1_ivaloxt wrote

Most likely you won’t need to code everything from scratch. You’ll probably just need to add an nn.Linear or a 1x1 conv to convert whatever dimension your brain activity data has into the dimension of the tensor the model is currently conditioned on (I think it’s 1024- or 2048-dim embeddings currently, but I'm not exactly sure).

16
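The projection acertainmoment describes is a single learned linear map. A minimal numpy sketch of the idea (in the actual LDM code you would use torch's nn.Linear and train it end to end; the 4096-voxel input size and 1024-dim conditioning size here are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

N_VOXELS = 4096   # hypothetical size of a flattened fMRI recording
COND_DIM = 1024   # hypothetical conditioning dimension of the LDM

# Learned projection: in PyTorch this would be nn.Linear(N_VOXELS, COND_DIM)
W = rng.normal(0, 1.0 / np.sqrt(N_VOXELS), size=(N_VOXELS, COND_DIM))
b = np.zeros(COND_DIM)

def brain_to_cond(brain_activity):
    """Map a batch of brain recordings to conditioning embeddings."""
    return brain_activity @ W + b

batch = rng.normal(size=(8, N_VOXELS))   # 8 fake recordings
cond = brain_to_cond(batch)
print(cond.shape)                        # (8, 1024)
```

The point is just that the conditioning pathway only cares about the embedding shape, so swapping text embeddings for brain embeddings is mostly a plumbing change.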

shawarma_bees t1_ivaa8if wrote

How is the “brain activity” information encoded?

7

johnnydaggers t1_ivb869u wrote

It’s trivially easy. The problem is getting enough training data for something like that.

3

king_of_walrus t1_ivbdzd4 wrote

I don’t think it’s that easy… do you have brain activity/image pairs for training? If so, do you have a lot of them?

To train the conditional models, you need pairs of targets and conditioning info. Also, the code is a lot to digest. I would suggest looking at ldm/models/diffusion/ddpm.py to see how things work; you can clearly see all the diffusion-related code and the training logic. It may help your understanding.

3
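For intuition about the training logic in ddpm.py, here is a heavily stripped-down, numpy-only sketch of one conditional diffusion training step; the toy linear model stands in for the conditional U-Net, and all names and sizes are mine, not the repo's:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def toy_eps_model(x_t, t, cond, W):
    # Stand-in for the conditional U-Net: predicts the noise eps
    return (x_t + cond) @ W

def training_step(x0, cond, W):
    """One DDPM training step: sample t, noise x0, regress the noise."""
    t = rng.integers(0, T)
    eps = rng.normal(size=x0.shape)
    a = alpha_bars[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps   # forward diffusion q(x_t | x0)
    eps_pred = toy_eps_model(x_t, t, cond, W)
    return np.mean((eps - eps_pred) ** 2)          # simple L2 loss on the noise

D = 16
x0 = rng.normal(size=(4, D))     # fake "images" (latents, in the real model)
cond = rng.normal(size=(4, D))   # fake brain-derived conditioning
W = np.zeros((D, D))
loss = training_step(x0, cond, W)
print(loss > 0)                  # True
```

Note that nothing in this loop is text-specific: the conditioning tensor could come from any encoder, which is why the brain/image pairs are the real bottleneck, not the architecture.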

bloc97 t1_ivbmwbb wrote

I'm just guessing, but it's probably pairs of visual cortex activations with images seen by an animal (maybe mice)...

2

ajin-wolf t1_ivc2wxf wrote

Someone did this with UK Biobank data (a very large sample), although I think it would be more interesting to use the Child Mind Institute HBN dataset: https://arxiv.org/abs/2209.07162

3

elbiot t1_ivdnwpb wrote

Latent diffusion works with text because CLIP was already trained on millions of text/image pairs. You've got a huge project ahead of you: training on millions of brain activity/image pairs.

2
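One way to make elbiot's point concrete: before conditioning a diffusion model, you would need a CLIP-style alignment between brain embeddings and image embeddings, trained on matched pairs. A rough numpy sketch of the symmetric contrastive (InfoNCE-style) loss you would minimize, with all dimensions hypothetical:

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_style_loss(brain_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss over matched brain/image pairs."""
    # Normalize, then compute all-pairs cosine similarities
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = (b @ v.T) / temperature
    n = len(logits)
    idx = np.arange(n)
    # Matched pairs sit on the diagonal; pull them together in both directions
    loss_b2v = -log_softmax(logits, axis=1)[idx, idx]
    loss_v2b = -log_softmax(logits, axis=0)[idx, idx]
    return float((loss_b2v + loss_v2b).mean() / 2)

rng = np.random.default_rng(0)
brain = rng.normal(size=(8, 64))
# Perfectly aligned embeddings give near-zero loss; unrelated ones don't
aligned = clip_style_loss(brain, brain)
random_ = clip_style_loss(brain, rng.normal(size=(8, 64)))
print(aligned < random_)   # True
```

Driving this loss down requires exactly the large paired dataset the comment above is pointing at.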

le_theudas t1_ivbztjx wrote

I'm doing something similar; use k-diffusion. You can either use the cross-attention inputs or add a network component where you need it.

1
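For the cross-attention route, the shape of the plumbing is roughly this: the U-Net's spatial features attend over the conditioning tokens, which can be brain embeddings instead of text embeddings. A bare-bones, single-head numpy sketch with random weights and made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x, context, Wq, Wk, Wv):
    """U-Net features x attend over the conditioning context tokens."""
    q, k, v = x @ Wq, context @ Wk, context @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

D_X, D_C, D_H = 32, 1024, 64
x = rng.normal(size=(16, D_X))       # 16 spatial tokens of U-Net features
context = rng.normal(size=(4, D_C))  # 4 brain-embedding tokens in place of text tokens
Wq = rng.normal(size=(D_X, D_H))
Wk = rng.normal(size=(D_C, D_H))
Wv = rng.normal(size=(D_C, D_H))
out = cross_attention(x, context, Wq, Wk, Wv)
print(out.shape)   # (16, 64)
```

Because queries come from the image side and keys/values from the context, only the context projection matrices need to change to accept a new modality.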

Traditional_Tale_748 t1_ive0vsq wrote

If I were you, I would convert brain activity into a graph, then into an image. You could then use that as conditioning input for a standard diffusion model.

1
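The graph-to-image idea above could look something like this: build a channel-by-channel correlation (connectivity) matrix from the recordings and treat it as a grayscale conditioning image. A purely illustrative numpy sketch, with hypothetical channel counts:

```python
import numpy as np

rng = np.random.default_rng(0)

def activity_to_graph_image(signals):
    """Channels x time signals -> correlation matrix scaled to [0, 1]."""
    corr = np.corrcoef(signals)        # graph adjacency: channel-by-channel correlation
    corr = np.clip(corr, -1.0, 1.0)    # guard against tiny floating-point overshoot
    return (corr + 1.0) / 2.0          # map [-1, 1] -> [0, 1], like a grayscale image

signals = rng.normal(size=(64, 500))   # 64 hypothetical channels, 500 timepoints
img = activity_to_graph_image(signals)
print(img.shape)   # (64, 64)
```

Whether a connectivity image preserves the information needed for reconstruction is an open question, but it does let you reuse image-conditioned pipelines unchanged.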