Submitted by yamakeeen t3_ynq05k in MachineLearning

I'm planning to see how a latent diffusion model would perform on the task of reconstructing images from brain activity. Specifically, the image generation would be conditioned on brain activity instead of text. Has anyone tried conditioning on brain activity, or on information other than text? I'm having a hard time digesting the code from the LDM repo and was wondering if anyone has tried coding it (or a simpler version) from scratch.

18

Comments


acertainmoment t1_ivaloxt wrote

Most likely you won’t need to code everything from scratch. You’ll probably just need to add an nn.Linear or a 1x1 conv to convert whatever dimension your brain activity data has into the dimension of the tensor the model is currently conditioned on (I think it’s 1024- or 2048-dim embeddings currently, but I'm not exactly sure).

16
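The projection acertainmoment describes is a single learned linear map. A minimal numpy sketch of the idea (in the actual LDM code you would use torch's nn.Linear and train it end to end; the 4096-voxel input size and 1024-dim conditioning size here are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

N_VOXELS = 4096   # hypothetical size of a flattened fMRI recording
COND_DIM = 1024   # hypothetical conditioning dimension of the LDM

# Learned projection: in PyTorch this would be nn.Linear(N_VOXELS, COND_DIM)
W = rng.normal(0, 1.0 / np.sqrt(N_VOXELS), size=(N_VOXELS, COND_DIM))
b = np.zeros(COND_DIM)

def brain_to_cond(brain_activity):
    """Map a batch of brain recordings to conditioning embeddings."""
    return brain_activity @ W + b

batch = rng.normal(size=(8, N_VOXELS))   # 8 fake recordings
cond = brain_to_cond(batch)
print(cond.shape)                        # (8, 1024)
```

The point is just that the conditioning pathway only cares about the embedding shape, so swapping text embeddings for brain embeddings is mostly a plumbing change.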

shawarma_bees t1_ivaa8if wrote

How is the “brain activity” information encoded?

7

johnnydaggers t1_ivb869u wrote

It’s trivially easy. The problem is getting enough training data for something like that.

3

king_of_walrus t1_ivbdzd4 wrote

I don’t think it’s that easy… do you have brain activity/image pairs for training? If so, do you have a lot of them?

To train the conditional models, you need pairs of targets and conditioning info. Also, the code is a lot to digest. I would suggest looking at ldm/models/diffusion/ddpm.py to see how things work; you can clearly see all the diffusion-related code and the training logic. It may help your understanding.

3
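For intuition about the training logic in ddpm.py, here is a heavily stripped-down, numpy-only sketch of one conditional diffusion training step; the toy linear model stands in for the conditional U-Net, and all names and sizes are mine, not the repo's:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def toy_eps_model(x_t, t, cond, W):
    # Stand-in for the conditional U-Net: predicts the noise eps
    return (x_t + cond) @ W

def training_step(x0, cond, W):
    """One DDPM training step: sample t, noise x0, regress the noise."""
    t = rng.integers(0, T)
    eps = rng.normal(size=x0.shape)
    a = alpha_bars[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps   # forward diffusion q(x_t | x0)
    eps_pred = toy_eps_model(x_t, t, cond, W)
    return np.mean((eps - eps_pred) ** 2)          # simple L2 loss on the noise

D = 16
x0 = rng.normal(size=(4, D))     # fake "images" (latents, in the real model)
cond = rng.normal(size=(4, D))   # fake brain-derived conditioning
W = np.zeros((D, D))
loss = training_step(x0, cond, W)
print(loss > 0)                  # True
```

Note that nothing in this loop is text-specific: the conditioning tensor could come from any encoder, which is why the brain/image pairs are the real bottleneck, not the architecture.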

bloc97 t1_ivbmwbb wrote

I'm just guessing, but it's probably pairs of visual cortex activations with images seen by an animal (maybe mice)...

2

ajin-wolf t1_ivc2wxf wrote

Someone did this with UK Biobank data (a very large sample), although I think it would be more interesting to use the Child Mind Institute HBN dataset: https://arxiv.org/abs/2209.07162

3

elbiot t1_ivdnwpb wrote

Latent diffusion works with text because CLIP was already trained on millions of text/image pairs. You've got a huge project ahead of you: training on millions of brain activity/image pairs.

2
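One way to make elbiot's point concrete: before conditioning a diffusion model, you would need a CLIP-style alignment between brain embeddings and image embeddings, trained on matched pairs. A rough numpy sketch of the symmetric contrastive (InfoNCE-style) loss you would minimize, with all dimensions hypothetical:

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_style_loss(brain_emb, image_emb, temperature=0.07):
    """Symmetric contrastive loss over matched brain/image pairs."""
    # Normalize, then compute all-pairs cosine similarities
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = (b @ v.T) / temperature
    n = len(logits)
    idx = np.arange(n)
    # Matched pairs sit on the diagonal; pull them together in both directions
    loss_b2v = -log_softmax(logits, axis=1)[idx, idx]
    loss_v2b = -log_softmax(logits, axis=0)[idx, idx]
    return float((loss_b2v + loss_v2b).mean() / 2)

rng = np.random.default_rng(0)
brain = rng.normal(size=(8, 64))
# Perfectly aligned embeddings give near-zero loss; unrelated ones don't
aligned = clip_style_loss(brain, brain)
random_ = clip_style_loss(brain, rng.normal(size=(8, 64)))
print(aligned < random_)   # True
```

Driving this loss down requires exactly the large paired dataset the comment above is pointing at.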

le_theudas t1_ivbztjx wrote

I'm doing something similar; use k-diffusion. You can either use the cross-attention inputs or add a network component where you need it.

1
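For the cross-attention route, the shape of the plumbing is roughly this: the U-Net's spatial features attend over the conditioning tokens, which can be brain embeddings instead of text embeddings. A bare-bones, single-head numpy sketch with random weights and made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x, context, Wq, Wk, Wv):
    """U-Net features x attend over the conditioning context tokens."""
    q, k, v = x @ Wq, context @ Wk, context @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

D_X, D_C, D_H = 32, 1024, 64
x = rng.normal(size=(16, D_X))       # 16 spatial tokens of U-Net features
context = rng.normal(size=(4, D_C))  # 4 brain-embedding tokens in place of text tokens
Wq = rng.normal(size=(D_X, D_H))
Wk = rng.normal(size=(D_C, D_H))
Wv = rng.normal(size=(D_C, D_H))
out = cross_attention(x, context, Wq, Wk, Wv)
print(out.shape)   # (16, 64)
```

Because queries come from the image side and keys/values from the context, only the context projection matrices need to change to accept a new modality.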

Traditional_Tale_748 t1_ive0vsq wrote

If I were you, I would convert brain activity into a graph, then into an image. You could then use that as conditioning input for a standard diffusion model.

1
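The graph-to-image idea above could look something like this: build a channel-by-channel correlation (connectivity) matrix from the recordings and treat it as a grayscale conditioning image. A purely illustrative numpy sketch, with hypothetical channel counts:

```python
import numpy as np

rng = np.random.default_rng(0)

def activity_to_graph_image(signals):
    """Channels x time signals -> correlation matrix scaled to [0, 1]."""
    corr = np.corrcoef(signals)        # graph adjacency: channel-by-channel correlation
    corr = np.clip(corr, -1.0, 1.0)    # guard against tiny floating-point overshoot
    return (corr + 1.0) / 2.0          # map [-1, 1] -> [0, 1], like a grayscale image

signals = rng.normal(size=(64, 500))   # 64 hypothetical channels, 500 timepoints
img = activity_to_graph_image(signals)
print(img.shape)   # (64, 64)
```

Whether a connectivity image preserves the information needed for reconstruction is an open question, but it does let you reuse image-conditioned pipelines unchanged.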