Submitted by cloneofsimo t3_ykiuq0 in MachineLearning
Hi. Today I came across this interesting paper https://arxiv.org/abs/2210.16056, which proposes an interesting method for combining the semantics of text and image in the diffusion process.
In short, the method mixes "layout" with "content". Unlike style transfer, though,
>"...semantic mixing aims to fuse multiple semantics into one single object."
I was surprised by the examples they showed, so I wanted to try it, but the code wasn't available. I've implemented the method myself, and I wanted to share it here!
https://github.com/cloneofsimo/magicmix
Layout of \"realistic photo of a rabbit\" with content of \"tiger\"
I hope my implementation helps anyone reading the paper!
Note: I'm not the author of the paper, and this is not an official implementation.
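For anyone skimming before reading the paper, here's a rough sketch of the core idea as I understand it: you noise the layout image's latent into an intermediate timestep range, then denoise conditioned on the content prompt, blending the two branches inside a mixing interval. This is a minimal NumPy toy, not my actual implementation (see the repo for that); the noise schedule, the `k_min`/`k_max`/`nu` values, and the `denoise_step` stand-in (a real run would call a text-conditioned diffusion model) are all illustrative assumptions.

```python
import numpy as np

def magic_mix(z0_layout, denoise_step, rng,
              T=50, k_min=15, k_max=30, nu=0.9):
    """Toy sketch of MagicMix-style semantic mixing (arXiv:2210.16056).

    z0_layout    : latent of the layout image (e.g. the rabbit photo)
    denoise_step : stand-in for one content-prompt-conditioned reverse
                   diffusion step, z_t -> z_{t-1} (here just a callable;
                   a real run would invoke a diffusion model conditioned
                   on the content prompt, e.g. "tiger")
    k_min, k_max : mixing interval; outside it the layout branch is ignored
    nu           : interpolation weight toward the content branch
    """
    # Illustrative linear noise schedule (not the one used in practice).
    alphas = np.linspace(0.999, 0.90, T)
    alpha_bar = np.cumprod(alphas)

    def q_sample(z0, t):
        # Forward diffusion: a noised version of the layout latent at step t.
        noise = rng.standard_normal(z0.shape)
        return np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise

    # Start from the noised layout latent at the top of the mixing interval.
    z = q_sample(z0_layout, k_max)
    for t in range(k_max, 0, -1):
        z_content = denoise_step(z, t)  # pull toward the content prompt
        if t > k_min:
            # Inside [k_min, k_max]: blend with a re-noised layout latent
            # so the coarse spatial layout survives.
            z_layout = q_sample(z0_layout, t - 1)
            z = nu * z_content + (1.0 - nu) * z_layout
        else:
            # Below k_min: pure content-conditioned denoising.
            z = z_content
    return z
```

With a toy `denoise_step` that nudges the latent toward some target, the output starts with the layout latent's structure and drifts toward the content branch, which is the whole trick.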
Inevitable-Ad8503 t1_iuth7a3 wrote
Nice. Thanks for taking the time to implement and to share. I get the feeling that this approach will be way more coherent for certain types of content and layout (mayhaps when prompting something like "a tiger and a bunny sitting side by side"), and the "traditional" (is it old enough to have traditions yet?) approach will be better for other types, perhaps such as "a tiger and a bunny", where the intent is what that approach often produces: some interpolation between the two objects, a chimera or mutant.
Or maybe another way to explain it is that MagicMix is akin to an orchestration of multiple parts, while the present approach is more akin to a mashup of multiple parts; each has its own strengths and appropriate applications.