Comments

currentscurrents t1_jai5dk2 wrote

Basically all of the text-to-image generators available today are diffusion models based around convolutional U-Nets. Google has an (unreleased) one that uses vision transformers.
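For a concrete view of that architecture, here is a minimal sketch (assuming the Hugging Face `diffusers`/`transformers` libraries and the public `runwayml/stable-diffusion-v1-5` checkpoint, used purely for illustration) that loads a typical pipeline and pushes one denoising step through its convolutional U-Net:

```python
# Minimal sketch: inspect a typical text-to-image pipeline and run a single
# denoising step through its convolutional U-Net. Assumes `torch`, `diffusers`
# and `transformers` are installed and the checkpoint can be downloaded.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

print(type(pipe.unet).__name__)          # UNet2DConditionModel -- the convolutional U-Net denoiser
print(type(pipe.text_encoder).__name__)  # CLIPTextModel -- the text encoder

# Encode a prompt with the pipeline's text encoder ...
tokens = pipe.tokenizer("an astronaut riding a horse", padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        return_tensors="pt")
prompt_embeds = pipe.text_encoder(tokens.input_ids).last_hidden_state

# ... then let the U-Net predict the noise in a random latent, conditioned on it.
latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64)
noise_pred = pipe.unet(latents, timestep=999,
                       encoder_hidden_states=prompt_embeds).sample
print(noise_pred.shape)  # same shape as the latent: (1, 4, 64, 64)
```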

There is more variety in the text encoder, which turns out to matter more than the diffusion backbone itself. CLIP is very popular, but large language model encoders like T5 show better performance and are probably the future.
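As a rough illustration of that split (the checkpoint names below are small public models chosen only for illustration; Imagen-style systems actually condition on much larger T5 variants), both encoder families can be loaded from `transformers` and used as drop-in sources of conditioning embeddings:

```python
# Minimal sketch: the two text-encoder families used to condition the diffusion model.
# Checkpoint names are small public models chosen for illustration only.
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, AutoTokenizer

prompt = "a watercolor painting of a fox in the snow"

# CLIP text tower (the encoder Stable Diffusion 1.x conditions on)
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
clip_emb = clip_enc(**clip_tok(prompt, return_tensors="pt")).last_hidden_state
print("CLIP:", clip_emb.shape)  # (1, tokens, 768)

# T5 encoder (the family Imagen-style models condition on, at much larger scale)
t5_tok = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
t5_enc = T5EncoderModel.from_pretrained("google/t5-v1_1-base")
t5_emb = t5_enc(**t5_tok(prompt, return_tensors="pt")).last_hidden_state
print("T5:  ", t5_emb.shape)  # (1, tokens, 768)

# Either way, the diffusion U-Net just receives a (batch, tokens, dim) tensor
# through cross-attention; what differs is the quality of those embeddings.
```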
