
bhagy7 t1_jc8rdbj wrote

Yes, it is possible to train a small text-conditioned diffusion model from scratch on 64x64 images, or even smaller ones. Depending on the complexity of the model and the number of GPUs you use, training could take anywhere from a few hours to several days. If you are
