
Jordan117 OP t1_iwhs5cw wrote

Is there a reason the language model part of image diffusion requires a lot less horsepower than running a language model by itself? I'm still amazed SD works quickly on my 2016-era PC, but apparently something like GPT-J requires dozens or hundreds of GB of memory to even store. Is it the difference between generating new text vs. working with existing text?

2

SuperSpaceEye t1_iwht6hf wrote

They're two different tasks. The language model in SD just encodes text into an abstract representation that the diffusion part of the model then uses. A text-to-text model such as GPT-J does a different task, which is much harder. Also, GPT-J is 6B parameters, which only takes about 12GB of VRAM, not hundreds.
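The ~12GB figure follows from simple arithmetic: weights in fp16 take 2 bytes per parameter. A quick sketch of that estimate, comparing GPT-J against SD's text encoder (assuming SD uses the ~123M-parameter CLIP ViT-L/14 text encoder):

```python
# Rough VRAM needed just to store model weights, assuming fp16 (2 bytes/param).
# Parameter counts are approximate: CLIP ViT-L/14 text encoder (~123M, used by
# Stable Diffusion) vs. GPT-J (6B).

def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

for name, n in [("SD text encoder (CLIP, ~123M)", 123e6),
                ("GPT-J (6B)", 6e9)]:
    print(f"{name}: ~{weight_memory_gib(n):.1f} GiB in fp16")
```

This is only the storage for the weights; activations, the KV cache during generation, and any optimizer state add more on top.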

3

Jordan117 OP t1_iwhtnxu wrote

Thanks for the clarification; I must have misread an older post that was talking about CPU memory requirements instead of GPU.

2