Submitted by Jordan117 t3_yw1uxc in singularity
Jordan117 OP t1_iwhs5cw wrote
Reply to comment by SuperSpaceEye in ELI5: Why such a big difference in compute cost for different types of media? by Jordan117
Is there a reason the language model part of image diffusion requires a lot less horsepower than running a language model by itself? I'm still amazed SD works quickly on my 2016-era PC, but apparently something like GPT-J requires dozens or hundreds of GB of memory to even store. Is it the difference between generating new text vs. working with existing text?
SuperSpaceEye t1_iwht6hf wrote
Two different tasks. The language model in SD just encodes text into some abstract representation that the diffusion part of the model then uses. A text-to-text model such as GPT-J does a different task, which is much harder. Also, GPT-J is 6B parameters, which will only take like 12GB of VRAM, not hundreds.
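The ~12GB figure follows from simple arithmetic: weight storage is roughly parameter count times bytes per parameter. A minimal sketch, assuming fp16 (2 bytes/param) weights and using ~123M as the approximate size of SD's CLIP text encoder for comparison:

```python
def weight_footprint_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to store model weights.

    Assumes fp16 precision (2 bytes per parameter); ignores
    activations, optimizer state, and framework overhead.
    """
    return n_params * bytes_per_param / 1e9

# GPT-J: 6 billion parameters -> ~12 GB in fp16
print(weight_footprint_gb(6e9))    # 12.0

# SD's text encoder (CLIP ViT-L/14, ~123M params) is tiny by comparison
print(weight_footprint_gb(123e6))  # ~0.25 GB
```

Loading weights in fp32 doubles these numbers, which is one way the "hundreds of GB" impression can arise when CPU RAM and intermediate buffers are counted too.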
Jordan117 OP t1_iwhtnxu wrote
Thanks for the clarification, I must have misread an older post talking about CPU memory requirements instead of GPU.