
Jordan117 OP t1_iwhs5cw wrote

Is there a reason the language model part of image diffusion requires a lot less horsepower than running a language model by itself? I'm still amazed SD works quickly on my 2016-era PC, but apparently something like GPT-J requires dozens or hundreds of GB of memory to even store. Is it the difference between generating new text vs. working with existing text?

2

SuperSpaceEye t1_iwht6hf wrote

They're two different tasks. The language model in SD just encodes text into an abstract representation that the diffusion part of the model then uses. A text-to-text model such as GPT-J does a different task, which is much harder. Also, GPT-J is 6B parameters, which only takes about 12GB of VRAM, not hundreds.
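The ~12GB figure follows from simple arithmetic: weights in fp16 take 2 bytes per parameter. A quick sketch of that estimate, comparing GPT-J against SD's text encoder (assuming SD uses the ~123M-parameter CLIP ViT-L/14 text encoder):

```python
# Rough VRAM needed just to store model weights, assuming fp16 (2 bytes/param).
# Parameter counts are approximate: CLIP ViT-L/14 text encoder (~123M, used by
# Stable Diffusion) vs. GPT-J (6B).

def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

for name, n in [("SD text encoder (CLIP, ~123M)", 123e6),
                ("GPT-J (6B)", 6e9)]:
    print(f"{name}: ~{weight_memory_gib(n):.1f} GiB in fp16")
```

This is only the storage for the weights; activations, the KV cache during generation, and any optimizer state add more on top.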

3

Jordan117 OP t1_iwhtnxu wrote

Thanks for the clarification; I must have misread an older post that was talking about CPU memory requirements instead of GPU.

2