Submitted by Jordan117 t3_yw1uxc in singularity
SuperSpaceEye t1_iwht6hf wrote
Reply to comment by Jordan117 in ELI5: Why such a big difference in compute cost for different types of media? by Jordan117
Two different tasks. The language model in SD just encodes the text into an abstract representation that the diffusion part of the model then uses. A text-to-text model such as GPT-J does a different task, which is much harder. Also, GPT-J has 6B parameters, which only takes about 12GB of VRAM, not hundreds.
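For reference, the 12GB figure is consistent with a rough back-of-the-envelope estimate, assuming the weights are stored in 16-bit precision (2 bytes per parameter). That precision is an assumption, not something stated in the comment, and the function name `weight_memory_gb` is made up for illustration. The sketch counts only the weights and ignores activations and any other runtime overhead:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just to hold the model weights, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    Activations and framework overhead are not counted.
    """
    return num_params * bytes_per_param / 1e9


GPT_J_PARAMS = 6e9  # "6B parameters", as stated in the comment

fp16_gb = weight_memory_gb(GPT_J_PARAMS, bytes_per_param=2)  # 16-bit weights (assumed)
fp32_gb = weight_memory_gb(GPT_J_PARAMS, bytes_per_param=4)  # 32-bit weights, for comparison

print(f"fp16 weights: ~{fp16_gb:.0f} GB")
print(f"fp32 weights: ~{fp32_gb:.0f} GB")
```

Even at 32-bit precision the weights alone come to about 24GB under this estimate, so the total is still nowhere near hundreds of GB.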
Jordan117 OP t1_iwhtnxu wrote
Thanks for the clarification, I must have misread an older post talking about CPU memory requirements instead of GPU.