dancingnightly t1_j76t0gh wrote
Reply to comment by zbyte64 in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
In theory, training T5 alongside the image embedding model they use (primarily DETR?) shouldn't take much more than a 3090 or a Colab Pro GPU. You could train T5 on even high-end consumer GPUs back in 2020, for example, but the DETR image model probably needs to be run on each image at the same time, which together might take up quite a bit of GPU memory. The `main.py` script looks like a nice, fairly short, typical training script that you'd be able to run quickly if you download their repo, pull the ScienceQA dataset, and pass in the training args to see if it crashes. A rough sketch of what the DETR + T5 pairing might look like is below.
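For illustration only, here's a minimal sketch (not the authors' code) of pairing a DETR feature extractor with a T5 model, using off-the-shelf HuggingFace checkpoints (`facebook/detr-resnet-50`, `t5-base`) as stand-ins for whatever the repo actually loads. The fusion step is a placeholder linear projection, not the gated mechanism from the paper, and the image path and question text are made up:

```python
import torch
from PIL import Image
from transformers import (
    DetrImageProcessor,
    DetrModel,
    T5Tokenizer,
    T5ForConditionalGeneration,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Vision side: DETR encodes each image into a fixed set of query embeddings.
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detr = DetrModel.from_pretrained("facebook/detr-resnet-50").to(device).eval()

# Language side: a base-size T5 trains comfortably on a single consumer GPU.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("t5-base").to(device)

image = Image.open("example.png").convert("RGB")  # placeholder image path
with torch.no_grad():  # keep DETR frozen so only T5 takes gradient memory
    pixel_inputs = processor(images=image, return_tensors="pt").to(device)
    vision_feats = detr(**pixel_inputs).last_hidden_state  # (1, 100, 256)

# Project DETR's 256-d queries into T5's hidden size so the two embedding
# spaces can be fused downstream (the paper uses a gated fusion; a plain
# linear layer here is just a stand-in).
proj = torch.nn.Linear(vision_feats.size(-1), t5.config.d_model).to(device)
vision_tokens = proj(vision_feats)  # (1, num_queries, d_model)

text = tokenizer("Question: which force acts here?", return_tensors="pt").to(device)
text_feats = t5.encoder(**text).last_hidden_state  # (1, seq_len, d_model)

# Naive fusion: concatenate image tokens with text tokens along the
# sequence axis before decoding.
fused = torch.cat([text_feats, vision_tokens], dim=1)
print(fused.shape)
```

If memory is tight, one option is to precompute the DETR features for the whole ScienceQA image set once and train T5 against the cached tensors, so both models never have to sit on the card at the same time.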