hosjiu
hosjiu t1_jcjey3z wrote
Reply to comment by learn-deeply in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234
sure, but its main focus is to help many people in the academic community run the pretraining phase themselves, for fast, cheap, and reproducible research experiments.
hosjiu t1_isi4pyg wrote
Reply to comment by rmsisme in [R] UL2: Unifying Language Learning Paradigms - Google Research 2022 - 20B parameters outperforming 175B GTP-3 and tripling the performance of T5-XXl on one-shot summarization. Public checkpoints! by Singularian2501
I have the same point of view as you.
hosjiu t1_jd1a6az wrote
Reply to comment by Civil_Collection7267 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
"They also have the tendency to hallucinate frequently unless parameters are made more restrictive."
I don't really understand this point from a technical standpoint.
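
If "more restrictive parameters" refers to the sampling settings at generation time, here is a minimal sketch of what I guess that could mean, assuming the Hugging Face transformers `generate()` API and a placeholder checkpoint path (both are my assumptions, not from the original comment): lowering temperature and top_p makes decoding more conservative, which is often claimed to reduce hallucination.

```python
# Minimal sketch (my guess): "more restrictive" = more conservative decoding settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/alpaca-30b"  # hypothetical local checkpoint, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "List three facts about the Apollo 11 mission."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.2,         # low temperature -> less random token choices
    top_p=0.75,              # tighter nucleus sampling
    repetition_penalty=1.2,  # discourages degenerate repetition
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Is that the kind of restriction they meant, or something else?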