Submitted by Dr_Singularity t3_xu0oos in singularity
space_spider t1_iqum8oo wrote
Reply to comment by Nmanga90 in Large Language Models Can Self-improve by Dr_Singularity
This is close to the parameter count of Nvidia's Megatron-Turing NLG (530B): https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/
It's also the same as PaLM's 540B: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html?m=1
This approach (chain-of-thought prompting) has been discussed for at least a few months, so I think this could be a legit paper from Nvidia or Google. A rough sketch of what such a prompt looks like is below.
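For anyone unfamiliar, here is a minimal sketch of chain-of-thought prompting: you prepend a worked example that spells out its reasoning, so the model imitates that step-by-step style on a new question. This is just an illustration, not code from the paper; the `generate` function is a placeholder for whatever LLM completion API you happen to use.

```python
# Minimal chain-of-thought prompting sketch (illustrative, not from the paper).

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans with 3 balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example so the model produces step-by-step reasoning."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

def generate(prompt: str) -> str:
    """Placeholder: call your LLM of choice here and return its completion."""
    raise NotImplementedError

if __name__ == "__main__":
    prompt = build_cot_prompt(
        "A cafeteria had 23 apples. They used 20 and bought 6 more. How many are left?"
    )
    print(prompt)  # the model's completion would contain reasoning steps, then the answer
```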