CKtalon t1_j4enpew wrote
Reply to [D] Is there any reason hugging face GPT2 would behave (fundamentally) differently from GPT-Neo? by GasZealousideal8691
GPT-2 was trained on a different dataset with little code in it (other than what was picked up from Common Crawl). GPT-Neo was trained on The Pile, which contains a lot of code.
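To see the difference directly, here's a minimal sketch (mine, not from the thread) that feeds the same code prompt to both models through transformers; the Hub IDs and generation settings are only illustrative:

```python
# Feed the same code prompt to GPT-2 and GPT-Neo and compare completions.
# Model IDs are the public Hub checkpoints; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "def fibonacci(n):"

for model_id in ["gpt2", "EleutherAI/gpt-neo-1.3B"]:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(f"--- {model_id} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The Pile-trained checkpoint will usually continue with plausible Python, while GPT-2 tends to drift into prose.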
CKtalon t1_j16qtog wrote
Reply to comment by caedin8 in [D] Running large language models on a home PC? by Zondartul
Training will at minimum need about 10x more resources than what I said (for inferencing). And that's just to fit the model and all its optimiser states with batch size 1.
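A rough back-of-envelope with the usual rule-of-thumb bytes-per-parameter figures (my numbers, not from the comment): fp16 inference needs about 2 bytes/param, while mixed-precision Adam training needs about 16 bytes/param before any activation memory, which is where the order-of-magnitude gap comes from.

```python
# Rule-of-thumb memory estimate for a 175B-parameter model.
# fp16 inference: 2 bytes/param (weights only).
# Mixed-precision Adam training: 2 (fp16 weights) + 2 (fp16 grads)
# + 4 (fp32 master weights) + 4 + 4 (Adam moments) = 16 bytes/param,
# before activation memory, which grows with batch size.
params = 175e9

inference_gib = params * 2 / 2**30
training_gib = params * 16 / 2**30

print(f"inference: ~{inference_gib:,.0f} GiB")
print(f"training:  ~{training_gib:,.0f} GiB at batch size 1, excluding activations")
```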
CKtalon t1_j13dg5b wrote
Just forget about it.
Yes, it's possible to do it on CPU/RAM (Threadripper builds with >256GB RAM plus some assortment of 2x-4x GPUs), but the speed is so slow that it's pointless to work with. DeepSpeed or Hugging Face Accelerate can spread the model across GPU and CPU, but even so, it will be stupid slow, probably MINUTES per token.
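For the Hugging Face route, a minimal sketch of what that offloading looks like (my example, with Accelerate installed; the checkpoint ID is just a stand-in for any very large causal LM):

```python
# GPU + CPU RAM (+ disk) offload with transformers/Accelerate.
# It runs, but every forward pass shuttles weights between devices,
# hence the minutes-per-token latency described above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-66b"  # stand-in for any very large checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",         # fill GPUs first, then CPU RAM, then disk
    offload_folder="offload",  # spill-over weights are written here
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```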
We are at least 5 years away from consumer hardware being able to run 175B+ models on a single machine (even one with 4 GPUs).
20B models are within the realm of consumer hardware (3090/4090) with INT8: slow, but possible.
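For the INT8 route, a hedged sketch using the bitsandbytes integration in transformers: 8-bit weights cut a 20B model to roughly 20GB, which just fits a 24GB 3090/4090.

```python
# LLM.int8() loading sketch: ~20GB of weights for a 20B model,
# which fits on a 24GB 3090/4090. Requires the bitsandbytes package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,  # quantize linear layers to INT8 at load time
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

The INT8 matmuls are what make generation slow, but it does run.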
CKtalon t1_iy432bm wrote
Reply to [D] Training LLMs collaboratively by dogonix
Check out Petals: https://arxiv.org/pdf/2209.01188.pdf
CKtalon t1_iy2n7t0 wrote
Reply to Best GPU for deep learning by somebodyenjoy
Get the 4090. Besides, you only have 32GB of RAM, and feeding two GPUs with data can be a bottleneck.
CKtalon t1_iy2n56h wrote
Reply to comment by --dany-- in Best GPU for deep learning by somebodyenjoy
NVLink doesn’t pool VRAM no matter what Nvidia’s marketing says. I have NVLink. It just doesn’t.
CKtalon t1_iy1mmlw wrote
Reply to Deep Learning for Computer Vision: Workstation or some service like AWS? by Character-Ad9862
The A6000 is almost 2 years old. The newer version, the RTX 6000 Ada (yes, a confusing naming convention), is coming out in about 3 months' time, although it might not be easy to get your hands on one.
CKtalon t1_iwqk0b9 wrote
Reply to comment by ChuckSeven in [R] RWKV-4 7B release: an attention-free RNN language model matching GPT-J performance (14B training in progress) by bo_peng
It’s written in the 2nd column (params)
CKtalon t1_j5tvh2p wrote
Reply to [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
Google supposedly has better models based on benchmarks, but few people outside of Google have used them (and those who have don't seem to be giving good reviews).
AnthropicAI's Claude model seems promising as a ChatGPT competitor.