CKtalon t1_j13dg5b wrote

Just forget about it.

Yes, it's possible to do it on CPU/RAM (a Threadripper build with >256GB of RAM plus some assortment of 2x-4x GPUs), but the speed is so slow that it's pointless to work with. DeepSpeed or Hugging Face can spread the model across GPU and CPU, but even so, it will be stupid slow, probably MINUTES per token.
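
To give you an idea of what that offloading setup looks like in practice, here's a rough sketch using Hugging Face Transformers + Accelerate (the model name, memory caps, and offload folder are just placeholders, not a recommendation):

```python
# Rough sketch: spreading a large model across GPU and CPU RAM with
# Transformers + Accelerate. Model name and memory limits are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-66b"  # placeholder; a 175B checkpoint needs far more RAM

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",                         # let Accelerate split layers across devices
    max_memory={0: "24GiB", "cpu": "256GiB"},  # cap GPU usage, spill the rest to CPU RAM
    offload_folder="offload",                  # spill to disk if CPU RAM also runs out
)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to("cuda:0")
# Expect this to be painfully slow once most layers live on CPU (or disk).
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```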

We are at least 5 years away from consumer hardware that can run 175B+ models on a single machine (even with 4 GPUs in that machine).

20B models are within the realm of consumer hardware (3090/4090) with INT8; slow, but still possible. A rough sketch of that path is below.
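
Something like this, assuming Transformers with bitsandbytes installed (the 20B checkpoint below is just an example):

```python
# Rough sketch: loading a ~20B model in INT8 on a single 24GB GPU (3090/4090)
# using bitsandbytes through Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"  # example 20B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,  # INT8 weights roughly halve memory vs fp16
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```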
