Available_Lion_652

Available_Lion_652 t1_jcjtc6h wrote

I don't understand why people downvoted. I saw a claim that GPT-4 was trained on 25k Nvidia A100s for several months, and that post estimated it used roughly 100x more compute than GPT-3. The 20B LLaMA model was trained on 1.4 trillion tokens. So yeah, my post is based on those claims
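
For context on the compute claim, a common back-of-the-envelope estimate is FLOPs ≈ 6 · parameters · tokens. Here's a minimal sketch using GPT-3's published figures (~175B parameters, ~300B tokens); GPT-4's size and token count aren't public, so the "100x" number below is just the claim from that post restated, not something derived here.

```python
# Back-of-the-envelope training compute via the common approximation
# FLOPs ~= 6 * parameter_count * training_tokens (dense transformer).

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# GPT-3's published figures: ~175B parameters trained on ~300B tokens.
gpt3 = training_flops(175e9, 300e9)
print(f"GPT-3: ~{gpt3:.1e} FLOPs")             # ~3.2e23

# The "100x more compute" claim would put GPT-4 somewhere around:
print(f"100x GPT-3: ~{100 * gpt3:.1e} FLOPs")  # ~3.2e25 (claim, not a derivation)
```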

0

Available_Lion_652 t1_jcjt3yi wrote

Facebook's LLaMA tokenizer splits numbers into individual digits so that the model is better at arithmetic. The question I asked the model goes beyond adding or subtracting numbers: the model must understand what a perfect cube is, which it does, but it must also avoid hallucinating while reasoning, which it fails at
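
Roughly, digit splitting means a number like 1728 gets tokenized as four separate digit tokens rather than one opaque chunk. A minimal regex sketch of the idea (not LLaMA's actual SentencePiece tokenizer, just an illustration of the preprocessing effect):

```python
import re

def split_digits(text: str) -> str:
    """Insert a space between consecutive digits so each digit ends up
    as its own token downstream (illustrative sketch only)."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(split_digits("12 cubed is 1728"))
# -> "1 2 cubed is 1 7 2 8"
```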

0

Available_Lion_652 t1_jcjrfnx wrote

I know that autoregressive models hallucinate, but training them on an enormous, clean corpus of probably several trillion tokens and images, and the fact that GPT-4 may be two orders of magnitude bigger than GPT-3, didn't change the problem. The model still hallucinates

−13

Available_Lion_652 OP t1_j7ue7pj wrote

My motherboard is quite old and the best CPU I can attach to it is an i7 7700K. From what I have read, if I preprocess the dataset before training, it should not bottleneck. But what I was thinking is that the preprocessed dataset is held in 32 GB of RAM, and the CPU transfers data from RAM to GPU memory with only 8 threads. Let's say I want to train a GPT-2 from scratch. I don't know exactly how much the CPU/RAM frequency will bottleneck the training process. I don't want to replace my whole hardware. If the RTX 3090 is too performant and the bottleneck is too high, I was wondering if I could buy a 3060/3080 instead
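
For what it's worth, a typical way to keep the CPU-to-GPU transfer from stalling training in PyTorch is to pre-tokenize the corpus, memory-map it, and use pinned memory with a few worker processes, so the CPU only copies slices instead of tokenizing on the fly. A rough sketch, assuming PyTorch; the file name, block size, and dummy data are placeholders:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# Create a small dummy token file so the sketch runs end to end;
# in practice this would be your pre-tokenized corpus on disk.
dummy = np.random.randint(0, 50257, size=2_000_000, dtype=np.uint16)
dummy.tofile("train_tokens.bin")

class TokenDataset(Dataset):
    """Serves fixed-length blocks from a memory-mapped token file, so the
    8-thread CPU only has to copy slices, not preprocess during training."""
    def __init__(self, path="train_tokens.bin", block_size=1024):
        self.data = np.memmap(path, dtype=np.uint16, mode="r")
        self.block_size = block_size

    def __len__(self):
        return (len(self.data) - 1) // self.block_size

    def __getitem__(self, idx):
        start = idx * self.block_size
        chunk = self.data[start:start + self.block_size + 1].astype(np.int64)
        # Input tokens and next-token targets, shifted by one position.
        return torch.from_numpy(chunk[:-1]), torch.from_numpy(chunk[1:])

loader = DataLoader(
    TokenDataset(),
    batch_size=8,
    shuffle=True,
    num_workers=4,      # leave some of the 8 threads for the training loop
    pin_memory=True,    # enables fast async host-to-GPU copies
)

device = torch.device("cuda")
for x, y in loader:
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward pass of the GPT-2-style model would go here ...
    break
```

With the data already tokenized and memory-mapped, the per-step CPU work is mostly memcpy, which is usually enough to feed a single 3090 even from an older quad-core, though the only way to know for sure is to watch GPU utilization during a test run.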

1