Available_Lion_652
Available_Lion_652 t1_jckrfwd wrote
Reply to comment by NotARedditUser3 in [D] GPT-4 is really dumb by [deleted]
I don't understand why you insulted me. I really tried to write a post about a case where GPT-4 hallucinates, with all good intentions, but I guess you have to be a smartass
Available_Lion_652 t1_jck2th4 wrote
Reply to comment by olmec-akeru in [D] GPT-4 is really dumb by [deleted]
Not quite :). The second identity, (a + b + c)^2014 = a^2014 + b^2014 + c^2014, is false, so it does not understand complex math operations. To be honest, solving the above problem would mean it can do better math than most humans.
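A quick sanity check (my own illustration, not from the thread) showing the identity already fails for small exponents, so it certainly fails for 2014:

```python
# Counterexample: (a + b + c)**n != a**n + b**n + c**n in general,
# because the expansion has cross terms that the right side drops.
a, b, c, n = 1, 1, 1, 2
lhs = (a + b + c) ** n        # 3**2 = 9
rhs = a**n + b**n + c**n      # 1 + 1 + 1 = 3
print(lhs, rhs)               # 9 3
```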
Available_Lion_652 t1_jcjxja6 wrote
Reply to comment by JaCraig in [D] GPT-4 is really dumb by [deleted]
This is a 5th-grade math Olympiad problem. Sorry for not mentioning it. Good luck solving it with a basic calculator app.
Available_Lion_652 t1_jcjxe16 wrote
Reply to comment by SQLGene in [D] GPT-4 is really dumb by [deleted]
Good remarks. This is my first post on this reddit. I didn't know what title to give. I was angry at ClosedAI for not revealing models details and dataset details.
Available_Lion_652 t1_jcjwqg0 wrote
Reply to comment by SQLGene in [D] GPT-4 is really dumb by [deleted]
I probably should have specified: the problem that I gave GPT-4 to solve was a 5th-grade math Olympiad problem. Your statement is unfounded.
Available_Lion_652 t1_jcjwf9c wrote
Reply to comment by olmec-akeru in [D] GPT-4 is really dumb by [deleted]
Yes, that was interesting :) but it failed at adding operations
Available_Lion_652 t1_jcjw5cf wrote
Reply to comment by kaoD in [D] GPT-4 is really dumb by [deleted]
I understood the post really well. My comment was an addition to it. I think you did not understand what I said.
Available_Lion_652 t1_jcjuxp5 wrote
Reply to comment by yumiko14 in [D] GPT-4 is really dumb by [deleted]
It's not an article. Someone on Twitter estimated the total compute based on a report that Microsoft had 25k A100 GPU racks. That was all.
Available_Lion_652 t1_jcjukim wrote
Reply to comment by PM_ME_ENFP_MEMES in [D] GPT-4 is really dumb by [deleted]
Yes, there is currently a fix for this problem. In the LLaMA paper they split numbers into individual digits: 12345 became 1 2 3 4 5, and 29 December became 2 9 December.
It helps with addition and subtraction, but not with complex reasoning.
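A minimal sketch of what that digit splitting looks like as a text preprocessing step (my own illustration, not LLaMA's actual tokenizer code):

```python
import re

def split_digits(text: str) -> str:
    """Replace each run of digits with the digits separated by spaces,
    so a tokenizer sees one token per digit instead of arbitrary chunks."""
    return re.sub(r"\d+", lambda m: " ".join(m.group()), text)

print(split_digits("12345"))        # 1 2 3 4 5
print(split_digits("29 December"))  # 2 9 December
```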
Available_Lion_652 t1_jcjtc6h wrote
Reply to comment by Single_Blueberry in [D] GPT-4 is really dumb by [deleted]
I don't understand why people downvoted. I saw a claim that GPT-4 was trained on 25k Nvidia A100s for several months, and that it used 100x more compute than GPT-3, based on that post. The 20B LLaMA model was trained on 1.4 trillion tokens. So yeah, I think my post is based on these claims.
Available_Lion_652 t1_jcjt3yi wrote
Reply to comment by NotARedditUser3 in [D] GPT-4 is really dumb by [deleted]
The tokenizer of LLaMA from Facebook splits numbers into digits so that the model is better at math calculations. The question I asked the model requires more than adding or subtracting numbers: the model must understand what a perfect cube is, which it does, but it also must not hallucinate while reasoning, which it fails at.
Available_Lion_652 t1_jcjrfnx wrote
Reply to comment by Single_Blueberry in [D] GPT-4 is really dumb by [deleted]
I know that autoregressive models hallucinate, but training them on an enormous clean corpus of probably several trillion tokens and images, and the fact that GPT-4 may be two orders of magnitude bigger than GPT-3, didn't change the problem. The model still hallucinates.
Available_Lion_652 t1_ja7rsg9 wrote
Reply to [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
It's obvious: because they memorize text.
Available_Lion_652 OP t1_j81ce54 wrote
Reply to comment by ehlen in [D] RTX 3090 with i7 7700k, training bottleneck by Available_Lion_652
Thank you for your effort
Available_Lion_652 OP t1_j80lcuv wrote
Reply to comment by ehlen in [D] RTX 3090 with i7 7700k, training bottleneck by Available_Lion_652
I would really appreciate it if you can try to finetune a T5-XXL Flan model from Huggingface on your hardware. I am curious if it works and if there is a big bottleneck. Thank you
Available_Lion_652 OP t1_j7ue7pj wrote
Reply to comment by IntelArtiGen in [D] RTX 3090 with i7 7700k, training bottleneck by Available_Lion_652
My motherboard is quite old, and the best CPU I can attach to it is an i7 7700K. From what I have read, if I preprocess the dataset before training, then it should not bottleneck. But what I was thinking is that the preprocessed dataset is held in 32 GB of RAM, and the CPU, which has only 8 threads, transfers data from RAM to GPU memory. Let's say I want to train a GPT-2 from scratch. I don't know exactly how much the CPU/RAM frequency will bottleneck the training process. I don't want to change my whole hardware, so if the RTX 3090 is too performant and the bottleneck is too high, I was wondering if I should buy a 3060/3080 instead.
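A rough sketch of the preprocess-before-training idea (my own toy illustration in plain Python, standing in for a real tokenizer and data loader): encode the corpus once up front, so the only per-step CPU work is slicing a window out of the cached ids.

```python
import random

# Toy character-level "tokenizer": encode the whole corpus ONCE, before
# training, so a weak CPU is not tokenizing on the fly every step.
corpus = "hello world, this is a tiny toy corpus"
vocab = {ch: i for i, ch in enumerate(sorted(set(corpus)))}
corpus_ids = [vocab[ch] for ch in corpus]  # cached; reused across steps

def get_batch(ids, block_size=8):
    """Cheap per-step work: copy one contiguous window of token ids."""
    start = random.randrange(len(ids) - block_size)
    return ids[start : start + block_size]

batch = get_batch(corpus_ids)
print(len(batch))  # 8
```

In a real PyTorch pipeline the equivalent trick is saving pre-tokenized ids to disk and letting the DataLoader workers do only the slicing.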
Submitted by Available_Lion_652 t3_10xu09v in MachineLearning
Available_Lion_652 t1_irgkega wrote
My niche is NLP: transformers, GPT, BERT, T5. But I totally ignored diffusion models (yes, they are cool, but very hard to apply to text only), and I mostly ignore everything related to CV. I've also worked with GNNs, which intersect with transformers, and I want to start studying RL.
Available_Lion_652 t1_jcm8ub5 wrote
Reply to [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Some heroes don't wear a cape