Available_Lion_652

Available_Lion_652 t1_jcjtc6h wrote

I don't understand why people downvoted. I saw a claim that GPT-4 was trained on 25k Nvidia A100s for several months, and that post estimated it used roughly 100x more compute than GPT-3. The 20B LLaMA model was trained on 1.4 trillion tokens. So yeah, my post is based on those claims
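
For context on the compute claim, a common back-of-the-envelope estimate is FLOPs ≈ 6 · parameters · tokens. Here's a minimal sketch using GPT-3's published figures (~175B parameters, ~300B tokens); GPT-4's size and token count aren't public, so the "100x" number below is just the claim from that post restated, not something derived here.

```python
# Back-of-the-envelope training compute via the common approximation
# FLOPs ~= 6 * parameter_count * training_tokens (dense transformer).

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

# GPT-3's published figures: ~175B parameters trained on ~300B tokens.
gpt3 = training_flops(175e9, 300e9)
print(f"GPT-3: ~{gpt3:.1e} FLOPs")             # ~3.2e23

# The "100x more compute" claim would put GPT-4 somewhere around:
print(f"100x GPT-3: ~{100 * gpt3:.1e} FLOPs")  # ~3.2e25 (claim, not a derivation)
```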

0

Available_Lion_652 t1_jcjt3yi wrote

Facebook's LLaMA tokenizer splits numbers into individual digits so that the model is better at arithmetic. The question I asked the model goes beyond adding or subtracting numbers: the model must understand what a perfect cube is, which it does, but it must also avoid hallucinating while reasoning, which it fails at
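
Roughly, digit splitting means a number like 1728 gets tokenized as four separate digit tokens rather than one opaque chunk. A minimal regex sketch of the idea (not LLaMA's actual SentencePiece tokenizer, just an illustration of the preprocessing effect):

```python
import re

def split_digits(text: str) -> str:
    """Insert a space between consecutive digits so each digit ends up
    as its own token downstream (illustrative sketch only)."""
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(split_digits("12 cubed is 1728"))
# -> "1 2 cubed is 1 7 2 8"
```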

0

Available_Lion_652 t1_jcjrfnx wrote

I know that autoregressive models hallucinate, but training them on an enormous, clean corpus of probably several trillion tokens and images, and the fact that GPT-4 may be two orders of magnitude bigger than GPT-3, didn't change the problem. The model still hallucinates

−13

Available_Lion_652 OP t1_j7ue7pj wrote

My motherboard is quite old and the best CPU I can attach to it is an i7 7700K. From what I have read, if I preprocess the dataset before training, it should not bottleneck. But what I was thinking is that the preprocessed dataset is held in 32 GB of RAM, and the CPU transfers data from RAM to GPU memory with only 8 threads. Let's say I want to train a GPT-2 from scratch. I don't know exactly how much the CPU/RAM frequency will bottleneck the training process. I don't want to replace my whole hardware. If the RTX 3090 is too performant and the bottleneck is too high, I was wondering if I could buy a 3060/3080 instead
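
For what it's worth, a typical way to keep the CPU-to-GPU transfer from stalling training in PyTorch is to pre-tokenize the corpus, memory-map it, and use pinned memory with a few worker processes, so the CPU only copies slices instead of tokenizing on the fly. A rough sketch, assuming PyTorch; the file name, block size, and dummy data are placeholders:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# Create a small dummy token file so the sketch runs end to end;
# in practice this would be your pre-tokenized corpus on disk.
dummy = np.random.randint(0, 50257, size=2_000_000, dtype=np.uint16)
dummy.tofile("train_tokens.bin")

class TokenDataset(Dataset):
    """Serves fixed-length blocks from a memory-mapped token file, so the
    8-thread CPU only has to copy slices, not preprocess during training."""
    def __init__(self, path="train_tokens.bin", block_size=1024):
        self.data = np.memmap(path, dtype=np.uint16, mode="r")
        self.block_size = block_size

    def __len__(self):
        return (len(self.data) - 1) // self.block_size

    def __getitem__(self, idx):
        start = idx * self.block_size
        chunk = self.data[start:start + self.block_size + 1].astype(np.int64)
        # Input tokens and next-token targets, shifted by one position.
        return torch.from_numpy(chunk[:-1]), torch.from_numpy(chunk[1:])

loader = DataLoader(
    TokenDataset(),
    batch_size=8,
    shuffle=True,
    num_workers=4,      # leave some of the 8 threads for the training loop
    pin_memory=True,    # enables fast async host-to-GPU copies
)

device = torch.device("cuda")
for x, y in loader:
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward pass of the GPT-2-style model would go here ...
    break
```

With the data already tokenized and memory-mapped, the per-step CPU work is mostly memcpy, which is usually enough to feed a single 3090 even from an older quad-core, though the only way to know for sure is to watch GPU utilization during a test run.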

1