fiftyfourseventeen t1_je1gprd wrote on March 28, 2023 at 6:51 PM

Reply to comment by utopiah in [D] FOMO on the rapid pace of LLMs by 00001746

If you just want to change the output of a model to look more like something else in its training data, sure. LoRA usually targets the attention layers (technically it trains a separate set of small low-rank matrices, but they can be merged into the attention weights), so it doesn't necessarily add anything NEW per se, but rather focuses the model on things it has already learned. For example, if you were to try to make a model work well with a language not in its training data, LoRA is not going to work very well. However, if you wanted to make the model respond in a dialogue-like format (as is the case with Alpaca), it can work because the model has already seen dialogue before, so the LoRA makes it "focus" on creating dialogue.
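
To make the "separate model that can be merged" part concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not any library's actual implementation: the sizes, the rank `r`, the scaling factor `alpha`, and the names `W`, `A`, `B` are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 8, 2   # hypothetical layer sizes; r is the LoRA rank
alpha = 4.0                # hypothetical scaling factor

W = rng.normal(size=(d_out, d_in))  # frozen pretrained weight (e.g. an attention projection)
A = rng.normal(size=(r, d_in))      # trainable low-rank factor
B = rng.normal(size=(d_out, r))     # trainable low-rank factor (random here so the update is visible)


def lora_forward(x):
    """Base layer output plus the adapter's low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))


def merge_lora():
    """Fold the adapter into the base weight so inference is a single matmul."""
    return W + (alpha / r) * (B @ A)


x = rng.normal(size=d_in)
W_merged = merge_lora()

# The merged weight gives the same output as base + adapter.
assert np.allclose(lora_forward(x), W_merged @ x)
```

The update `B @ A` has rank at most `r`, which is why the adapter can only nudge the existing weights in a few directions rather than store a lot of new information.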

You can get useful results with just LoRA, which is nice. If you want to experiment with architecture improvements or large-scale finetunes / training from scratch, you are out of luck unless you have millions of dollars.

I'd say the biggest limitation of LoRA is that your model for the most part already has to "know" everything that you are trying to do. LoRA is not a good way to add more information to the model (e.g. training it on information from after 2021 to make it more up to date). That has to be a full finetune, which is a lot more expensive.

As for the cost, I honestly don't know because these companies don't like to make data like that public. We don't even know for sure what hardware GPT-3 was trained on, although it was likely V100s, and then A100s for GPT-3.5 and GPT-4. I think people calculated that the least they could have spent on training was around $4.5 million for GPT-3, and $1.6 million for LLaMA. That doesn't even include all the work that went into building an absolutely massive dataset and paying employees to figure out how to do distributed training across tens of thousands of nodes with multiple GPUs each.
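
For anyone who wants to play with numbers like these, here is a back-of-the-envelope sketch. It uses the common 6 * N * D rule of thumb for training FLOPs; it is not the calculation behind the estimates quoted above. The model size and token count in the example are the figures from the GPT-3 paper (175B parameters, about 300B tokens), while the GPU throughput, utilization, and hourly price are placeholder assumptions you should swap for your own.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def training_cost_usd(
    n_params: float,
    n_tokens: float,
    peak_flops_per_gpu: float,
    utilization: float,
    usd_per_gpu_hour: float,
) -> float:
    """Rough dollar cost: total FLOPs / effective throughput, billed per GPU-hour."""
    gpu_seconds = training_flops(n_params, n_tokens) / (peak_flops_per_gpu * utilization)
    return gpu_seconds / 3600.0 * usd_per_gpu_hour


# 175e9 parameters and 300e9 tokens are from the GPT-3 paper.
gpt3_flops = training_flops(175e9, 300e9)  # about 3.15e23 FLOPs

# Everything below is an ASSUMPTION for illustration only.
estimate = training_cost_usd(
    n_params=175e9,
    n_tokens=300e9,
    peak_flops_per_gpu=125e12,  # assumed peak tensor throughput per GPU
    utilization=0.3,            # assumed fraction of peak actually achieved
    usd_per_gpu_hour=1.5,       # assumed rental price
)
print(f"{gpt3_flops:.3e} FLOPs, roughly ${estimate:,.0f} under these assumptions")
```

The result swings a lot with the utilization and price you plug in, which is part of why public estimates vary so much.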

fiftyfourseventeen t1_je0z514 wrote on March 28, 2023 at 5:02 PM

Reply to comment by nomadiclizard in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

That's exactly what happened here lol, they only deduplicated by exact text match, so there was lots of near-identical data in both sets
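
A small sketch of why exact-match dedup lets this through. The two example strings and the 0.8 threshold are made up for illustration, and word-shingle Jaccard similarity is just one simple way to catch near-duplicates, not necessarily what anyone used in practice.

```python
def exact_overlap(train, test):
    """Test items that also appear verbatim in the training set."""
    train_set = set(train)
    return [t for t in test if t in train_set]


def shingles(text, n=3):
    """Set of lowercase word n-grams for a piece of text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def near_duplicates(train, test, threshold=0.8):
    """(test, train) pairs whose shingle overlap is at or above the threshold."""
    return [(t, s) for t in test for s in train if jaccard(t, s) >= threshold]


# Hypothetical data: the test item differs only by a trailing period.
train = ["Write a function that returns the sum of two integers a and b."]
test = ["Write a function that returns the sum of two integers a and b"]

print(exact_overlap(train, test))         # nothing found by exact matching
print(len(near_duplicates(train, test)))  # the near-duplicate pair is caught
```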

fiftyfourseventeen t1_je0u1oj wrote on March 28, 2023 at 4:30 PM

Reply to comment by utopiah in [D] FOMO on the rapid pace of LLMs by 00001746

You can't compare a LoRA to training a model lol

fiftyfourseventeen t1_jdz6eu7 wrote on March 28, 2023 at 7:08 AM

Reply to comment by ---AI--- in [D] FOMO on the rapid pace of LLMs by 00001746

The only way you are training your own GPT-3 level model for $600 is by spending 300 bucks on a gun, 300 bucks renting a U-Haul, and heisting a datacenter

Edit: maybe cheap out on the gun and truck, can't forget about electricity costs of your newly acquired H100s

fiftyfourseventeen t1_jdt29u3 wrote on March 26, 2023 at 11:38 PM

Reply to comment by liqui_date_me in [D] GPT4 and coding problems by enryu42

I've wasted too much time trying to do basic tasks with it as well. For example, I argued with it for many messages about something that was blatantly wrong, and it insisted it wasn't (in that case it was trying to use an order-by-similarity function with an arg to sort by Euclidean distance or cosine similarity, but it really didn't want to accept that cosine similarity isn't a distance metric and therefore has to be treated differently when sorting).
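
The sorting point in a few lines of NumPy. This is a generic sketch, not the code from that conversation; the function names and the toy vectors are made up.

```python
import numpy as np


def rank_by_euclidean(query, items):
    """Closest first: smaller distance means more similar, so sort ascending."""
    distances = np.linalg.norm(items - query, axis=1)
    return np.argsort(distances)


def rank_by_cosine(query, items):
    """Closest first: larger similarity means more similar, so sort descending."""
    sims = items @ query / (np.linalg.norm(items, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)


query = np.array([1.0, 0.0])
items = np.array([
    [10.0, 0.0],  # same direction as the query, but far away
    [0.0, 1.0],   # orthogonal to the query
    [1.0, 0.1],   # almost identical to the query
])

print(rank_by_euclidean(query, items).tolist())  # [2, 1, 0]
print(rank_by_cosine(query, items).tolist())     # [0, 2, 1]
```

Sorting the cosine scores ascending, the way you would for a distance, would put the orthogonal vector first, which is exactly the bug.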

My most recent one was where I wasted an hour doing something that was literally just 1 line of code. I had videos with all different framerates, and I wanted to make them all 16fps while affecting length and speed as little as possible. It gave me a couple of solutions that just straight up didn't work, and then I had to manually fix a ton of things with them, and then I finally had a scuffed and horrible solution. It wouldn't give me a better algorithm, so I tried to make one on my own, when I thought "I should Google if there's a simpler solution". From that Google search I learned "oh, there's literally just a .set_fps() method".
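
For reference, the underlying idea is simple enough to sketch by hand: keep the duration fixed and pick, for each output timestamp, the source frame being shown at that moment. This is only an illustration of that idea with a made-up helper name; it is not a claim about how any library's `.set_fps()` works internally.

```python
def resample_frame_indices(n_frames, src_fps, dst_fps=16.0):
    """Source frame index to show for each output frame, keeping duration the same."""
    duration = n_frames / src_fps
    n_out = max(1, round(duration * dst_fps))
    # Output frame i is displayed at time i / dst_fps; map that time back to a source frame.
    return [min(n_frames - 1, int(i * src_fps / dst_fps)) for i in range(n_out)]


# 2 seconds at 32fps -> 2 seconds at 16fps: every other frame is dropped.
down = resample_frame_indices(64, 32.0)
print(len(down), down[:4])  # 32 [0, 2, 4, 6]

# 1 second at 8fps -> 1 second at 16fps: every frame is shown twice.
up = resample_frame_indices(8, 8.0)
print(len(up), up[:4])      # 16 [0, 0, 1, 1]
```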

Anyways, from using it I feel like it's helpful but not as much as people make it out to be. Honestly, GitHub Copilot has been way more helpful because it can autocomplete things that just take forever to write but are common, like command line args and descriptions, or pieces of repetitive code.

fiftyfourseventeen t1_jdo5rdg wrote on March 25, 2023 at 9:42 PM

Reply to comment by SkyeandJett in Levi's to Use AI-Generated Models to 'Increase Diversity' by SnoozeDoggyDog

!RemindMe 1 year "AI progress has plateaued for the next 3 or 4 years until another breakthrough happens, as has been the case for the last 20 years. Currently 1 year into the plateau when I see this message"

fiftyfourseventeen t1_jdnul9a wrote on March 25, 2023 at 8:19 PM

Reply to comment by Verzingetorix in Levi's to Use AI-Generated Models to 'Increase Diversity' by SnoozeDoggyDog

Somebody finally said it lol. I think it's easy to see everything with starry eyes if you don't know all that much about robotics or how AI architectures actually work. Companies like Boston Dynamics have been trying to solve robotics in the real world for years. Trying to make a humanoid-like creature that can move around in an environment is EXTREMELY hard. And that's just on the robotics end, not the AI end.

The best AI right now is text gen and image gen. This is largely because of the amount of training data available for them. Trying to train an agent to interact with an environment to perform a skilled trade? That's such an inconceivably hard task. Think about how much time Tesla has spent trying to make a self-driving car, which is honestly really simple compared to a trade. There are maps that tell you the location of every building and road in the world, and there is a set of rules that everyone has to follow. Even then, it still has problems like running lights and failing to see pedestrians in front of it. Hell, even just looking at the screen you can see it bugging out trying to figure out if it's looking at a truck, a car, or a bike.

Now think of that in trade terms. How are we going to have an AI purchase the hardware needed, go to the house, ask the owner what the problem is and where it is, diagnose the problem, and then fix it, all without screwing up and flooding the whole house? These are orders of magnitude more difficult problems for AI to solve than writing an essay, writing code, or creating an image. And we don't even have a lick of training data.

And then for anybody who's like "oh well it was also inconceivable for text and image gen", well I mean maybe for most people, but I think a lot of people (including myself) saw huge potential in them years ago. I also develop image, video, and language models so it's not like I'm clueless about AI either.

fiftyfourseventeen t1_jdnsmo3 wrote on March 25, 2023 at 8:05 PM

Reply to comment by SkyeandJett in Levi's to Use AI-Generated Models to 'Increase Diversity' by SnoozeDoggyDog

This year? You guys are insane lol. Computer vision is absolutely trash compared to the state of language models. And no, feeding each frame into GPT 4 is not a good or viable option either.

fiftyfourseventeen t1_jdnqlqc wrote on March 25, 2023 at 7:50 PM

Reply to comment by nixed9 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

That's really cool, but I mean, it's published by Microsoft, which is working with OpenAI, and it's a commercial closed source product. It's in their best interest to brag about its capabilities as much as possible.

There are maybe sparks of AGI, but there are a lot of problems that are going to be very difficult to solve that people have been trying to solve for decades.

fiftyfourseventeen t1_jdnhbn0 wrote on March 25, 2023 at 6:43 PM

Reply to comment by Yardanico in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700

OpenAI is also doing a lot of tricks behind the scenes, so it's not really fair to just type two things into both, because they are getting nowhere near the same prompt. Llama is promising but it just needs to be properly instruction tuned

fiftyfourseventeen t1_jdngwum wrote on March 25, 2023 at 6:40 PM

Reply to comment by wrossmorrow in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700

Eh.... Not really, that's training a low rank representation of the model, not actually making it smaller.
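
Assuming the technique in question is a LoRA-style low-rank adapter (the parent comment isn't shown here), the arithmetic looks like this. The layer size and rank are hypothetical round numbers chosen for the example.

```python
def full_params(d_out, d_in):
    """Parameters in one dense weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, r):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


d_out, d_in, r = 4096, 4096, 8  # hypothetical projection size and rank

base = full_params(d_out, d_in)        # 16,777,216 frozen weights
adapter = lora_params(d_out, d_in, r)  # 65,536 trainable weights

# Training touches only the adapter, but the model you run is still full size:
# either base + adapter kept separate, or the adapter merged back into base.
print(base, adapter, base + adapter)
```

So the low-rank part shrinks what you have to train, not what you have to load and run.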

fiftyfourseventeen t1_jdlm1n7 wrote on March 25, 2023 at 8:35 AM

Reply to comment by rePAN6517 in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

Lmao it seems everyone used ChatGPT for a grand total of 20 minutes and threw their hands up saying "this is the end!". I have always wondered how the public would react once this tech finally became good enough for them to notice, and I can't say this is too far from what I envisioned. "What if it's conscious and we don't even know it!" C'mon, give me a break