sumane12 t1_j9j3b9j wrote on February 22, 2023 at 9:35 AM

Fucking wow!

turnip_burrito t1_j9j3pea wrote on February 22, 2023 at 9:40 AM

Yeah it's fucking nuts.

Neurogence t1_j9jef7k wrote on February 22, 2023 at 11:58 AM

What is the "catch" here? It sounds too good to be true.

WithoutReason1729 t1_j9jmd05 wrote on February 22, 2023 at 1:14 PM

The catch is that it only outperforms large models in a narrow domain of study. It's not a general purpose tool like the really large models. That's still impressive though.

Ken_Sanne t1_j9jxg68 wrote on February 22, 2023 at 2:39 PM

Can it be fine-tuned?

WithoutReason1729 t1_j9jxy78 wrote on February 22, 2023 at 2:43 PM

You can tune it to another data set and probably get good results, but you have to have a nice, high-quality data set to work with.

Ago0330 t1_j9lm5ty wrote on February 22, 2023 at 9:27 PM

I’m working on one that’s trained on JFK speeches and Bachelorette data to help people with conversation skills.

Gynophile t1_j9msb3s wrote on February 23, 2023 at 2:18 AM

I can't tell if this is a joke or real

Ago0330 t1_j9msg1r wrote on February 23, 2023 at 2:19 AM

It’s real. Gonna launch after GME moons

ihopeshelovedme t1_j9npl0j wrote on February 23, 2023 at 7:28 AM

Sounds like a viable AI implementation to me. I'll be your angel investor and throw some Doge your way or something.

Borrowedshorts t1_j9ka0ta wrote on February 22, 2023 at 4:34 PM

I don't think that's true, but I do believe it was fine-tuned on the specific dataset to achieve the SOTA result they did.

InterestingFinish932 t1_j9m2xhe wrote on February 22, 2023 at 11:15 PM

It chooses the correct answer from multiple choices. It isn't actually comparable to ChatGPT.

FoxlyKei t1_j9j7b6s wrote on February 22, 2023 at 10:31 AM

Where can I get one? I'll take 20

Imaginary_Ad307 t1_j9jjwf6 wrote on February 22, 2023 at 12:52 PM

Around 4GB of VRAM, maybe 2GB to run it.
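A rough back-of-the-envelope check of those numbers, assuming (this is my assumption, not stated in the comment) a model of about 1B parameters stored as 32-bit floats (4 bytes each) versus 16-bit floats (2 bytes each). It only counts the weights and ignores activations and framework overhead, so real usage will be somewhat higher.

```python
def weights_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the model weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Assumption: ~1B parameters (the post title says "less than 1B").
    params = 1e9
    # fp32 stores each weight in 4 bytes, fp16 in 2 bytes.
    for dtype, nbytes in [("fp32", 4), ("fp16", 2)]:
        print(f"{dtype}: ~{weights_memory_gb(params, nbytes):.0f} GB")
```

Running it prints `fp32: ~4 GB` and `fp16: ~2 GB`, which lines up with the 4GB / 2GB estimate if the smaller figure means loading the weights at half precision.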

em_goldman t1_j9jzamt wrote on February 22, 2023 at 2:52 PM

That’s so cool!! That’s how humans remember things, too

Agreeable_Bid7037 t1_j9jsc0w wrote on February 22, 2023 at 2:02 PM

amazing.

gelukuMLG t1_j9kftza wrote on February 22, 2023 at 5:10 PM

does that prove that parameters aren't everything?

dwarfarchist9001 t1_j9knt85 wrote on February 22, 2023 at 5:59 PM

It was shown recently that for LLMs ~0.01% of parameters explain >95% of performance.

gelukuMLG t1_j9kxnj4 wrote on February 22, 2023 at 6:58 PM

But more parameters allow for broader knowledge, right? You can't have a 6-20B model with knowledge as broad as a 100B+ model's, right?

Ambiwlans t1_j9lab3g wrote on February 22, 2023 at 8:16 PM

At this point we don't really know what is bottlenecking. More params is an easyish way to capture more knowledge if you have the architecture and the $$... but there are a lot of other techniques available that increase the efficiency of the parameters.

dwarfarchist9001 t1_j9lb1wl wrote on February 22, 2023 at 8:20 PM

Yes, but how many parameters must you actually have to store all the knowledge you realistically need? Maybe a few billion parameters is enough to store the basics of every concept known to man, and more specific details can be stored in an external file that the neural net can access with API calls.

gelukuMLG t1_j9lfp3j wrote on February 22, 2023 at 8:48 PM

You mean like a LoRA?

turnip_burrito t1_j9kgb2q wrote on February 22, 2023 at 5:13 PM

We already knew parameters aren't everything, or else we'd just be using really large feedforward networks for everything. Architecture, data, and other tricks matter too.

Nervous-Newt848 t1_j9qgisf wrote on February 23, 2023 at 8:59 PM

It's small enough to run on a single graphics card.

[deleted] t1_j9nhlub wrote on February 23, 2023 at 5:54 AM

[deleted]

What. The. ***k. [less than 1B parameter model outperforms GPT-3.5 in science multiple choice questions]

turnip_burrito t1_j9j2sg5 wrote on February 22, 2023 at 9:27 AM