Comments


maskedpaki t1_j8by3r7 wrote

This is like 2 weeks old. If it really does surpass GPT-3 with under a billion parameters, then why isn't this in the headlines?

11

turnip_burrito t1_j8byoth wrote

Surpasses humans in this science test, across the board (natural science, social science, language science, etc).

Wow.

And it outperforms GPT-3.5 with about 0.4% of the parameters.

Wonder how it does on other tests?

Would this run on consumer graphics cards, then? Seems like it's in the ballpark to run on a single 3090, but without knowing the total requirements, I can't say.

Edit: "Our experiments are run on 4 NVIDIA Tesla V100 32G GPU" - paper


Paper link: https://arxiv.org/pdf/2302.00923.pdf#page=7
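The 0.4% figure and the single-3090 guess are easy to sanity-check with back-of-the-envelope arithmetic. A sketch: 738M is the paper's large model, 175B is the commonly cited GPT-3.5 size, and 2 bytes/param assumes fp16 weights only, so this is a floor, not a real requirement:

```python
# Back-of-the-envelope check: parameter ratio and fp16 inference memory.
PARAMS_MM_COT = 738e6   # Multimodal-CoT (large), from the paper
PARAMS_GPT35 = 175e9    # GPT-3.5, commonly cited figure

ratio = PARAMS_MM_COT / PARAMS_GPT35
print(f"parameter ratio: {ratio:.2%}")   # ~0.42%

# Rough inference footprint at 2 bytes/param (fp16), weights only;
# activations and intermediate buffers add more on top of this.
gib = PARAMS_MM_COT * 2 / 2**30
print(f"fp16 weights: {gib:.1f} GiB")    # ~1.4 GiB, easily fits a 24 GB 3090
```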

85

Ok_Criticism_1414 OP t1_j8bywbq wrote

Because of the ChatGPT hype? Who knows. I think OpenAI already did it, they're just not showing it to the public. The main thing, I guess, is that Amazon took a different approach, integrating the two modalities by fine-tuning the model to be multimodal. You can read about it in the paper. Plus, it looks like language + visual context gives a huge boost. But that was already done by the Flamingo model, so I guess the first point is the crucial one.

2

bladecg t1_j8bznlz wrote

Maybe their model is just overfitting a lot to the test data? That’s always a thing in ML

36

NTIASAAHMLGTTUD t1_j8c0cj3 wrote

In the interest of skepticism, can anyone pour cold water on this, or is it as good as it sounds?

62

maskedpaki t1_j8c0eso wrote

I've seen so many things like this that end up surpassing GPT-3 on some narrow benchmark through more optimized prompting, rather than actually being a better model overall.

I hope I'm wrong this time.

17

el_chaquiste t1_j8c1z83 wrote

If I understand correctly, the input set (a science exam with solved exercises and detailed answers) is smaller than GPT-3.5's own, but the model outperforms GPT-3.5 and humans by some percent on problems similar to those in the exam, and by more when its training is multimodal and includes visual data.

I honestly don't know if we should get overly excited over this or not, but it seems like it would allow the creation of smaller models focused on particular scientific and technical domains, with better accuracy in their responses than generalist LLMs.

33

blueSGL t1_j8c26i1 wrote

> Experimental Settings

> As the Multimodal-CoT task requires generating the reasoning chains and leveraging the vision features, we use the T5 encoder-decoder architecture (Raffel et al., 2020). Specifically, we adopt UnifiedQA (Khashabi et al., 2020) to initialize our models in the two stages because it achieves the best fine-tuning results in Lu et al. (2022a). To verify the generality of our approach across different LMs, we also employ FLAN-T5 (Chung et al., 2022) as the backbone in Section 6.3. As using image captions does not yield significant performance gains in Section 3.3, we did not use the captions. We fine-tune the models up to 20 epochs, with a learning rate of 5e-5. The maximum input sequence length is 512. The batch sizes for the base and large models are 16 and 8, respectively. Our experiments are run on 4 NVIDIA Tesla V100 32G GPUs.

So the GPUs were used in training, there is nothing to say what the system requirements will be for inference.
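For reference, the quoted settings can be gathered into a plain fine-tuning config. Everything here comes from the paper's quoted numbers except `train_examples`, which is a hypothetical dataset size used only to illustrate the step-count math:

```python
# Fine-tuning settings as quoted from the paper, collected in one place.
config = {
    "backbone": "unifiedqa-t5",   # UnifiedQA init; FLAN-T5 also tested
    "learning_rate": 5e-5,
    "max_epochs": 20,
    "max_input_length": 512,
    "batch_size": {"base": 16, "large": 8},
}

# Hypothetical training-set size, just to show the step arithmetic.
train_examples = 12_000
steps_per_epoch = -(-train_examples // config["batch_size"]["base"])  # ceil div
print(steps_per_epoch)  # 750
```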

25

SoylentRox t1_j8cb8c0 wrote

Obviously the next question is "what happens if you give it 1 trillion parameters?" (That would be about 1,355 times as many params.)

9

SoylentRox t1_j8cblun wrote

Theoretically it should query a large number of models, and have a "confidence" based on how likely each model's answer is to be correct. Then return the most confident answer.
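A minimal sketch of that selection rule; the models and confidence scores below are hypothetical stand-ins, not a real API:

```python
# "Query several models, return the most confident answer."
def most_confident(answers):
    """answers: list of (answer, confidence) pairs from different models."""
    return max(answers, key=lambda pair: pair[1])[0]

responses = [
    ("Paris", 0.72),   # model A
    ("Paris", 0.91),   # model B
    ("Lyon",  0.40),   # model C
]
print(most_confident(responses))  # Paris
```

A majority vote weighted by confidence would be the natural next refinement when several models disagree.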

11

Red-HawkEye t1_j8cd8mp wrote

God damn, Amazon has entered the game.

Just when you think you had seen it all, an atomic bomb like this one gets announced.

This is the equivalent of the villain Black Frieza coming back in Dragon Ball and one-shotting the main characters.

Amazon's model one-shots ChatGPT and Google's LaMDA out of nowhere.

36

Yesyesnaaooo t1_j8cs7dl wrote

Feels like even if this isn't 'it' then the next stage is coming down the pipe really fucking soon.

If we simply consider the amount of human resources that went into reaching the moon or splitting the atom, both "time sensitive" pursuits, and then weigh that against the amount of human resources going into AI?

Well, the race is on, and there's more research going into this than ever went into either of those projects.

It's going to take less than a decade for a winner to emerge.

18

DarkCeldori t1_j8cz4sg wrote

While we're on the topic of consumer hardware, Ryzen AI (XDNA) seems promising, since it'll be able to make easy use of main system memory, which will soon easily reach 256GB. That can fit very large models, and inference is usually far less computationally intensive than training.

3

RahnuLe t1_j8cz9sc wrote

These passages really stuck out to me.

>We speculate that such a phenomenon of hallucination is due to a lack of necessary vision contexts for performing effective Multimodal-CoT. To inject vision information, a simple way is to transform the paired image into a caption (Lu et al., 2022a) and then append the caption in the input of both stages. However, as shown in Table 3, using captions only yields marginal performance gains (↑0.59%). Then, we explore an advanced technique by incorporating vision features into the language model. Concretely, we feed the paired image to the DETR model (Carion et al., 2020) to extract vision features. Then we fuse the vision features with the encoded language representations before feeding to the decoder (more details will be presented in Section 4). Interestingly, with vision features, the RougeL score of the rationale generation has boosted to 96.97% (QCM→R), which correspondingly contributes to better answer accuracy of 84.91% (QCMR→A).3 With those effective rationales, the phenomenon of hallucination is mitigated — 62.5% hallucination mistakes in Section 3.2 have been corrected (Figure 3(b)), as an example shown in Figure 2 (right part).4 The analysis so far compellingly shows that vision features are indeed beneficial for generating effective rationales and contributing to accurate answer inference.
>
>[...]
>
>Compared with existing UnifiedQA and GPT-3.5 methods that leverage image captions in the context to provide vision semantics, the results indicate that using image features is more effective.

It's clear that there's still a lot more to go in terms of representing the data in such a way that these networks can fully process them without factual errors, and this paper is a strong demonstration of the gains that can happen when you address this aspect specifically. Very promising stuff, and frankly, it's also kind of terrifying. Human obsolescence is frighteningly closer than I had already imagined...
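A rough sketch of the fusion idea described in the quote: cross-attention from text tokens over vision patches, followed by a gated sum. The shapes, random projections, and gating details here are illustrative stand-ins, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in features: the real model uses T5 encoder states and DETR
# patch features; these are random placeholders of plausible shape.
text = rng.normal(size=(20, 64))    # (text tokens, hidden)
vision = rng.normal(size=(49, 64))  # (image patches, hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Cross-attention: each text token attends over the vision patches.
scores = text @ vision.T / np.sqrt(64)                      # (20, 49)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)                     # rows sum to 1
vision_attended = attn @ vision                             # (20, 64)

# Gated fusion: in a trained model the gate would be a learned linear
# layer; the projection matrices here are random placeholders.
W_t = rng.normal(size=(64, 64))
W_v = rng.normal(size=(64, 64))
gate = sigmoid(text @ W_t + vision_attended @ W_v)
fused = (1 - gate) * text + gate * vision_attended
print(fused.shape)  # (20, 64)
```

The fused sequence keeps the text shape, so it can be fed to the decoder in place of the text-only encoder output.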

10

Cryptizard t1_j8d7xv0 wrote

Keep in mind that the “human” line here is data they got from assigning the questions to randos on Mechanical Turk. It is not saying that their model is better than scientists. Still a really cool result, I just know how people here love to jump to insane conclusions.

4

Black_RL t1_j8d913k wrote

Can it solve aging already…..?

No…..?

Ok then……

−3

No_Ninja3309_NoNoYes t1_j8d9lii wrote

This is the principle of separation of concerns at work. Many focused capsules working together are stronger than one huge LLM. And it is easier to fine tune and prune individual capsules than one giant black box. It makes sense at inference time too. Eventually you could have a minimal model running locally with the sole purpose of figuring out which web services to contact for a given request.
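A toy sketch of such a local router. A real one would be a small classifier model; the keyword table here is purely hypothetical:

```python
# Pick which specialized service should handle a request; fall back
# to the generalist model when nothing matches.
SERVICES = {
    "code": ("def ", "traceback", "compile", "python"),
    "math": ("integral", "solve", "equation", "derivative"),
    "search": ("who is", "latest", "news"),
}

def route(request: str) -> str:
    text = request.lower()
    for service, keywords in SERVICES.items():
        if any(k in text for k in keywords):
            return service
    return "general"

print(route("Solve this equation for x"))  # math
print(route("Tell me a story"))            # general
```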

1

GayHitIer t1_j8dcv5i wrote

Solving aging will probably be a slow progression from repairing the body to reversing aging.

Don't stress it; if Aubrey de Grey is onto something, it will happen sooner rather than later.

3

GayHitIer t1_j8de3p8 wrote

If we make a cure for aging, it will probably get rolled out quickly, like the COVID vaccine.

It would be stupid to deny people the chance to live longer.

Economically, it would also save countries a lot of money.

3

GayHitIer t1_j8dgbim wrote

I agree. The biggest problem is the pro-aging trance; people think death gives meaning to life.

Death needs to be put down for good. If anything, death itself takes away meaning.

3

Black_RL t1_j8dhwp0 wrote

100% agreed! Death is what makes everything meaningless!

People think that after they die, someone is going to judge them regarding their achievements?

Death is emptiness, death is the vacuum, there’s nothing before nor after, it’s the complete absence of consciousness, if there’s no consciousness, there’s no meaning.

1

GayHitIer t1_j8dicpl wrote

True, though death doesn't scare me.

We need to put death in its place for good.

Consciousness is the most valuable thing in the universe.

If there's nobody to observe the universe it might as well not exist in the first place.

3

Black_RL t1_j8dimnt wrote

It doesn’t scare me either; if I’m dead, I don’t exist, therefore I can’t care.

Aging on the other hand…..

Agreed! Consciousness is what gives meaning to things! Meaning is a construct of consciousness.

2

GayHitIer t1_j8djtly wrote

I don't get linking r/Futurology?

Why can't we just accept people have different outlooks on life?

Sure some of them are doomers, but most of them are just skeptics, which makes sense.

Singularity sometimes sounds like some techno cult, at least let us discuss with them.

Also, downvoting Black_RL for no reason, without giving any explanation or discussion, is dumb.

0

user1342 t1_j8ds86t wrote

I wonder if the leap in Amazon's AI came from using GPT or something similar to help them make their new AI more efficient.

I always thought that was how we would see real acceleration, when the AI's start designing the next generation of themselves.

1

NapkinsOnMyAnkle t1_j8e4p9m wrote

I've trained CNNs of up to 200M parameters on my laptop's 3070 without any issues. I think it only has around 5GB of available VRAM.

This is a big concern of mine; AGI actually requires an insurmountable amount of VRAM to train in realistic timeframes thereby being essentially impossible. I mean, we could calculate all of these models by hand to train and then use to make predictions but it would take forever, like literally forever!
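For scale, a common rule of thumb for full fp32 Adam training (weights, gradients, and two optimizer moments, activations excluded) puts training memory at roughly 16 bytes per parameter:

```python
# Rough training-memory floor per parameter under fp32 Adam:
# weights (4) + gradients (4) + Adam first/second moments (8) = 16 bytes.
BYTES_PER_PARAM_TRAIN = 4 + 4 + 8

def train_gib(params: float) -> float:
    return params * BYTES_PER_PARAM_TRAIN / 2**30

for n in (200e6, 738e6):
    print(f"{n/1e6:.0f}M params -> ~{train_gib(n):.1f} GiB before activations")
# ~3.0 GiB for 200M, ~11.0 GiB for 738M
```

Mixed precision and gradient checkpointing push these numbers down considerably, which is why a 200M model can squeeze onto a 5 GB card in practice.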

3

SmoothPlastic9 t1_j8e9qrh wrote

Let me use it first, then, instead of relying on some benchmark test.

1

Borrowedshorts t1_j8erk8h wrote

It's as good as it sounds, and you can't really fake performance on a dataset such as this. Multimodal models will change the game. I don't think multimodal models by themselves are the end game, but they appear to be poised to take over state-of-the-art performance for the foreseeable future.

1

NarrowTea t1_j8f9l82 wrote

They just keep making them more efficient.

1

ReadSeparate t1_j8fb4cr wrote

One can easily imagine a generalist LLM outputting an action token which represents prompting the specialized LLM, which then gets routed to the specialized LLM, then the response is formatted and put into context by the generalist.

1

olivermyk t1_j8fnvwa wrote

the pdf isn’t accessible from the original link anymore. did anyone manage to download it? could you please re share?

1

maskedpaki t1_j8g2v6v wrote

No, actually, it seems like a pretty narrow science benchmark.

If you told me the MMLU 0-shot score was higher than 175B-parameter GPT-3.5 with under a billion parameters, then I'd be absolutely shocked.

1

PurpedSavage t1_j8ir1we wrote

Was this study tied in any way to funding from Amazon?

1

Akimbo333 t1_j8p643a wrote

The huge performance boost from a mere 738M-parameter model appears to be due to it being multimodal: it can use not only text but images and other modalities as well.

1