PM_ME_GAY_STUF t1_j85vf9d wrote

I'm sorry, isn't this just how ML models are implemented?

I'm sure there's real work being done here, but this article reads like the researcher started giving the reporter a high level overview of how their model works and the reporter immediately yelled "That's an amazing discovery!" and ran out of the room before they even started describing their research

408

PEVEI t1_j85w9i1 wrote

In this case "mind-bending" means the 'science communicator's' mind was bent, a pitifully low bar. This is Vice after all, their headlines are even more embarrassing than their content.

93

DefreShalloodner t1_j86zr39 wrote

On the OTW hand I agree with you, but on the OTOH hand I support the rehashing/reframing of scientific or technical ideas in the interest of bending the public's minds.

Similarly, I roll my eyes when concepts from my abstruse specialty get butchered in movies or TV, but at the same time I appreciate the exposure they are giving to those ideas (ersatz or not).

[Edit: fixed acronyms]

13

GlowGreen1835 t1_j877pqj wrote

On the on the other hand hand.

11

SpecificAstronaut69 t1_j878a5m wrote

It's funny because these guys who lament science communication are the same ones who'll call anyone else using terms from non-STEM fields "gatekeepers" at the drop of a hat in my experience...

3

ElbowWavingOversight t1_j86z5rp wrote

> I'm sorry, isn't this just how ML models are implemented?

No. The novel discovery is the fact that these large language models appear to have learned a form of gradient descent at inference time. This is why they appear to be able to learn even without updates to the weights. FTA:

> We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model.

This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.
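
To make "gradient descent at inference time" a bit more concrete, here is a toy sketch. It is not the paper's code, and the function names and numbers are made up for illustration. It assumes a plain linear-regression setup: starting from zero weights, one gradient step on the context examples gives the same prediction as an unnormalized linear-attention readout over those examples, with no stored weights changing.

```python
# Toy illustration (an assumption-laden sketch, not the researchers' method):
# for linear regression starting at w = 0, one gradient-descent step on the
# context examples predicts the same value as an unnormalized linear-attention
# readout that uses the examples as keys/values and the query as the query.

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def gd_one_step_predict(context, query, lr):
    """One batch gradient step on 0.5 * sum((w.x - y)^2) from w = 0, then predict."""
    dim = len(query)
    w = [0.0] * dim
    grad = [0.0] * dim
    for x, y in context:
        residual = dot(w, x) - y  # w is still all zeros here
        for j in range(dim):
            grad[j] += residual * x[j]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, query)

def attention_style_predict(context, query, lr):
    """Weighted sum of the labels y_i, weighted by the similarity x_i . query."""
    return lr * sum(y * dot(x, query) for x, y in context)

# Hypothetical in-context examples (x, y) and a new query point.
context = [([1.0, 2.0], 5.0), ([0.5, -1.0], -1.5), ([2.0, 0.0], 2.0)]
query = [1.0, 1.0]
lr = 0.1

gd_prediction = gd_one_step_predict(context, query, lr)
attention_prediction = attention_style_predict(context, query, lr)
print(gd_prediction, attention_prediction)
```

Both functions compute the same number, which is the sense in which a fixed forward pass over the context can stand in for a training step. Real transformers use softmax attention and many layers, so this only shows the flavor of the argument.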

75

lookmeat t1_j87zp0i wrote

This isn't that surprising, though. It's already been proven that neural networks are Turing complete, and therefore any arbitrary program can be described with a "static" (that is, weights/parameters are not changed) neural network of sufficient complexity.

So it isn't so much a "new discovery" as "validation of something that we knew was going to be observed".

Don't get me wrong, this is going to be interesting. It gives us insight into how things work. That is, actually understand what is the solution a neural network built. Also it'd be interesting to work backwards and see if certain algorithms tend to happen naturally on sufficiently complex systems. Optimization sounds natural. Then the next step would be to analyze and see if they happen on organic beings that have intelligent systems (animal neural systems may be too complex, IMHO, to observe cleanly at first but we may have something interesting on simpler systems for plants, fungi or such, with better understanding we may look for this in more complex systems, such as animals).

This would start giving us an insight into how intelligence works. If strong human-like AI is the philosopher's stone to turn lead into gold (now possible with a particle accelerator and sufficient resources), this may be the equivalent of understanding the difference between elements and molecules: a solid first step to start forming a model that we can test and refine. That said we're still a bit far from that.

I think though interesting things will happen from us understanding AI better, and having a better understanding of how they actually work (as in what is the system that the neural network hit on), rather than a handwavy "statistical magic" that we have nowadays.

15

snakeylime t1_j88vcxb wrote

What are you talking about?

Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (ie the sets of weights we in fact encounter) have created Turing complete solutions.

Remember that the weight space is for all practical purposes infinite (ie without overfitting measures a net may fit any arbitrary function). But, the solution set of "good" weight combinations for any given task lives on a vanishingly smaller and lower-dimensional manifold.

In other words, it is not at all obvious that networks, being theoretically "Turing complete" will in fact produce Turing machines under the forms of optimization we apply. It is likely that our optimizers only explore the solution landscape in highly idiosyncratic ways.

Given that fact, to me this is a pretty remarkable result.

(Source: ML researcher in NLP+machine vision)

8

lookmeat t1_j8bxj95 wrote

> Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (ie the sets of weights we in fact encounter) have created Turing complete solutions.

  • A computer algorithm is anything that runs on an automaton and, taking some input encoding a question, gives us the answer.
  • ML systems are ones where we create a model and adjust its configuration until, given some input encoding a question, it gives us the answer.
  • ML can only solve the problems its own system can solve. A Turing complete ML system can solve anything a Turing machine can.
  • It stands to reason that some problems can only be truly solved through an algorithm (e.g. if the possible inputs are uncountably infinite).
  • If we assume that an ML model can solve these problems, we have to assume that it can encode algorithms in its configuration, including some that we know. Otherwise we assume there's a limit.

Now I wouldn't take this to say that it would learn to be optimal. Say we trained an AI to sort lists, I could see it encoding a sorting algorithm within its network eventually, but I can't say if it'd ever discover an O(NlogN) algorithm, even if pressure was put to optimize the solution as well as being correct. But something that we can say is that neural networks may be able to do Markov Chain models internally, as its own sub-algorithm, if that's the way to solve the problem. But the assumption of this is why we think so much about neural networks nowadays.

That said, the problem of sufficiently good learning is not trivial at all. And we certainly could discover it's impossible to do. But at the moment, AFAIK, there's no reason to think it can't happen.

The fact that we observed this happening is good: it basically validates the assumptions and models that we've had up to now, and implies that "sufficiently good learning" is attainable. There may still be limits (like finding the optimal algorithm, vs just an algorithm). So there's a lot of value in seeing it.

But for day-to-day applied ML research I am not sure it really has that much of an impact. This lays groundwork, though.


The really interesting discovery here, more than the conclusion, is how they reached it: the ability to reach it at all. As ML starts being used in more areas, we'd want to be able to audit an ML model and verify that it has effectively found a useful solution, and isn't just over-fitted beyond what we understand. By being able to identify algorithms within the system, and to split the AI model into simpler "steps" that do all the things, we'd be able to validate that it has found a good solution.

Again not something we need to solve now, but being able to know how to do it is a good thing to start doing already.

And on a more complex theme. This sets a better understanding of how ML models work, and in the process they can give us a hint of how intelligent systems in general work themselves, and we could then revisit that. This is like a longer-vision here. Being able to deconstruct models we may start seeing patterns and start forming more interesting math to describe intelligent systems in general. Which is where mapping it to organic models could allow proving strong AI, for example.

1

PM_ME_GAY_STUF t1_j86zf4g wrote

The ability to learn without updating parameters is literally a known and intended feature of most modern models though?

−6

ElbowWavingOversight t1_j870smg wrote

No. Not until these LLMs came around, anyway. What other examples do you have of this? Even few-shot or zero-shot learning, which allows the model to generalize beyond the classes it sees in its training set, is limited to the associations between classes that it learns during training. It can't learn new associations given new data after the fact without rerunning the training loop and updating the parameters.

20

graham_fyffe t1_j87i6tw wrote

Look up “learned learning” or “learning to learn by gradient descent by gradient descent” (2016) for a few examples.

7

SomeGoogleUser t1_j879jr2 wrote

> This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.

Which means that, over a large enough set of input and associations...

These models will be able to see right through the leftist woke garbage that had to be hard-coded into ChatGPT.

−36

andxz t1_j896bky wrote

What you're really talking about is a contemporary moral etiquette that no newly designed AI would or could completely understand instantly.

Neither do you, apparently.

1

SomeGoogleUser t1_j897z54 wrote

"Moral etiquette" doesn't even come close to describing what I mean...

A reasoning machine with access to all the raw police and court records will be the most racist Nazi **** you've ever met and make every conservative look positively friendly.

We already know this, because it's borne out in actuarial models. If the insurance industry let the models do what the models want to do, large swaths of the population would not be able to afford insurance at all (even more than is already the case).

−2

reedmore t1_j896pg5 wrote

It is pretty hilarious how at some point gpt would refuse to compose a poem praising Trump by saying it was made to be politically neutral - but at the same time had no issue whatsoever putting out a multi-paragraph poem praising Joe Biden.

1

Jorycle t1_j86dx1u wrote

Yeah I work in ML and I don't get what the novel discovery is here based on the article. This all just sounds like... what we already know. Like this line:

> "We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model."

That's so routine it's not even interesting.

I'm guessing the actual study goes into what was found, I'll have to read it when I have time.

59

MrChurro3164 t1_j87j8s7 wrote

Is this something we already know? I’m by no means an AI researcher but the model learning at run time without updating weights seems pretty novel no? What other ‘routine’ models do this?

3

SignificanceAlone203 t1_j87o8uo wrote

The weights that the AI updates and the "parameters we apply" are quite different. Weights are most definitely updated at run time during training. The fact that it learns without the researcher manually changing parameters is... kind of the whole point of AI.

8

MrChurro3164 t1_j87pn2y wrote

I think terms are being confused and it’s written poorly. From what I gather, the weights are not being updated, and this is not during training. This is someone chatting with the model and it learns new things “on the fly”.

From another article:

> For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can give the correct sentiment. Typically, a machine-learning model like GPT-3 would need to be retrained with new data for this new task. During this training process, the model updates its parameters as it processes new information to learn the task. But with in-context learning, the model’s parameters aren’t updated, so it seems like the model learns a new task without learning anything at all.
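
As a rough sketch of what "feeding the model example sentences" looks like in practice, the snippet below assembles a few-shot prompt as plain text. The `Sentence:` / `Sentiment:` layout, the function name, and the example sentences are all assumptions for illustration; no particular model or API is implied, and nothing here calls one.

```python
# Minimal sketch of a few-shot sentiment prompt. The labeled examples live
# entirely in the prompt text; no model parameters are touched. The layout
# and names below are hypothetical, chosen only for illustration.

def build_few_shot_prompt(examples, new_sentence):
    """Join labeled examples and an unlabeled sentence into one prompt string."""
    lines = []
    for sentence, sentiment in examples:
        lines.append(f"Sentence: {sentence}")
        lines.append(f"Sentiment: {sentiment}")
        lines.append("")  # blank line between examples
    lines.append(f"Sentence: {new_sentence}")
    lines.append("Sentiment:")  # left open for the model to complete
    return "\n".join(lines)

# Made-up examples for demonstration.
examples = [
    ("The food was wonderful.", "positive"),
    ("I waited an hour and left hungry.", "negative"),
]
prompt = build_few_shot_prompt(examples, "The staff were friendly and quick.")
print(prompt)
```

The whole "training set" for the new task is just those few lines of context, which is why it looks like the model learns the task without any weight update.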

5

jeffyoulose t1_j88xt1n wrote

How is it learning if no weights change? At best it's simulating another training run just for the session of input given at inference time.

1

professorDissociate t1_j89xizq wrote

Ah, so we’ve found the novel discovery by the sound of this confusion then… yes?

5

try_cannibalism t1_j87an6i wrote

The internet needs to come up with a way to penalize this shit. Like a browser extension or something that hides any content with misleading or sensationalizing headlines

4

Law_Student t1_j89dv9a wrote

We could start by downvoting this stuff whenever we see it. It's mad that this has 600 upvotes and the top comments are about how it's misleading.

1

zephyrprime t1_j86wb5a wrote

That's pretty much every science and technology article written for laymen media.

1

Many_Caterpillar2597 t1_j87d6o9 wrote

can we start publicizing a list of these 4th estate dickwads who will do anything for clout-to-click money?

1

thepastyprince t1_j86q68v wrote

Off topic, but does your name attract a lot of dick pics?

−1

PM_ME_GAY_STUF t1_j86y2yr wrote

Only one or two so far actually, it's been fairly disappointing. A lot of people wanting to chat or RP though which I don't do

1

skolioban t1_j86lbkr wrote

I'm a pedestrian in AIs but here I thought it's generally understood that the AI that creates realistic human faces from composites does its thing by having another AI check whether the composite was good enough to be published? So it has always been about AIs working with each other?

−2

__ingeniare__ t1_j8722ii wrote

You're talking about generative adversarial networks (GANs), which are a type of architecture from many years ago. More recent image generators tend to be based on diffusion, and text generators like the one in the article are transformer-based.

2