
ElbowWavingOversight t1_j86z5rp wrote

> I'm sorry, isn't this just how ML models are implemented?

No. The novel discovery is that these large language models appear to have learned a form of gradient descent that they perform at inference time. This is why they appear to be able to learn even without updates to the weights. FTA:

> We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model.

This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.
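
To make "gradient descent at inference time" concrete, here's a toy numpy sketch (mine, not the article's construction): for linear regression over the in-context examples, one explicit gradient-descent step starting from zero weights gives exactly the same prediction as a linear-attention readout of those examples, which is the kind of correspondence these papers build on.

```python
# Toy sketch: one GD step over the in-context examples == a linear-attention readout.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16                          # feature dim, number of in-context examples
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))           # context inputs
y = X @ w_true                        # context targets
x_q = rng.normal(size=d)              # query input
lr = 0.1

# (a) One explicit gradient step on L(W) = 0.5 * sum_i (W . x_i - y_i)^2, starting at W = 0
grad_at_zero = -(y[:, None] * X).sum(axis=0)
W1 = -lr * grad_at_zero
pred_gd = W1 @ x_q

# (b) Linear "attention": keys = context inputs, values = context targets, query = x_q
pred_attn = lr * (y * (X @ x_q)).sum()

print(pred_gd, pred_attn)             # identical up to floating point
assert np.isclose(pred_gd, pred_attn)
```

No weights are updated in (b); the "learning" lives entirely in how the context is combined at inference time.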

75

lookmeat t1_j87zp0i wrote

This isn't that surprising though. It's already been proven that neural networks are Turing complete, and therefore any arbitrary program can be described with a "static" (that is, the weights/parameters are not changed) neural network of sufficient complexity.
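
To give that a toy flavor (my own sketch, not from the article): a single threshold unit with frozen weights computes NAND, and since NAND is functionally complete, fixed-weight networks of such units can in principle encode any Boolean circuit.

```python
# Toy sketch: a fixed-weight threshold neuron that computes NAND.
# The weights are constants, a "program" baked into a static network.
import numpy as np

def nand_neuron(a: int, b: int) -> int:
    w = np.array([-2.0, -2.0])   # fixed weights, never updated
    bias = 3.0
    return int(w @ np.array([a, b]) + bias > 0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand_neuron(a, b))   # prints the NAND truth table: 1, 1, 1, 0
```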

So it isn't so much a "new discovery" as a "validation of something we knew was going to be observed".

Don't get me wrong, this is going to be interesting. It gives us insight into how things work, that is, into actually understanding the solution a neural network has built. It would also be interesting to work backwards and see whether certain algorithms tend to arise naturally in sufficiently complex systems; optimization seems like a natural candidate. The next step would be to check whether the same thing happens in organic beings with intelligent systems (animal nervous systems may be too complex, IMHO, to observe cleanly at first, but we may find something interesting in simpler systems such as plants or fungi, and with better understanding we could then look for this in more complex systems, such as animals).

This would start giving us insight into how intelligence works. If strong, human-like AI is the philosopher's stone that turns lead into gold (now possible with a particle accelerator and sufficient resources), this may be the equivalent of understanding the difference between elements and molecules: a solid first step toward forming a model that we can test and refine. That said, we're still a bit far from that.

I do think interesting things will come from understanding AI better, from knowing how these models actually work (as in, what system the neural network has hit on) rather than the handwavy "statistical magic" we have nowadays.

15

snakeylime t1_j88vcxb wrote

What are you talking about?

Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (i.e. the sets of weights we in fact encounter) have created Turing-complete solutions.

Remember that the weight space is for all practical purposes infinite (i.e. without overfitting measures a net may fit any arbitrary function). But the solution set of "good" weight combinations for any given task lives on a vanishingly smaller, lower-dimensional manifold.

In other words, it is not at all obvious that networks, being theoretically "Turing complete", will in fact produce Turing machines under the forms of optimization we apply. It is likely that our optimizers only explore the solution landscape in highly idiosyncratic ways.

Given that fact, to me this is a pretty remarkable result.

(Source: ML researcher in NLP+machine vision)

8

lookmeat t1_j8bxj95 wrote

> Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (i.e. the sets of weights we in fact encounter) have created Turing-complete solutions.

  • A computer algorithm is anything that runs on an automaton and, given some input encoding a question, gives us the answer.
  • ML systems are ones where we create a model and adjust its configuration until, given some input encoding a question, it gives us the answer.
  • An ML system can only solve the problems its underlying system can solve. A Turing-complete ML system can solve anything a Turing machine can.
  • It stands to reason that some problems can only be truly solved through an algorithm (e.g. if the space of possible inputs is infinite, so no finite lookup table can cover it).
  • If we assume that an ML model can solve these problems, we have to assume that it can encode algorithms in its configuration, including some that we already know. Otherwise we're assuming there's a limit.

Now, I wouldn't take this to say that it would learn to be optimal. Say we trained an AI to sort lists: I could see it eventually encoding a sorting algorithm within its network, but I can't say whether it would ever discover an O(N log N) algorithm, even if pressure was put on optimizing the solution as well as getting it correct. What we can say is that neural networks may be able to run Markov-chain models internally, as their own sub-algorithm, if that's the way to solve the problem. This assumption is a big part of why we think so much about neural networks nowadays.

That said, the problem of sufficiently good learning is not trivial at all, and we could certainly discover that it's impossible. But at the moment, AFAIK, there's no reason to think it can't happen.

The fact that we observed this happening is good: it basically validates the assumptions and models we've had up to now, and implies that "sufficiently good learning" is attainable. There may still be limits (like finding the optimal algorithm, versus just an algorithm). So there's a lot of value in seeing it.

But I'm not sure it has that much of an impact on day-to-day applied ML research; it does lay groundwork, though.


The really interesting discovery here is less the conclusion than how they reached it, the ability to reach it at all. As ML starts being used in more areas, we'll want to be able to audit an ML model and verify that it has actually found a useful solution and isn't just overfitted beyond what we understand. If we can identify algorithms within the system and split the model into simpler "steps" that do all the things it does, we'll be able to validate that it has found a good solution.

Again, not something we need to solve now, but working out how to do it is a good thing to start on already.

And on a broader theme: this builds a better understanding of how ML models work, and in the process it can give us hints about how intelligent systems in general work, which we could then revisit. That's more of a long-term vision. By deconstructing models we may start seeing patterns and forming more interesting math to describe intelligent systems in general, which is where mapping it to organic models could allow proving strong AI, for example.

1

PM_ME_GAY_STUF t1_j86zf4g wrote

The ability to learn without updating parameters is literally a known and intended feature of most modern models though?

−6

ElbowWavingOversight t1_j870smg wrote

No. Not until these LLMs came around, anyway. What other examples do you have of this? Even few-shot or zero-shot learning, which allows the model to generalize beyond the classes it sees in its training set, is limited to the associations between classes that it learns during training. It can't learn new associations from new data after the fact without rerunning the training loop and updating the parameters.
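
To make the contrast concrete, here's a rough sketch (mine; `complete` is a hypothetical stand-in for whatever text-completion API you use). The mapping below is invented on the spot, so no training-time label set could contain it, yet an LLM can typically continue the pattern from the context alone, with no weight update:

```python
# A made-up word -> category mapping shown only at inference time.
prompt = """\
blick -> animal
frop -> color
snarf -> animal
wug -> color
blick -> animal
frop ->"""

# completion = complete(prompt)   # hypothetical call; expected continuation: " color"
```

A conventional zero-shot classifier has no way to pick this up without retraining, because the association exists nowhere in its training data.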

20

graham_fyffe t1_j87i6tw wrote

Look up “learned learning” or “learning to learn by gradient descent by gradient descent” (2016) for a few examples.
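
Roughly, the 2016 idea looks like this (my own stripped-down sketch, assuming PyTorch; the paper adds gradient preprocessing and other details): an LSTM maps each parameter's gradient to an update, and the LSTM itself is trained by backpropagating through the unrolled inner optimization on toy problems.

```python
# Minimal learned-optimizer sketch in the spirit of
# "Learning to learn by gradient descent by gradient descent" (2016).
import torch
import torch.nn as nn

HIDDEN = 20

class LSTMOptimizer(nn.Module):
    """Coordinate-wise learned optimizer: maps a parameter's gradient to its update."""
    def __init__(self, hidden=HIDDEN):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, grad, state):
        h, c = self.cell(grad, state)
        return self.out(h), (h, c)

def quadratic_loss(theta, A, b):
    # Toy optimizee: minimize ||A @ theta - b||^2
    return ((A @ theta - b) ** 2).sum()

def unrolled_loss(opt_net, steps=20, dim=10):
    # Run the learned optimizer for `steps` inner steps on a random quadratic,
    # summing the optimizee's losses so the meta-gradient rewards fast progress.
    A, b = torch.randn(dim, dim), torch.randn(dim)
    theta = torch.zeros(dim, requires_grad=True)
    state = (torch.zeros(dim, HIDDEN), torch.zeros(dim, HIDDEN))
    total = 0.0
    for _ in range(steps):
        loss = quadratic_loss(theta, A, b)
        total = total + loss
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        update, state = opt_net(grad.unsqueeze(1), state)
        theta = theta + update.squeeze(1)        # apply the learned update
    return total

opt_net = LSTMOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
for _ in range(100):                             # meta-training loop
    meta_opt.zero_grad()
    unrolled_loss(opt_net).backward()
    meta_opt.step()
```

Here what's learned is the update rule itself; the optimizee's parameters still get updated at each inner step.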

7

SomeGoogleUser t1_j879jr2 wrote

>This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.

Which means that, over a large enough set of inputs and associations...

These models will be able to see right through the leftist woke garbage that had to be hard-coded into ChatGPT.

−36

andxz t1_j896bky wrote

What you're really talking about is a contemporary moral etiquette that no newly designed AI would or could completely understand instantly.

Neither do you, apparently.

1

SomeGoogleUser t1_j897z54 wrote

"Moral etiquette" doesn't even come close to describing what I mean...

A reasoning machine with access to all the raw police and court records will be the most racist Nazi **** you've ever met and make every conservative look positively friendly.

We already know this, because it's borne out in actuarial models. If the insurance industry let the models do what the models want to do, large swaths of the population would not be able to afford insurance at all (even more than is already the case).

−2

reedmore t1_j896pg5 wrote

It is pretty hilarious how, at some point, GPT would refuse to compose a poem praising Trump, saying it was made to be politically neutral, but at the same time had no issue whatsoever putting out a multi-paragraph poem praising Joe Biden.

1