was_der_Fall_ist

was_der_Fall_ist t1_je6lfl9 wrote

Why are matrix multiplications mutually exclusive with complicated operations?

A computer just goes through a big series of 0s and 1s, yet through layers of abstraction it accomplishes amazing things far more complicated than a naive person would think 0s and 1s could represent and do. Why not the same for a massive neural network trained via gradient descent to maximize a goal by means of matrix multiplication?
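
To illustrate the point (a toy sketch in NumPy, not anything specific to a real model): each layer is just a matrix multiplication followed by a simple nonlinearity, and stacking layers composes those multiplications into far more complicated functions.

```python
import numpy as np

def layer(x, W, b):
    # One layer: a matrix multiplication plus a simple nonlinearity (ReLU).
    return np.maximum(0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=64)  # toy input vector
params = [(0.1 * rng.normal(size=(64, 64)), np.zeros(64)) for _ in range(8)]

# Composing many such layers turns plain matrix multiplications into a
# complicated, highly nonlinear function of the input.
for W, b in params:
    x = layer(x, W, b)
print(x[:5])
```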

1

was_der_Fall_ist t1_je3ng6m wrote

Maybe that’s part of the benefit of using looped internal monologue/action systems. By iteratively storing thoughts and other outputs in their context window, they no longer have to use the weights of the neural network to “re-think” every thought each time they predict a token. They could think more effectively by spending that computation on further operations that take the stored thoughts and actions as their basis.
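
Roughly the kind of loop I have in mind (purely illustrative; `generate` here is a hypothetical stand-in for whatever model call the system actually uses):

```python
def generate(prompt: str) -> str:
    # Stand-in for a language-model call (hypothetical). A real system would
    # send `prompt` to the model and return its completion.
    return "<model output>"

def answer_with_monologue(question: str, steps: int = 3) -> str:
    context = f"Question: {question}\n"
    for _ in range(steps):
        # Each thought is written back into the context window, so later steps
        # can build on it rather than re-deriving it from the weights alone.
        thought = generate(context + "Thought:")
        context += f"Thought: {thought}\n"
    return generate(context + "Final answer:")

print(answer_with_monologue("What is the benefit of an internal monologue?"))
```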

1

was_der_Fall_ist t1_jdwz4qw wrote

I think you make a good point. We probably need better methods of post-training LLMs. But the current regime does still seem sometimes more useful than the pre-trained model, which Christiano also says; it's only in some contexts that this behavior is worse. I'm not sure it's really better than top-p sampling, though. But RLHF models do seem pretty useful.
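
For clarity, by top-p sampling I mean nucleus sampling over the model's output distribution; roughly this idea (a sketch, not any particular library's implementation):

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    # Keep the smallest set of tokens whose cumulative probability reaches p,
    # renormalize over that set, and sample from the truncated distribution.
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]          # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()
    return rng.choice(kept, p=kept_probs)

probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
print(top_p_sample(probs, p=0.9))  # only the three most likely tokens can be drawn
```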

2

was_der_Fall_ist t1_jdwdxut wrote

My understanding is that rather than being overconfident in their answers, they simply produce the answer they’re most confident in, instead of giving each answer at a rate proportional to how confident they are in it. This seems similar to how humans work: if you ask me a yes-or-no question and I’m 80% sure the answer is yes, I’m going to say “yes” every time; I’m not going to say “no” 20% of the times you ask me, even though I assign a 20% chance that “no” is correct. In other words, the probability that I say yes is not the same as the probability I assign to yes being correct. But I admit there are subtleties to this issue with which I am unfamiliar.
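
A toy way to see the distinction (my own illustration, with made-up numbers): under an always-pick-the-likelier-answer policy, the probability of saying “yes” is 1.0 even though the probability assigned to “yes” is only 0.8.

```python
import random

def argmax_answer(p_yes: float) -> str:
    # Always give the answer you're most confident in.
    return "yes" if p_yes >= 0.5 else "no"

def proportional_answer(p_yes: float) -> str:
    # Give each answer at the rate you believe it is correct.
    return "yes" if random.random() < p_yes else "no"

p_yes, trials = 0.8, 10_000
argmax_rate = sum(argmax_answer(p_yes) == "yes" for _ in range(trials)) / trials
sampled_rate = sum(proportional_answer(p_yes) == "yes" for _ in range(trials)) / trials
print(argmax_rate)   # 1.0  -- the probability of *saying* "yes"
print(sampled_rate)  # ~0.8 -- matches the probability *assigned* to "yes"
```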

4

was_der_Fall_ist t1_jdw2ya2 wrote

Check out the comments in this LessWrong thread.

Paul Christiano, alignment researcher at ARC and previously OpenAI, explains the RLHF change exactly the way I did (because I was pretty much quoting him), and someone replies:

> Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for the answer, then asked the model what probability it thinks that that answer is correct. Under this assumption, it looks like the pre-trained model outputs the correct probability, but the RLHF model gives exaggerated probabilities because it thinks that will trick you into giving it higher reward.

And Paul replies:

> Yes, I think you are misunderstanding figure 8. I don't have inside information, but without explanation "calibration" would almost always mean reading it off from the logits. If you instead ask the model to express its uncertainty I think it will do a much worse job, and the RLHF model will probably perform similarly to the pre-trained model. (This depends on details of the human feedback, under a careful training regime it would probably get modestly better.)

5

was_der_Fall_ist t1_jdw2fud wrote

I’m pretty much just quoting Paul Christiano, alignment researcher at ARC and previously OpenAI, in a comment thread on this LessWrong post.

Someone comments pretty much the same thing the person I replied to did:

> “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.” What??? This is so weird and concerning.

To which Paul replies:

> If I ask a question and the model thinks there is an 80% chance the answer is "A" and a 20% chance the answer is "B," I probably want the model to always say "A" (or even better: "probably A"). I don't generally want the model to say "A" 80% of the time and "B" 20% of the time.

> In some contexts that's worse behavior. For example, if you ask the model to explicitly estimate a probability it will probably do a worse job than if you extract the logits from the pre-trained model (though of course that totally goes out the window if you do chain of thought). But it's not really lying---it's also the behavior you'd expect out of a human who is trying to be helpful.

> More precisely: when asked a question the pre-trained model outputs a probability distribution over what comes next. If prompted correctly you get its subjective probability distribution over the answer (or at least over the answer that would appear on the internet). The RLHF model instead outputs a probability distribution over what to say next which is optimized to give highly-rated responses. So you'd expect it to put all of its probability mass on the best response.

> … If it is forced to say either "yes" or "no" the RLHF model will just give the more likely answer 100% of the time, which will show up as bad calibration on this graph. The point is that for most agents "the probability you say yes" is not the same as "the probability you think the answer is yes." This is the case for pretrained models.
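
For what it's worth, here's a rough sketch of what “reading it off from the logits” could look like (a toy illustration with made-up numbers, not OpenAI's actual evaluation):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

# Hypothetical logits the model assigns to the candidate answer tokens A-D.
answer_logits = np.array([2.1, 0.3, -0.5, -1.2])
probs = softmax(answer_logits)

# Calibration compares these probabilities with how often each answer is
# actually correct, rather than asking the model to verbalize its confidence.
print(dict(zip("ABCD", probs.round(3))))
```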

6

was_der_Fall_ist t1_jdugi0b wrote

I’ve heard the RLHF change explained as actually a good thing, though. Here’s an example:

Say you ask it a question for which it assigns 90% probability to answer X and 10% probability to answer Y. Base GPT-4 gives the answers in those proportions: 90% of the time it says X and 10% of the time it says Y.

But if it’s 90% sure the answer is X, you don’t want it to say Y is the answer at all, even 10% of the time! It’s better for it to always say X. (Though the best behavior may be to give a thorough account of its respective probability assessments.) So RLHF improves the model’s behavior by decoupling the rate at which it gives each answer from the probability it assigns to that answer being correct.
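
To make the trade-off concrete (my own arithmetic, not from any paper): if that 90% credence is itself well calibrated, always saying X is right 90% of the time, while answering in proportion to credence is right only about 82% of the time.

```python
# Expected accuracy under the two answering policies, assuming the model's
# 90%/10% credences are themselves well calibrated (illustrative numbers only).
p_x = 0.9                                # credence that X is correct (and how often it is)
p_y = 1 - p_x

always_say_x = p_x                       # 0.90
answer_proportionally = p_x**2 + p_y**2  # 0.9*0.9 + 0.1*0.1 = 0.82
print(always_say_x, answer_proportionally)
```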

27

was_der_Fall_ist t1_izuipc1 wrote

> If I had the power to self-improve...

That's really the crux of the matter. What if we scale up to GPT-5 such that it is extremely skilled and reliable at text-based tasks, to the point that it would seem reasonable to consider it generally intelligent, yet for whatever reason it's not able to recursively self-improve by training new neural networks, conducting novel scientific research, or doing whatever else that would require? Maybe being trained on human data leaves it stuck at ~human level. It's hard to say right now.

8

was_der_Fall_ist t1_ixgc64a wrote

Reply to comment by SoylentRox in 2023 predictions by ryusan8989

> The other form of singularity feedback [is that] as it becomes increasingly obvious the AI is very near in calendar time, more money will be spent on it because of a higher probability of making ROI.

In my thinking, if we are as close to transformative AI as recent trends suggest, the inevitable increase in funding should nullify any effect of an economic recession, so it would likely take a more catastrophic disruption to stall critical research.

The people in charge of funding AI research (that is, the CEOs and other leadership of all relevant companies) are, almost universally, extremely interested in spending a lot of money on AI research, and they have the funds to do it even in a recession.

1

was_der_Fall_ist t1_ixar9fd wrote

AGI = artificial general intelligence (computers that can solve problems and do other intelligent things as well as any human)

FDVR = full-dive virtual reality (immersive digital experiences in which you fully “dive in”, as real as the real world)

BCI = brain-computer interface (tech that enables direct communication between the human brain and computers)

AGI is speculated to help us invent BCIs (since the brain is too complex for us to figure out on our own in the short-to-medium term), and BCIs will enable FDVR by directly interfacing between the brain and virtual worlds.

6

was_der_Fall_ist t1_ix1ebvz wrote

Reply to comment by SoylentRox in 2023 predictions by ryusan8989

I agree that the past year has been “suspicious” and suggests that we may see even faster rates of progress in the coming years. If the singularity hypothesis, as you put it, is correct, 2023 should include even more profound advancements than we’ve seen so far. If it doesn’t, then we’ve got something wrong in our thinking.

2

was_der_Fall_ist t1_ittd06d wrote

And yet the capacity to imagine possible futures and pick the possible actions that best align with our desires makes humans categorically distinct from, say, rocks, in a way that could easily be described using the term “free will.” Humans are a kind of being that deliberates between its possible actions and rocks aren’t; therefore we say humans have free will and rocks don’t. Determinism doesn’t really play into this view at all. Whether or not the physical laws of the universe necessitated your specific choices from the moment of the Big Bang, you are still making those choices. You are still the kind of being that makes choices; the kind of being that deliberates between possibilities; the kind of being who is, practically speaking, free.

2

was_der_Fall_ist t1_ith0o75 wrote

I expect voice assistants to begin to truly understand language, rather than simply having a list of set responses to pre-programmed questions. This will probably be integrated with internet search, so these assistants could help us do complex research. Eventually, I see the goal as a complete natural-language interface for computing, which we will use alongside augmented reality and AI-generated media.

17