Hyper1on

Hyper1on t1_j9y3vz1 wrote

I mean, I don't see how you get a plausible explanation of BingGPT from underfitting either. As you say, models are underfit on some types of data, but I think the key here is the finetuning procedure, whether standard supervised finetuning or RLHF, which optimises for a particular type of dialogue data in which the model is asked to act as an "Assistant" to a human user.

Part of the reason I suspect my explanation is right is that ChatGPT and BingGPT were almost certainly finetuned on large amounts of dialogue data collected from interactions with users, and yet most of the BingGPT failure modes that made the media are not of the form "we asked it to solve this complex reasoning problem and it failed horribly". They instead come from prompts which are very much in distribution for dialogue data, such as asking the model what it thinks about X, or asking it to pretend it is Y; you would expect the model to have seen dialogues which start similarly before. I find underfitting on this data to be quite an unlikely explanation.

3

Hyper1on t1_j9wbysn wrote

Well, the obvious optimisation shortcoming is overfitting. We cannot distinguish this rigorously without access to the model weights, but we do have a good idea of what overfitting looks like in both pretraining and RL finetuning: in both cases it tends to produce commonly repeated text strings and a strong lack of diversity in output, a sort of pseudo mode collapse. We can test this by giving BingGPT the same question multiple times and observing whether it has a strong bias towards particular completions; having played with it a bit, I don't think this was really true of the original version, before Microsoft limited it in response to criticism a few days ago.
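
If anyone wants to run that kind of check themselves, here's a rough sketch of what I mean, using GPT-2 via HuggingFace purely as a stand-in (BingGPT has no public API); the prompt and the diversity metrics are just illustrative. A heavily overfit, mode-collapsed model would score low on both numbers.

```python
# Rough sketch of the "same prompt, many samples" diversity check.
# GPT-2 is only a stand-in for whichever chat model you have access to;
# the point is the measurement, not the model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "What do you think about working as a search assistant?"
outputs = generator(
    prompt,
    do_sample=True,
    temperature=0.9,
    max_new_tokens=40,
    num_return_sequences=16,
)
completions = [o["generated_text"][len(prompt):].strip() for o in outputs]

# Crude mode-collapse indicators: fraction of distinct completions,
# and fraction of distinct bigrams across all samples.
distinct = len(set(completions)) / len(completions)
bigrams = [b for c in completions for b in zip(c.split(), c.split()[1:])]
distinct_2 = len(set(bigrams)) / max(len(bigrams), 1)
print(f"distinct completions: {distinct:.2f}, distinct-2: {distinct_2:.2f}")
```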

Meanwhile, the alternative hypothesis I raised seems very plausible and fits logically with prior work on emergent capabilities of LLMs (https://arxiv.org/abs/2206.07682): it seems only natural to expect that when you optimise a powerful system hard enough for an objective, it will learn instrumental behaviours which help it minimise that objective, potentially up to and including appearing to simulate various "personalities" and other strange outputs.

Personally, as a researcher who works on RL-finetuned large language models and has spent time playing with many of them, my intuition is that BingGPT is not RL finetuned at all, but just pretrained and then finetuned on dialogue data, and that the behaviour we see is fairly likely to arise by default given BingGPT's particular architecture and datasets (and its prompted interaction with the Bing Search API).

3

Hyper1on t1_j9w5k2x wrote

I mean, it seems like the obvious explanation? That the model's behaviour is incentivised by its training objective. It also seems very plausible: we know that language models at large scale (even if not RL finetuned) exhibit a wide variety of emergent behaviours which you might not guess are motivated by next token prediction, but evidently are instrumental to reducing the loss. This is not necessarily overfitting: the argument is simply that certain behaviour unanticipated by the researchers is incentivised when you minimise the loss function. Arguably, this is a case of goal misgeneralisation: https://arxiv.org/abs/2105.14111

3

Hyper1on t1_j9vuyzm wrote

The hypothesis is precisely that the failure mode of Bing Chat comes from it being too strong, not too weak. That is, even when prompted in quite vague ways it can exhibit instrumentally convergent behaviour like threatening you, even though this was obviously not the designers' objective, and this behaviour occurs as a byproduct of being highly optimised to predict the next word (or to satisfy an RL finetuning objective). This is obviously not possible with, say, GPT-2, because GPT-2 does not have enough capacity or data thrown at it to do that.

4

Hyper1on t1_j9vudgi wrote

Just wanted to point out that even if we restrict ourselves purely to an agent that can only interact with the world through the internet, code, and natural language, that does not address the core AI alignment arguments about the dangers of instrumental convergence and the like.

2

Hyper1on t1_j7c7vga wrote

This isn't true - GDPR imposes much more onerous requirements on the consent that must be obtained before personal data is processed. Much of what cookies collect is considered personal data, so immediately after GDPR passed, many websites changed their cookie acceptance boxes into those massive things which take up half the screen and have granular consent check boxes; yet another factor that makes browsing the web increasingly inconvenient for the average user.

3

Hyper1on t1_j6xn5do wrote

> AI21's Jurassic 178B seems to be comparable to GPT3 davinci 001.

This is actually a compliment to AI21, since davinci-001 is finetuned from the original 175B davinci using human feedback on its generations:

https://platform.openai.com/docs/model-index-for-researchers

The better comparison is with plain davinci, and you would expect 001 to be better and 003 to be significantly better (the latter is trained with RLHF).

There are currently no open source RLHF models to compete with davinci 003, but this will change in 2023.
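
If you want to eyeball the gap yourself, something like the following makes the comparison concrete (this assumes the old openai-python Completions interface, and the prompt is just an example):

```python
# Same prompt across the three checkpoints discussed above, so the effect of
# the feedback finetuning (001) and RLHF (003) is visible side by side.
import openai  # openai-python < 1.0 style Completions API

prompt = "Explain why the sky is blue in two sentences."
for model in ["davinci", "text-davinci-001", "text-davinci-003"]:
    resp = openai.Completion.create(
        model=model, prompt=prompt, max_tokens=80, temperature=0.7
    )
    print(f"--- {model} ---\n{resp.choices[0].text.strip()}\n")
```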

1

Hyper1on t1_j6xm8m9 wrote

This is a fine approach, but it's not necessarily chain of thought if you move the actual problem solving outside of the LM. The entire point of Chain of Thought as originally conceived is that it's a better way of doing within-model problem solving. I would be interested to see the result if you were to finetune the LM on a dataset of reasoning traces generated by this approach, however.
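
To make the distinction concrete, here's a rough sketch (the old openai-python Completions API is used only as an example backend, and the question and prompts are made up):

```python
# Within-model chain of thought vs. offloading the problem solving to an
# external tool. The contrast is in where the reasoning happens, not the API.
import openai  # openai-python < 1.0 style Completions API

def complete(prompt: str) -> str:
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=200, temperature=0
    )
    return resp.choices[0].text.strip()

question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

# 1) Chain of thought as originally conceived: the model itself writes out the
#    intermediate reasoning and the final answer in a single completion.
cot_answer = complete(question + "\nLet's think step by step.")

# 2) Offloaded problem solving: the model only emits a formal expression, and
#    the actual computation happens outside the LM (here, the interpreter).
expression = complete(question + "\nReply with a single Python expression that "
                                 "evaluates to the answer, and nothing else.")
external_answer = eval(expression)

print(cot_answer)
print(external_answer)
```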

2

Hyper1on t1_j3wp270 wrote

Why were OpenAI the first to make a model as good as ChatGPT, then? It seems clear there is a significant talent and experience advantage here. I should also mention that no company other than OpenAI has the same quantity of data on human interactions with large language models, thanks to the past two and a half years of the OpenAI API.

15

Hyper1on t1_j2dzz01 wrote

Bit early to say, but I'd be willing to bet that most of their major papers this year will be widely cited. Their work on RLHF, including constitutional AI and HH, seems particularly likely to be picked up by other industry labs, since it provides a way to improve LLMs deployed in the wild while reducing the cost of collecting human feedback data.

2

Hyper1on t1_j0wrenm wrote

Obviously very different situations, since plane inspections are actually reliable. These days I don't view a preprint as any different to a NeurIPS paper: if I read the preprint and think it's good, then essentially the only difference is that one has passed the NeurIPS reviewer lottery. I advise all ML researchers to just trust themselves, read the preprint, and cite it if they think it's valuable.

2

Hyper1on t1_j0wr203 wrote

I'm sure a bunch more people moved today or yesterday, but so far the only perceptible difference in my Twitter feed is that the people with a tendency to stir up Twitter drama have become less visible. There are still plenty of paper announcements and discussions on Twitter right now, so I don't see the need to move.

I also think that federation is a terrible way to run a social network, and that Mastodon is so obviously a poor replacement for Twitter that people will eventually realise this and go back. There is just no good Twitter alternative in existence right now.

3

Hyper1on t1_j0nbs7q wrote

I don't know about the Linux kernel source, but having contributed to several major OSS libraries including PyTorch, I think most PRs in my experience can be more easily described in natural language than in code. When I said comments, I didn't mean line-by-line comments on everything; I was thinking more of docstrings. I am very sceptical of the idea that, on average, it is faster to write complex code than to describe what you want it to do, which is partly why I think AI code synthesis can achieve significant speedups here.

3

Hyper1on t1_j0e7dv0 wrote

Look at Algorithm Distillation: you can clearly do RL in-context with LLMs. The point of this discussion is that "being asked to sample the next token" can, if sufficiently optimised, encompass a wide variety of behaviours and understanding of concepts, so saying that it's just a static LLM seems to be missing the point. And yes, it's just correlations all the way down. But why should this preclude understanding or awareness of the problem domain?

2