farmingvillein t1_izv9oug wrote

Because the paper does not at all address the (apparently) longer-term context memory that ChatGPT displays.

6

maxToTheJ t1_izvbssa wrote

Maybe because

A) The paper tells you all the ingredients. What you can infer is all there in black and white.

B) "apparently" means that it isn't a known effect. I.e., look at the highest-voted comment. It just concats prompts if you look at the logs: https://www.reddit.com/r/MachineLearning/comments/zjbsie/d_has_open_ai_said_what_chatgpts_architecture_is/izuqszr/

Prompt suggestive of an 822 limit: https://twitter.com/jlopez_dl/status/1599052752935874561?s=20

Although it does seem that OpenAI will charge you for text of size N even if they determine that only M of it is actual non-junk and feed only that into the model, which makes sense. If you gave it a bunch of profanity and it has to block or strip your prompt, they will still want to charge you once they begin charging.

C) If you are looking for a theoretical reason for something that you aren't even sure is an actual effect, it isn't there, hence why it's not in the paper.

D) Clearly nobody wants to put in the work to read the blog, much less the paper, but they really want to crowdsource some unvetted, hand-wavy hypothesis that kind of sounds right, as voted by folks who read neither the paper nor the blog. That's actually kind of ironic if you read the blog and see how much thought OpenAI is putting into preferring "ground truth".

−9

farmingvillein t1_izvcehu wrote

> A) The paper tells you all the ingredients.

Maybe, maybe not--expert consensus is probably not. BlenderBot, e.g., uses different techniques to achieve long-term conversational memory. Not clear what techniques ChatGPT is using.

> B) "apparently" means that it isn't a known effect.

General consensus is that there is either a really long context window going on or (more likely) some sort of additional long-term compression technique.

> D) Clearly nobody wants to put in the work to read the blog, much less the paper

Neither of these addresses the apparently improved long-term conversational memory observed with ChatGPT--unless it turns out to just be a longer context window (which seems unlikely).

Everyone is tea-leaf reading, if/until OpenAI opens the kimono up, but your opinion is directly contrary to the expert consensus.

5

maxToTheJ t1_izvcrws wrote

>Maybe, maybe not--expert consensus is probably not. BlenderBot, e.g., uses different techniques to achieve long-term conversational memory. Not clear what techniques ChatGPT is using.

How do you figure BlenderBot does that?

>Not clear what techniques ChatGPT is using.

What qualifies as a technique?

> General consensus is that there is either a really long context window going on or (more likely) some sort of additional long-term compression technique.

Source?

−1

farmingvillein t1_izvh1t9 wrote

> How do you figure BlenderBot does that?

BlenderBot paper specifically states that it is a combination of your standard transformer context window and explicit summarization operations.
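A rough sketch of what "context window plus explicit summarization" can look like in general. This is not BlenderBot's actual implementation: `build_context`, `first_words`, and the word-prefix "summarizer" are hypothetical stand-ins made up for illustration, and a real system would use a learned summarization model.

```python
from typing import Callable, List


def first_words(text: str, limit: int = 5) -> str:
    # Placeholder "summarizer" (assumption): keep only the first few words.
    return " ".join(text.split()[:limit])


def build_context(
    turns: List[str],
    window: int,
    summarize: Callable[[str], str] = first_words,
) -> str:
    """Keep the last `window` turns verbatim; compress older turns into one memory line."""
    if window <= 0:
        raise ValueError("window must be positive")
    old, recent = turns[:-window], turns[-window:]
    parts = []
    if old:
        # Older turns survive only in summarized form.
        parts.append("memory: " + "; ".join(summarize(t) for t in old))
    parts.extend(recent)
    return "\n".join(parts)


turns = [
    "my dog is called Rex and he is five",
    "I live in Lisbon near the river",
    "what should I feed him",
]
context = build_context(turns, window=1)
print(context)
```

The point of the design is that older turns cost far fewer tokens than keeping them verbatim, at the price of losing whatever the summarizer drops.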

> What qualifies as a technique?

Whatever would be needed to replicate the underlying model/system.

It could just be a vanilla transformer n^2 context window, but this seems unlikely--see below.

> Source?

GPT3 (most recent iteration) context window is 2048 tokens; ChatGPT is supposedly ~double (https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation).

This, on its own, would suggest some additional optimizations, as n^2 attention against a context window of (presumably) ~4096 tokens gets very expensive and is generally unrealistic.

(More generally, it would be surprising to see a scale-up to a window of that size, given the extensive research already extant on scaling up context windows, while breaking the n^2 bottleneck.)
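To put rough numbers on the n^2 point, here is a minimal sketch, assuming vanilla dense self-attention in which every token attends to every other token, so each score matrix has n x n entries. The function name is made up for illustration, and nothing here reflects what OpenAI actually runs.

```python
def attention_score_entries(n_tokens: int) -> int:
    """Number of entries in one dense n x n attention score matrix."""
    return n_tokens * n_tokens


gpt3_window = 2048     # GPT3 context window cited above
chatgpt_window = 4096  # the presumed ~double window

small = attention_score_entries(gpt3_window)
large = attention_score_entries(chatgpt_window)

print(small)          # 4194304
print(large)          # 16777216
print(large / small)  # 4.0: doubling the window quadruples the score matrix
```

This counts a single score matrix; a real model repeats that cost for every head in every layer.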

Further, though, investigation suggests that the "official" story here is either simply not correct, or it is missing key additional techniques; i.e., under certain experimental contexts, it seems to have a window that operates beyond the "official" spec (upwards of another 2x): e.g., see https://twitter.com/goodside/status/1598882343586238464

Like all things, it could be that the answer is simply "more hardware"--but, right now, we don't know for sure, and there have been copious research papers on dealing with this scaling issue more elegantly, so, at best, we can say that we don't know. And the probabilistic leaning would be that something more sophisticated is going on.

5

maxToTheJ t1_izvhc51 wrote

> BlenderBot paper specifically states that it is a combination of your standard transformer context window and explicit summarization operations.

I.e., you read the paper. I guessed that was the answer but didn't want to say it beforehand and bias you against answering that way.

>Whatever would be needed to replicate the underlying model/system.

Exactly.

>Further, though, investigation suggests that the "official" story here is either simply not correct, or it is missing key additional techniques; i.e., under certain experimental contexts, it seems to have a window that operates beyond the "official" spec (upwards of another 2x): https://twitter.com/goodside/status/1598882343586238464

Having a bigger window is a parameter, while the context window's implementation in the code is the technique. Also, much of the discussion isn't necessarily indicative of a bigger window; it could also be truncating more effectively, which is not really a "long" memory but more about how to choose what is useful.
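The "concat the prompts, then truncate" idea can be sketched roughly as follows. This is a toy illustration, not ChatGPT's actual pipeline: `build_prompt`, the whitespace "tokenizer", and the budget value are all assumptions made up for the example.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_prompt(turns: List[str], new_message: str, budget: int) -> str:
    """Concatenate the newest turns that fit in `budget` tokens; oldest are dropped first."""
    kept = [new_message]
    used = count_tokens(new_message)
    # Walk backwards through the history so the most recent turns are kept.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return "\n".join(reversed(kept))


history = ["my name is Ada", "I like transformers", "what is attention"]
prompt = build_prompt(history, "what is my name", budget=10)
print(prompt)
```

With a budget of 10 "tokens", the oldest turn (the one containing the name) no longer fits, so the model would never see it. A smarter selection rule than "drop the oldest" would change what is remembered without changing the window size at all.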

−2

farmingvillein t1_izvicpn wrote

> Having a bigger window is a parameter, while the context window's implementation in the code is the technique

Do you work at OpenAI? If yes, awesome. If no, how can you make this claim?

OpenAI has released few details about how ChatGPT was built.

3

maxToTheJ t1_izvjnht wrote

It's the discussion in the thread you linked, where people are thinking about implementation possibilities, in a post by one of the users in the conversation.

Also

https://twitter.com/jlopez_dl/status/1599057239809331200?s=20

is really indicative of an 822 limit. Especially since the prompt in that user's test case is way better than the one with a bunch of A's that the thread starter used, which is much easier to detect and preprocess.

Here is that user's test case.

https://twitter.com/jlopez_dl/status/1599052209399791617?s=20

Now look at the thread starter's:

https://twitter.com/goodside/status/1598874679854346241?s=20

Out of the two, in which one could you easily regex out adversarial noise in the input?

The discussion about the billing is pretty funny, though, because it seems possible that OpenAI will strip the adversarial text you put in your prompt but, once they charge, bill you for the unfiltered text. That makes sense: when they do charge, if you gave it a bunch of profanity and it has to block or strip your prompt, they will probably still want to charge you.

−1

farmingvillein t1_izvka14 wrote

> is really indicative of an 822 limit

This is not germane to our conversation at all. Do you understand the underlying discussion we are having?

1

maxToTheJ t1_izvkm5s wrote

You all are claiming ChatGPT has some type of huge memory? How is an 822 limit not indicative of that?

Clarify the claim and how https://twitter.com/goodside/status/1598882343586238464 applies in that case. You brought that source into the thread and now are claiming the discussion in that thread is off topic?

−1

farmingvillein t1_izvks5s wrote

Are you a bot? The 822 limit has nothing to do with the context window (other than being a lower bound). The tweet thread is talking about an ostensible limit to the prompt description.

2

maxToTheJ t1_izvkxg0 wrote

You brought that source into the thread and now are claiming the discussion in that thread is off topic?

You still haven't shown proof that the context window is crazy long for a GPT model. I hope that test case in the thread with a bunch of AAAA's isn't your evidence.

−1

farmingvillein t1_izvlja1 wrote

I linked you to a discussion about the context window. You then proceeded to pull a tweet within that thread which was entirely irrelevant. You clearly have no idea about the underlying issue we are discussing (and/or, again, are some sort of bot-hybrid).

3

maxToTheJ t1_izvm3wq wrote

Dude, the freaking logs in Chrome show OpenAI concats the prompts.

>You then proceeded to pull a tweet within that thread which was entirely irrelevant

Your exact words. Try standing by them.

> (other than being a lower bound).

A lower bound is relevant; it's basic math. Freaking proofs are devoted to setting lower bounds.

I am still waiting on any proof of memory that is extraordinary for a GPT3-type model. Before explaining something, it is extremely relevant to know that it exists in the first place.

−2

farmingvillein t1_izvnwdh wrote

...the whole twitter thread, and my direct link to OpenAI, are about the upper bound. The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer), and the fact that you pulled it tells me that you literally don't understand how transformers or the broader technology works, and that you have zero interest in learning. Are you a Markov chain?

2

maxToTheJ t1_izvotec wrote

> The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer)

OpenAI says the "cache" is "3000 words (or 4000 tokens)". I don't see anything about the input being that. The Spanish test case from the poster in the Twitter thread is indicative of the input limit being the lower bound, which also aligns with what the base GPT3.5 model in the paper has. The other stress test was trivial.

https://help.openai.com/en/articles/6787051-does-chatgpt-remember-what-happened-earlier-in-the-conversation

> ...the whole twitter thread, and my direct link to OpenAI, are about the upper bound.

Details. No hand-wavy shit; explain with examples why it's longer, especially since your position is that some magical shit not in the paper/blog is happening.

0

farmingvillein t1_izvq3i8 wrote

> I don't see anything about the input being that.

Again, this has absolutely nothing to do with the discussion here, which is about memory outside of the prompt.

Again, how could you possibly claim this is relevant to the discussion? Only an exceptionally deep lack of conceptual understanding could cause you to make that connection.

4

maxToTheJ t1_izvqh2f wrote

This is boring. I am still waiting on those details.

No hand-wavy shit; explain with examples showing it's impressively longer, especially since your position is that some magical shit not in the paper/blog is happening.

1