maxToTheJ

maxToTheJ t1_izvkm5s wrote

You're all claiming ChatGPT has some kind of huge memory? How does an 822-token limit square with that?

Clarify the claim, and explain how https://twitter.com/goodside/status/1598882343586238464 applies in that case. You brought that source into the thread, and now you're claiming the discussion in that thread is off topic?

−1

maxToTheJ t1_izvjnht wrote

It's the discussion in the thread you linked, where people are thinking through implementation possibilities in a post by one of the users in the conversation.

Also

https://twitter.com/jlopez_dl/status/1599057239809331200?s=20

is really indicative of an 822 limit, especially since the prompt in that user's test case is much better than the one with a bunch of A's that the thread starter used, which is far easier to detect and preprocess.

Here is that user's test case:

https://twitter.com/jlopez_dl/status/1599052209399791617?s=20

Now look at the thread starter's:

https://twitter.com/goodside/status/1598874679854346241?s=20

Out of the two, which one could you easily regex the adversarial noise out of?
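For the repeated-A style of padding, that preprocessing is trivial. A minimal sketch (the function name and the run-length threshold are my own, not anything OpenAI has documented):

```python
import re

def collapse_runs(text: str, max_run: int = 4) -> str:
    # Collapse any run of more than max_run identical characters
    # (e.g. "AAAAAAAA") down to max_run copies.
    return re.sub(r"(.)\1{%d,}" % max_run,
                  lambda m: m.group(1) * max_run,
                  text)
```

The better test case, with varied natural-language filler, has no such easy signature to strip.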

The discussion about billing is pretty funny, though. It seems possible that OpenAI will strip the adversarial text out of your prompt but still charge you for the unfiltered text. That makes sense: if you give it a bunch of profanity and it has to block or strip your prompt, they will probably still want to charge you.
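Pure speculation on my part, but the shape of that billing logic would be something like the following. The prices, the whitespace "tokenizer", and the junk filter are all made up for illustration:

```python
def bill_prompt(raw_prompt: str, price_per_token: float = 0.0001):
    tokens = raw_prompt.split()                        # crude stand-in for a real tokenizer
    filtered = [t for t in tokens if set(t) != {"A"}]  # strip junk before the model sees it
    cost = price_per_token * len(tokens)               # ...but charge on the unfiltered text
    return " ".join(filtered), cost
```

The point is just that the metered quantity (raw tokens) and the model input (filtered tokens) don't have to be the same thing.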

−1

maxToTheJ t1_izvhc51 wrote

> BlenderBot paper specifically states that it is a combination of your standard transformer context window and explicit summarization operations.

I.e., you read the paper. I guessed that was the answer but didn't want to say it earlier and bias you against answering that way.

>Whatever would be needed to replicate the underlying model/system.

Exactly.

>Further, though, investigation suggests that the "official" story here is either simply not correct, or it is missing key additional techniques; i.e., under certain experimental contexts, it seems to have a window that operates beyond the "official" spec (upwards of another 2x): https://twitter.com/goodside/status/1598882343586238464

Having a bigger window is a parameter; the context window's implementation in the code is the technique. Also, much of that discussion isn't necessarily indicative of a bigger window. It could equally be explained by truncating more effectively, which isn't really "long" memory but a matter of choosing what is useful to keep.
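To illustrate the distinction: something like the sketch below gives the appearance of longer memory without any change to the window size. Everything here is a toy assumption on my part, scoring past turns by word overlap with the current query and using a word-count budget in place of real tokens:

```python
def select_history(turns, query, budget=822):
    # Score each past turn by word overlap with the current query,
    # then keep the best-scoring turns that fit the fixed budget,
    # in their original order. The window stays the same size;
    # only the choice of what fills it changes.
    q = set(query.split())
    scored = sorted(enumerate(turns),
                    key=lambda it: len(q & set(it[1].split())),
                    reverse=True)
    kept, used = set(), 0
    for i, turn in scored:
        cost = len(turn.split())
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [t for i, t in enumerate(turns) if i in kept]
```

Probing from outside, this would look exactly like a model "remembering" things further back than its window should allow.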

−2

maxToTheJ t1_izvcrws wrote

>Maybe, maybe not--expert consensus is probaby not. BlenderBot, e.g., uses different techniques to achieve long-term conversational memory. Not clear what techniques ChatGPT is using.

How do you figure BlenderBot does that?

>Not clear what techniques ChatGPT is using.

What qualifies as a technique?

> General consensus is that there is either a really long context window going on or (more likely) some sort of additional long-term compression technique.

Source?

−1

maxToTheJ t1_izvbssa wrote

Maybe because

A) The paper tells you all the ingredients. What you can infer is all there in black and white.

B) "Apparently" means it isn't a known effect. I.e., look at the highest-voted comment: if you look at the logs, it just concatenates prompts. https://www.reddit.com/r/MachineLearning/comments/zjbsie/d_has_open_ai_said_what_chatgpts_architecture_is/izuqszr/
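That matches the simplest possible implementation, sketched below. The 822 figure is the limit people in the thread measured, not a published spec, and splitting on whitespace is my stand-in for real tokenization:

```python
def build_prompt(history, new_message, limit=822):
    # Naive chat "memory": concatenate prior turns plus the new
    # message, then drop the oldest words until it fits the window.
    words = " ".join(history + [new_message]).split()
    return " ".join(words[-limit:])
```

Nothing here requires any long-term memory mechanism; old context simply falls off the front.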

A prompt suggestive of an 822 limit: https://twitter.com/jlopez_dl/status/1599052752935874561?s=20

Although it does seem that OpenAI will charge you for the full N-token text even if they determine only M tokens are non-junk and feed only those into the model, which makes sense: if you give it a bunch of profanity and it has to block or strip your prompt, they will still want to charge you once they begin charging.

C) If you are looking for a theoretical explanation of something you aren't even sure is an actual effect, there isn't one, hence why it isn't in the paper.

D) Clearly nobody wants to put in the work to read the blog, much less the paper, but everyone wants to crowdsource some unvetted, hand-wavy hypothesis that kind of sounds right, as voted on by folks who read neither the paper nor the blog. That's actually kind of ironic if you read the blog and see how much thought OpenAI is putting into preferring "ground truth".

−9

maxToTheJ t1_izuwe6z wrote

https://openai.com/blog/chatgpt/

https://huggingface.co/blog/rlhf

EDIT: WTF is up with the downvotes? Many of the answers to OP's questions are in the f'ing blog.

>Has Open AI said what ChatGPT's architecture is?

from the blog

>ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series here. ChatGPT and GPT 3.5 were trained on an Azure AI supercomputing infrastructure.

GPT-3.5 is just https://arxiv.org/abs/2203.02155, and that answers most of the other questions, if you want the details. If you don't, the diagram in the Methods section outlines the process.

Also from blog

>Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).

which is why https://huggingface.co/blog/rlhf is useful
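The core of the RLHF step that blog describes is a reward model trained on human preference pairs; its pairwise loss reduces to a one-liner (scalar sketch, no tensors or batching):

```python
import math

def reward_model_loss(score_chosen: float, score_rejected: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected)): low when the reward model
    # scores the human-preferred completion above the rejected one.
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The policy is then fine-tuned (with PPO, per the paper) to maximize that learned reward, which is where the "harmful and untruthful output" reductions come from.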

EDIT2: The irony of people complaining about the above. Clearly a big group of you doesn't want to put in the work to read the blog, which is itself a distilled-down version, much less the paper, but really wants to crowdsource some unvetted, hand-wavy hypothesis that kind of sounds right, as voted on by folks who read neither the paper nor the blog. That's actually kind of ironic if you read the blog and see how much thought OpenAI is putting into preferring "ground truth".

As a reminder: if you paste in 1M words of Shakespeare, ask ChatGPT some query about the beginning of the text, and it gets it right, I wouldn't jump to conclusions about extraordinary memory when that text may have been in the training corpus.

148

maxToTheJ t1_iywvblb wrote

Also, the queries people use to judge it are always terribly non-adversarial. OP's query is just "tell me a story with X and Y at ABC", or people ask a chatbot whether it's sentient, as if nobody has ever thought to ask that.

This isn't the deepest probe into the embedding space, and it's from a YouTuber with no experience in ML, but it's still better than the prodding in the thread.

https://www.youtube.com/watch?v=i9InAbpM7mU

4

maxToTheJ t1_iywupll wrote

> Totally true. I also tend to believe my results are garbage and double- and triple-check.

The market doesn't reward that, though. We can't really say for sure that the paper being discussed would have won Outstanding Paper with less impressive gains, so at the end of the day not checking could inadvertently help your career.

8