maxToTheJ t1_izvm3wq wrote
Reply to comment by farmingvillein in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
Dude, the freaking network logs in Chrome show OpenAI concatenates the prompts.
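As a rough sketch of what "concatenating the prompts" would mean here: each new user message gets appended to the prior turns and the whole transcript is resent as one prompt. The function and field names below are illustrative assumptions, not OpenAI's actual implementation.

```python
# Hypothetical sketch of prompt concatenation as a "memory" mechanism:
# the model itself is stateless; the client replays the transcript.
def build_prompt(history, new_message, max_chars=12000):
    """Append the new turn and render the full transcript as one prompt.

    history: list of (role, text) tuples; max_chars is an assumed budget
    standing in for the real token limit.
    """
    history = history + [("User", new_message)]
    render = lambda h: "\n".join(f"{r}: {t}" for r, t in h) + "\nAssistant:"
    prompt = render(history)
    # Drop the oldest turns once the transcript exceeds the budget,
    # which is why early context eventually falls out of the window.
    while len(prompt) > max_chars and len(history) > 1:
        history = history[1:]
        prompt = render(history)
    return prompt, history
```

Under this sketch, "remembering" earlier prompts is just string concatenation plus truncation from the front, consistent with what replayed requests in the browser logs would show.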
>You then proceeded to pull a tweet within that thread which was entirely irrelevant
Your exact words. Try standing by them.
> (other than being a lower bound).
A lower bound is relevant; it's basic math. Freaking entire proofs are devoted to establishing lower bounds.
I am still waiting on any proof of any extraordinary memory for a GPT-3-type model. It is extremely relevant, since explaining something requires knowing it exists in the first place.
farmingvillein t1_izvnwdh wrote
...the whole twitter thread, and my direct link to OpenAI, are about the upper bound. The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer), and the fact that you pulled it tells me that you literally don't understand how transformers or the broader technology works, and that you have zero interest in learning. Are you a Markov chain?
maxToTheJ t1_izvotec wrote
> The 822 number is irrelevant (given that OpenAI itself tells us that the window is much longer)
OpenAI says the "cache" is "3000 words (or 4000 tokens)". I don't see anything about the input being that. The Spanish test case from the poster in the Twitter thread indicates the input sits at the lower bound, which also aligns with what the base GPT-3.5 model in the paper has. The other stress test was trivial.
> ...the whole twitter thread, and my direct link to OpenAI, are about the upper bound.
Details. No hand-wavy shit: explain with examples why it's longer, especially since your position is that some magical shit not in the paper/blog is happening.
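The "3000 words (or 4000 tokens)" point matters because the window is fixed in tokens, not words, so the effective word count depends on the tokens-per-word ratio of the language. The ~1.33 tokens/word ratio below is back-solved from OpenAI's own 3000-words/4000-tokens figure; the Spanish ratio is a hypothetical illustration of why a non-English test would hit the limit sooner, not a measured value.

```python
# Sketch: a fixed token window shrinks, in words, for languages that
# tokenize less efficiently. Ratios are assumptions (see lead-in).
def effective_word_window(token_window, tokens_per_word):
    """Approximate how many words fit in a fixed token window."""
    return round(token_window / tokens_per_word)

english = effective_word_window(4000, 4000 / 3000)  # back-solved ratio -> 3000
spanish = effective_word_window(4000, 2.0)          # hypothetical ratio -> 2000
```

This is why a test in Spanish can appear to "lose memory" earlier than an English one even with the same token budget.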
farmingvillein t1_izvq3i8 wrote
> I don't see anything about the input being that.
Again, this has absolutely nothing to do with the discussion here, which is about memory outside of the prompt.
Again, how could you possibly claim this is relevant to the discussion? Only an exceptionally deep lack of conceptual understanding could cause you to make that connection.
maxToTheJ t1_izvqh2f wrote
This is boring. I am still waiting on those details.
No hand-wavy shit: explain with examples showing it's impressively longer, especially since your position is that some magical shit not in the paper/blog is happening.