Submitted by flowday t3_118lycd in singularity
TFenrir t1_j9i5653 wrote
Holy shit, 32k token context? That is a complete fucking game changer. That's roughly 24k words (at ~0.75 words per token). The current context length is about 4k tokens.
A simple example of why that's relevant: right now it's hard for a model to hold even one entire research paper in its context. This could probably handle multiple research papers at once.
Code-wise, it's the difference between a 100-line toy app and something like an 800-line one.
A context window this much bigger also makes so many more apps easier to write, or fundamentally possible when they weren't before. Chat memory extends, and short-story writing basically hits new heights. The average book has 500 words a page, ish: about 6-7 pages currently, jumping to roughly 48.
1 token ≈ 4 English characters, and the average word is 4.7 characters.
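A quick sketch of that arithmetic (the constants are just the rules of thumb from this thread, so treat the outputs as rough estimates):

```python
# Rough context-window arithmetic using the rules of thumb above.
CHARS_PER_TOKEN = 4        # "1 token ~ 4 English characters"
CHARS_PER_WORD = 4.7 + 1   # average word length plus a trailing space
WORDS_PER_PAGE = 500       # "500 words a page, ish"

for context_tokens in (4_000, 32_000):
    words = context_tokens * CHARS_PER_TOKEN / CHARS_PER_WORD
    pages = words / WORDS_PER_PAGE
    print(f"{context_tokens:>6} tokens ~ {words:,.0f} words ~ {pages:.0f} pages")

#   4000 tokens ~ 2,807 words ~ 6 pages
#  32000 tokens ~ 22,456 words ~ 45 pages
```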
turnip_burrito t1_j9i94b1 wrote
Now you got me excited about 2-3 years from now when the order of magnitude jumps 10x again or more.
Right now that's a good amount, but when it increases again by 10x, that would be enough to handle multiple very large papers, or a whole medium-size novel plus some.
In any case, say hello to loading tons of extra info into short-term context to improve information synthesis.
You could also run computations inside the context window itself: mini "LLM programs" that use it as a workspace while working on a larger problem, something like the sketch below.
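A minimal sketch of that workspace idea, assuming a hypothetical `call_llm` helper standing in for any text-completion API: each sub-computation's result is appended back into one big context so later steps can build on it.

```python
# Sketch: a large context window as a scratchpad for mini "LLM programs".
# `call_llm` is a hypothetical stand-in for any text-completion API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM API of choice")

def solve_with_workspace(problem: str, subtasks: list[str]) -> str:
    workspace = f"Problem: {problem}\n\n"
    for i, subtask in enumerate(subtasks, start=1):
        # Each mini-program sees everything computed so far.
        result = call_llm(workspace + f"Subtask {i}: {subtask}\nAnswer:")
        workspace += f"Subtask {i}: {subtask}\nResult: {result}\n\n"
    # The final call synthesizes across all intermediate results, which
    # only fits once the window is large enough to hold the whole workspace.
    return call_llm(workspace + "Using the results above, solve the problem:")
```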
TFenrir t1_j9iaxdg wrote
I really want to see how coherent and sensible it can be at 32k tokens with a fundamentally better model. Could it write a whole short story off a prompt?
turnip_burrito t1_j9ib4kx wrote
That's a really good question. I also want to see how reasoning, coherence, and creativity are affected by large context lengths.
diabeetis t1_j9irqxa wrote
the order of magnitude will always jump by 10x
turnip_burrito t1_j9it0u6 wrote
No, could be 100x or 1000x.
redpnd t1_j9j6dl9 wrote
Those are two and three orders of magnitude.
turnip_burrito t1_j9j74rx wrote
Yes, I said "when the order of magnitude jumps by 10x or more".
Hence a jump of one order of magnitude (10x), two orders of magnitude (100x), or three (1000x).
You can jump by more than one order of magnitude. diabeetis' comment is wrong, because the jump can be more than a single order.
diabeetis t1_j9jb6wd wrote
I mean who cares but I think the standard way of expressing that thought would be "jumps by 2 or 3 orders of magnitude"
turnip_burrito t1_j9jbkjo wrote
I think the intended meaning is quite clear by context, but your point is taken.
redpnd t1_j9kg57f wrote
🤗
iamozymandiusking t1_j9ke88e wrote
Agreed, but also at this pace, I doubt it will take that long.
grimorg80 t1_j9ke8k6 wrote
Two to three years?? It's gonna happen way sooner than that.
nexapp t1_j9m0ja0 wrote
>https://twitter.com/transitive_bs/status/1628118163874516992?s=20
Make it 100x as the most likely estimate, given the present lightning-speed progression.
gONzOglIzlI t1_j9izs1r wrote
Am I the only one wondering how quantum computers will factor into all of this?
Feels like a hidden wild card; it could expand the token budget exponentially.
turnip_burrito t1_j9j1pe8 wrote
Why would it expand the token budget exponentially?
Also, we have nowhere near enough qubits to handle these kinds of computations. The number of bits you need to run these models is huge (GPT-3 has ~175 billion, or about 10^11, parameters). Quantum computers nowadays are lucky to be around 10^3 qubits, and they decohere too quickly to be used for very long (about 10^-4 seconds). *Numbers pulled from a quick Google search.
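For scale (using the same rough figures, so take this as order-of-magnitude only):

```python
import math

gpt3_params = 175e9  # ~1.75 x 10^11 weights (rough public figure)
qubits_now = 1e3     # ~10^3 qubits on today's hardware, roughly

# Orders of magnitude between model size and available qubits:
print(math.log10(gpt3_params / qubits_now))  # -> ~8.24
```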
That said, new (classical computer) architectures do exist that can use longer context windows: H3 (Hungry Hungry Hippos) and RWKV.
D2MAH t1_j9jb0sf wrote
I'm not quite sure quantum computing is even needed for AGI. Quantum hardware is so far behind, and classical ML is so far ahead.
ChezMere t1_j9owekb wrote
Quantum computers work nothing like how you think they do, and are completely useless for AI (as well as almost all other classes of problem).
gONzOglIzlI t1_j9so4my wrote
"Quantum computers are completely useless for AI."
Bold prediction; we'll see how it ages.
Can't say I'm anything close to an expert, but I do have a master's in CS, was a competitive programmer, and am now a professional programmer with 10 years of experience.
GPT-5entient t1_j9l4ex1 wrote
32k tokens would mean approximately 128 kB of text (at ~4 characters per token). That is a decent-sized code base! Also, with this much context memory, the known context-saving tricks would work much better, so this could theoretically be used to create code bases of virtually unlimited size, as in the sketch below.
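One of those context-saving tricks is a rolling summary: keep only the active file verbatim and compress the rest. A rough sketch, where `summarize` is a hypothetical LLM call:

```python
# Sketch: fitting a code base larger than the window into the context.
# `summarize` is a hypothetical LLM call that compresses code to an outline.
def summarize(source: str) -> str:
    raise NotImplementedError("e.g. 'summarize this file as an API outline'")

def build_context(files: dict[str, str], active: str, budget_chars: int) -> str:
    parts = [f"# {name} (summary)\n{summarize(src)}"
             for name, src in files.items() if name != active]
    parts.append(f"# {active} (full source)\n{files[active]}")
    # A 32k-token window (~128 kB) leaves far more room for summaries
    # than a 4k one, so the same trick scales to much bigger projects.
    return "\n\n".join(parts)[:budget_chars]
```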
This amazes me and (being a software dev) also scares me...
But, as they say, what a time to be alive!
GoldenRain t1_j9j3fb1 wrote
I wonder how expensive each prompt is though.
GPT-5entient t1_j9l5ph0 wrote
From that table, it looks like it will be 6x more expensive than ChatGPT's model: 600 units per instance vs. 100. Not sure how this translates into raw token cost, but it seems it's going to be more expensive once they expose serverless pay-as-you-go pricing. text-davinci-003 is $0.02 per 1k tokens, so this could be $0.12 per kilotoken.
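Back-of-envelope, assuming the price scales linearly with that 600:100 unit ratio (an assumption; 32k pricing hadn't been published):

```python
davinci_per_1k = 0.02   # $ per 1k tokens, text-davinci-003
unit_ratio = 600 / 100  # 32k-context units vs. ChatGPT-model units

per_1k = davinci_per_1k * unit_ratio
print(f"${per_1k:.2f} per 1k tokens")               # -> $0.12
print(f"${per_1k * 32:.2f} for a full 32k prompt")  # -> $3.84
```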
YobaiYamete t1_j9k8s5j wrote
> That's roughly 24k words.
Dude, give me this, but for a Character.AI-style chat or NovelAI-style RP.
visarga t1_ja8c2yk wrote
I tested a paper quickly and it was 20K tokens in 200KB of text.
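That ratio (~10 characters per token) is much higher than the usual ~4, possibly because of LaTeX or markup in the paper. It's easy to check with OpenAI's tiktoken library (`paper.txt` below is a placeholder for any plain-text dump):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # the GPT-3.5/4 encoding

with open("paper.txt", encoding="utf-8") as f:  # your paper's plain text
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens:,} tokens in {len(text):,} chars "
      f"({len(text) / n_tokens:.1f} chars/token)")
```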