Submitted by Cool_Abbreviations_9 t3_123b66w in MachineLearning
astrange t1_jdujlcf wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
This is why people are wrong when they say GPT "just outputs the most probable next word". It's the most probable /according to itself/, and the model has been trained to "lie" so that the most useful word comes out as the most probable one.
was_der_Fall_ist t1_jduk3s8 wrote
They’re also not realizing that even if the goal is to produce the most probable/useful next word, that doesn’t preclude the neural network from doing other complicated operations in order to figure out the most probable/useful word.
bpooqd t1_jdun73m wrote
I suspect those people believe that gpt4 is actually a markov chain.
IDe- t1_jdv5f5b wrote
I mean it is a (higher order) Markov chain.
Gh0st1y t1_jdvr5qr wrote
Yeah but so are we haha
sineiraetstudio t1_jdws5js wrote
All higher-order Markov chains can be modeled as a first-order Markov chain by squashing states together.
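To make "squashing states together" concrete, here's a toy sketch (illustrative Python only, nothing to do with how GPT is actually implemented): a 2nd-order chain over tokens becomes a 1st-order chain whose states are token pairs.

```python
from collections import defaultdict

def to_first_order(transitions_2nd):
    """transitions_2nd maps (prev2, prev1) -> {next_token: prob}.
    Returns an equivalent 1st-order chain whose states are token pairs."""
    first_order = defaultdict(dict)
    for (prev2, prev1), dist in transitions_2nd.items():
        for nxt, p in dist.items():
            # The pair-state (prev1, nxt) carries all the history we need.
            first_order[(prev2, prev1)][(prev1, nxt)] = p
    return first_order

chain2 = {("the", "cat"): {"sat": 0.7, "ran": 0.3}}
print(to_first_order(chain2)[("the", "cat")])
# {('cat', 'sat'): 0.7, ('cat', 'ran'): 0.3}
```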
AndreasVesalius t1_jdx7r37 wrote
It’s just a bunch of if/else statements
light24bulbs t1_jduwgqt wrote
Yeah, like it's actually using a huge amount of brain power to figure out what the next word is. Just because that's how it works doesn't mean it's not intelligent.
If you want to be really good at figuring out what the next word is, you have to be really smart.
bartvanh t1_jdyd6om wrote
Ugh, yes it's so frustrating to see people not realizing this bit all the time. And also kind of painful to imagine that (presumably - correct me if I'm wrong) all those internal "thoughts" are probably discarded after each word, only to be painstakingly reconstructed almost identically for predicting the next word.
was_der_Fall_ist t1_je3ng6m wrote
Maybe that’s part of the benefit of using looped internal monologue/action systems. By having them iteratively store thoughts and actions in their context window, they no longer have to use the weights of the neural network to “re-think” every thought each time they predict a token. They could think more effectively by using their computation to do other operations that take the internal thoughts and actions as their basis.
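Roughly something like this loop, to be concrete (purely illustrative Python; `generate` stands in for whatever model call you'd use, and the prompt/stop format here is made up):

```python
def generate(prompt: str) -> str:
    """Placeholder for a language-model call; a real agent would hit an API here."""
    return "Thought it through. FINAL ANSWER: 42"

def run_agent(task: str, max_steps: int = 5) -> str:
    scratchpad = []  # earlier thoughts persist in the context window...
    for _ in range(max_steps):
        prompt = task + "\n" + "\n".join(scratchpad) + "\nThought:"
        thought = generate(prompt)  # ...so they don't have to be re-derived from scratch
        scratchpad.append("Thought: " + thought)
        if "FINAL ANSWER:" in thought:
            return thought.split("FINAL ANSWER:")[1].strip()
    return scratchpad[-1]

print(run_agent("What is 6 * 7?"))  # -> "42"
```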
Uptown-Dog t1_jdw6kh1 wrote
Okay wow. I needed this comment. Thanks.
ntaylor- t1_je11vt1 wrote
Fairly sure the "final" gpt4 model is still using a generate function that predicts one token at a time. It's just that the training was good and complicated, via RLHF. After training it's not doing any "complicated operations".
was_der_Fall_ist t1_je15397 wrote
You don’t think the neural network, going through hundreds of billions of parameters each time it calculates the next token, is doing anything complicated?
ntaylor- t1_je5qtl2 wrote
Nope. It's the same as all neural networks using the transformer architecture. Just a big old series of matrix multiplications with some non-linear transformations, at the end of the day.
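By that I mean roughly this kind of thing (toy numpy sketch of a single-head attention + MLP block; dimensions and details are illustrative, not any particular model's):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    # x: (seq_len, d_model); everything below is matmuls plus a couple of non-linearities
    q, k, v = x @ Wq, x @ Wk, x @ Wv               # matrix multiplications
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # non-linearity #1 (softmax)
    x = x + (att @ v) @ Wo                         # residual connection
    h = np.maximum(0.0, x @ W1)                    # non-linearity #2 (ReLU here)
    return x + h @ W2
```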
was_der_Fall_ist t1_je6lfl9 wrote
Why are matrix multiplications mutually exclusive with complicated operations?
A computer just goes through a big series of 0s and 1s, yet through layers of abstraction they accomplish amazing things far more complicated than a naive person would think 0s and 1s could represent and do. Why not the same for a massive neural network trained via gradient descent to maximize a goal by means of matrix multiplication?
Rioghasarig t1_jdxrp3y wrote
No, they were right about the base model of GPT, as the base model was trained simply to predict the next word. ChatGPT and GPT-4 have evolved beyond that (with things like RLHF).
astrange t1_jdy6d4f wrote
But nobody uses the base model, and when they did use it, it was only interesting because it fails to predict the next word and therefore generates new text. A model that successfully predicts the next word all the time given existing text would be overfitting, since it would only produce things you already have.
Rioghasarig t1_jdz24za wrote
People were using the base model when it first came out and some people are still using it today. The game AI Dungeon still runs on what is essentially a transformer trained on next-token prediction. So it would be accurate to say "It just (attempts to) output the next most probable word".
ntaylor- t1_je11iqf wrote
But eventually, after RLHF, the gpt4 model is one final fixed model and still presumably uses a generate function that predicts next tokens based on the previous ones, as base GPT models/any autoregressive model do. At least that's what it seems to be doing.
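i.e. conceptually something like this (toy greedy decode loop; `model` and `tokenizer` are placeholders, not any real library's API):

```python
import numpy as np

def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 50) -> str:
    tokens = list(tokenizer.encode(prompt))
    for _ in range(max_new_tokens):
        logits = model(tokens)                   # one score per vocabulary item, per position
        next_token = int(np.argmax(logits[-1]))  # pick the most probable next token
        tokens.append(next_token)                # condition the next step on everything so far
        if next_token == tokenizer.eos_token_id:
            break
    return tokenizer.decode(tokens)
```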