Submitted by Not-Banksy t3_126a1dm in singularity
abudabu t1_je9ixnd wrote
Reply to comment by Not-Banksy in When people refer to “training” an AI, what does that actually mean? by Not-Banksy
The GPUs aren’t actually wired together into some special circuit. The transformer architecture exists entirely in software, and the software uses GPUs to do the matrix calculations efficiently.
Specifically, the transformer architecture is a bunch of large matrices connected together by arithmetic operations. Training shows the model a sequence of words and checks whether it correctly predicts the next word. The process measures how “wrong” the prediction is and updates the matrices so that the prediction will be slightly more right next time. This is a very high-level description of “backpropagation”.
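To make that concrete, here's a toy sketch of the loop: a single trainable matrix predicting the next word, with a hand-written gradient update. (The vocabulary, learning rate, and the idea of using just one matrix are all made-up simplifications; a real transformer stacks many matrices with attention in between.)

```python
import numpy as np

# Toy "predict the next word" model: one matrix of scores,
# one row per previous word, one column per candidate next word.
vocab = ["the", "cat", "sat", "down"]
V = len(vocab)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))  # the trainable matrix

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_step(prev_id, next_id, lr=0.5):
    """One backprop step: nudge W so `next_id` becomes more likely after `prev_id`."""
    probs = softmax(W[prev_id])      # forward pass: predicted distribution
    loss = -np.log(probs[next_id])   # cross-entropy: how "wrong" we were
    grad = probs.copy()              # gradient of the loss w.r.t. the scores
    grad[next_id] -= 1.0
    W[prev_id] -= lr * grad          # update: slightly more right next time
    return loss

# Repeatedly show it "cat" -> "sat"; the measured wrongness shrinks.
losses = [train_step(1, 2) for _ in range(50)]
```

The last loss is far smaller than the first, which is the whole game: each update makes the matrices predict the training text a little better.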
Using text to automatically train the network is called self-supervised learning. It’s great because no human input is required, just lots of text.
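The "no human input" part is easy to see in code: the labels come for free from the text itself. A sketch (the sentence is just a placeholder):

```python
# Self-supervised labeling: the text supplies its own targets.
# Each position's "label" is simply the word that follows it.
tokens = "the cat sat on the mat".split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (["the"], "cat"), (["the", "cat"], "sat"), ...
```

Every sentence on the internet yields training pairs this way, with zero annotation cost.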
There are many other forms of training. ChatGPT works because it was also trained with reinforcement learning from human feedback (RLHF), where humans rank a set of answers. It’s basically the same underlying process as above, but the answers generated by the network are used to train the network, and the rankings push it toward the better answers. When we give responses up- and down-votes, OpenAI is probably using that as RLHF signal.
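The ranking part can be sketched with a pairwise loss: a reward model should score the human-preferred answer higher than the rejected one, and the loss below is small exactly when it does. (The scores are made-up numbers; this shows the ranking idea, not OpenAI's actual pipeline.)

```python
import math

def ranking_loss(score_preferred, score_rejected):
    """Small when the preferred answer scores higher, large when it doesn't."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

good = ranking_loss(2.0, -1.0)   # model agrees with the human ranking
bad = ranking_loss(-1.0, 2.0)    # model has the order backwards
```

Training on many such comparisons teaches the model which of its own answers people prefer.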
Another approach is to have humans create examples directly. OpenAI hired people in Africa to write conversations in which one person played the role of the chatbot. This kind of training helped the network understand chat-style interactions.
Since it’s a next-word predictor, the chat data has special tokens in the text that mark the “user” and “chatbot” roles. So it may help to picture it as a very fancy autocomplete.
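Here's roughly what that looks like. The token names below are hypothetical (the real special tokens differ), but the idea is the same: the whole chat becomes one long string, and the model just autocompletes it.

```python
# Hypothetical role-marker tokens; real chat models use their own.
USER, BOT = "<|user|>", "<|assistant|>"

def to_training_text(turns):
    """Flatten a chat into the single stream a next-word predictor sees."""
    return "".join(
        f"{USER if role == 'user' else BOT}{text}" for role, text in turns
    )

convo = [("user", "Hi!"), ("assistant", "Hello, how can I help?")]
text = to_training_text(convo)
```

At generation time the model is fed everything up through the final `<|assistant|>` marker and simply continues the text, i.e. it autocompletes the bot's turn.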