Submitted by Not-Banksy t3_126a1dm in singularity
abudabu t1_je9ixnd wrote
Reply to comment by Not-Banksy in When people refer to “training” an AI, what does that actually mean? by Not-Banksy
The GPUs aren’t actually wired together into some special circuit. The transformer architecture exists entirely in software, and the software uses GPUs to do the matrix calculations efficiently.
Specifically, the transformer architecture is a bunch of large matrices connected together by arithmetic operations. Training shows the model a sequence of words and checks whether it correctly predicts the next word. The process measures how “wrong” the prediction is and updates the matrices so that the prediction will be slightly more right next time. This is a very high-level description of “backpropagation”.
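To make that concrete, here's a toy sketch of the loop: a single trainable matrix predicting the next word, with a hand-written gradient update. (The vocabulary, learning rate, and the idea of using just one matrix are all made-up simplifications; a real transformer stacks many matrices with attention in between.)

```python
import numpy as np

# Toy "predict the next word" model: one matrix of scores,
# one row per previous word, one column per candidate next word.
vocab = ["the", "cat", "sat", "down"]
V = len(vocab)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))  # the trainable matrix

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_step(prev_id, next_id, lr=0.5):
    """One backprop step: nudge W so `next_id` becomes more likely after `prev_id`."""
    probs = softmax(W[prev_id])      # forward pass: predicted distribution
    loss = -np.log(probs[next_id])   # cross-entropy: how "wrong" we were
    grad = probs.copy()              # gradient of the loss w.r.t. the scores
    grad[next_id] -= 1.0
    W[prev_id] -= lr * grad          # update: slightly more right next time
    return loss

# Repeatedly show it "cat" -> "sat"; the measured wrongness shrinks.
losses = [train_step(1, 2) for _ in range(50)]
```

The last loss is far smaller than the first, which is the whole game: each update makes the matrices predict the training text a little better.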
Using text to automatically train the network is called self-supervised learning. It’s great because no human input is required, just lots of text.
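The "no human input" part is easy to see in code: the labels come for free from the text itself. A sketch (the sentence is just a placeholder):

```python
# Self-supervised labeling: the text supplies its own targets.
# Each position's "label" is simply the word that follows it.
tokens = "the cat sat on the mat".split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
# e.g. (["the"], "cat"), (["the", "cat"], "sat"), ...
```

Every sentence on the internet yields training pairs this way, with zero annotation cost.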
There are many other forms of training. ChatGPT works because it was also trained with reinforcement learning from human feedback (RLHF), where humans rank a set of answers. It’s basically the same underlying process as above, but the answers generated by the network are used to train the network, and the rankings push it toward the better answers. When we give responses up- and down-votes, OpenAI is probably using that as RLHF signal.
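The ranking part can be sketched with a pairwise loss: a reward model should score the human-preferred answer higher than the rejected one, and the loss below is small exactly when it does. (The scores are made-up numbers; this shows the ranking idea, not OpenAI's actual pipeline.)

```python
import math

def ranking_loss(score_preferred, score_rejected):
    """Small when the preferred answer scores higher, large when it doesn't."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

good = ranking_loss(2.0, -1.0)   # model agrees with the human ranking
bad = ranking_loss(-1.0, 2.0)    # model has the order backwards
```

Training on many such comparisons teaches the model which of its own answers people prefer.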
Another approach is to have humans create examples directly. OpenAI hired people in Africa to write conversations in which one person played the role of the chatbot. This kind of training helped the network understand chat-style interactions.
Since it’s a next-word predictor, the chat data has special tokens in the text that mark the “user” and “chatbot” roles. So it may help to picture it as a very fancy autocomplete.
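Here's roughly what that looks like. The token names below are hypothetical (the real special tokens differ), but the idea is the same: the whole chat becomes one long string, and the model just autocompletes it.

```python
# Hypothetical role-marker tokens; real chat models use their own.
USER, BOT = "<|user|>", "<|assistant|>"

def to_training_text(turns):
    """Flatten a chat into the single stream a next-word predictor sees."""
    return "".join(
        f"{USER if role == 'user' else BOT}{text}" for role, text in turns
    )

convo = [("user", "Hi!"), ("assistant", "Hello, how can I help?")]
text = to_training_text(convo)
```

At generation time the model is fed everything up through the final `<|assistant|>` marker and simply continues the text, i.e. it autocompletes the bot's turn.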