FlyingCockAndBalls t1_je8h4g3 wrote on March 30, 2023 at 4:20 AM

Reply to comment by ActuatorMaterial2846 in When people refer to “training” an AI, what does that actually mean? by Not-Banksy

what is so special about the transformer architecture?

ActuatorMaterial2846 t1_je8ik1t wrote on March 30, 2023 at 4:34 AM

It's actually quite technical, but essentially, the transformer architecture helps each part of the sentence “talk” to all the other parts at the same time. This way, each part can understand what the whole sentence is about and what it means.
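To make the "every part talks to every other part" idea concrete, here is a minimal sketch of self-attention in plain Python. It is a simplification, not the paper's full method: it assumes each token vector serves as its own query, key, and value, whereas a real transformer first passes tokens through learned projection matrices. The names `softmax` and `self_attention` are made up for illustration.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy scaled dot-product self-attention.

    Assumption: each vector is used directly as query, key, and value
    (no learned projections, single head).
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this token against every token in the sentence, itself included.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        all_weights.append(weights)
        # The new representation is a weighted mix of all token vectors.
        outputs.append([
            sum(weights[j] * vectors[j][i] for j in range(len(vectors)))
            for i in range(d)
        ])
    return outputs, all_weights

# Three made-up 2-dimensional "token" vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed, weights = self_attention(tokens)
```

Each row of `weights` says how much one token draws from every other token, and none of the rows depends on another, which is why they can all be computed at the same time.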

Here is the paper that imo changed the world 6 years ago and is the reason for the current state of AI.

https://arxiv.org/abs/1706.03762

If it goes over your head (it did for me), ask Bing or ChatGPT to summarise it for you. It helped me get my head around this stuff, as I'm in no way an expert, nor do I study this field.

turnip_burrito t1_je8i45w wrote on March 30, 2023 at 4:30 AM

"Attention mechanism" makes it good at predicting new words from past ones.

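The paper that introduced the transformer, an architecture built around the attention mechanism, is called Attention Is All You Need. (Attention itself appeared in earlier translation models.)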
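As a rough illustration of "predicting new words from past ones", the sketch below computes attention weights with a causal mask, so each position can only look at itself and earlier positions. It assumes the same simplification as a toy model would: token vectors act as both queries and keys, with no learned weights. The function name `causal_attention_weights` is hypothetical.

```python
import math

def causal_attention_weights(vectors):
    """Attention weights where position i only sees positions 0..i.

    Assumption: vectors double as queries and keys (no learned projections).
    """
    d = len(vectors[0])
    n = len(vectors)
    rows = []
    for i, q in enumerate(vectors):
        # Only score against the current and earlier positions.
        scores = [
            sum(a * b for a, b in zip(q, vectors[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        # Future positions get exactly zero weight.
        rows.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return rows

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention_weights(tokens)
```

The first row puts all of its weight on the first token, since nothing comes before it; later rows spread their weight over everything seen so far.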

Zermelane t1_je8lss0 wrote on March 30, 2023 at 5:09 AM

Better parallelism in training, and a more direct way to reference past information, than RNNs (recurrent neural networks), which seemed like the "obvious" way to process text before transformers came along.
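A toy contrast may help here. In the sketch below, the RNN loop has to run step by step because each hidden state needs the previous one, while each attention score depends only on one query and one key, so the scores can be computed in any order (or all at once on parallel hardware). The scalar weights `w_h` and `w_x` and both function names are made up for illustration; real models use vectors and learned matrices.

```python
import math

def rnn_states(inputs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    states = []
    for x in inputs:
        # Step t cannot start until step t-1 has produced h.
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attention_scores(query, keys):
    """Toy scalar attention scores: one independent product per key."""
    return [query * k for k in keys]

inputs = [1.0, 0.0, 0.0, 0.0]
states = rnn_states(inputs)                 # the first input fades step by step
scores = attention_scores(inputs[-1], inputs)  # the last position scores every past input directly
```

In the RNN, information about the first input only survives by being carried through every intermediate state, whereas attention compares the current position with each past position directly.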

These days we have RNN architectures that can achieve transformer-like training parallelism, the most interesting-looking one being RWKV. They are still badly disadvantaged when needing information directly from the past, for instance to repeat a name that's been mentioned before, but they have other advantages, and their performance gets close enough to transformers that it could be just a question of scaling exponents which architecture ends up winning out.