AdditionalPizza t1_itx7tn0 wrote on October 26, 2022 at 11:31 PM

"AD learns a more data-efficient RL algorithm than the one that generated the source data"

This part of the paper is very interesting. The transformer ends up more data-efficient than the original RL algorithm that generated its pre-training data.