Lairv

Lairv t1_ir7qmlc wrote

Cool paper, worth noting that such systems requires huge resources to be trained, they quickly mention it in the appendix "1.600 actors TPUv4 to play games, and 64 TPUv3 to train the networks, during a week". For reference, AlphaZero for Go was trained with 5.000 actors TPUv1 to generate games, and 64 TPUv2 to train networks, during 8 hours. I still find it unfortunate that not much work has been done to reduce resources needed to train AlphaZero-like systems, which is already 5 years old

10

Lairv t1_ir7p0xt wrote

In the article they try 2 types of reward: minimizing the rank of the tensor decomposition (i.e. minimizing total number of multiplication), and minimizing the runtime of the algorithm on a given hardware (they tried with nvidia V100 and TPUv2)

The latter could be actually useful since their graphs shows that the algorithms discovered reach better performances than cuBLAS (Fig.5)

37