Submitted by radi-cho t3_11izjc1 in MachineLearning
RSchaeffer t1_jb26p98 wrote
Lucas Beyer made a relevant comment: https://twitter.com/giffmana/status/1631601390962262017
"""
The main reason highlighted is minibatch gradient variance (see screenshot).
This immediately asks for experiments that can validate or nullify the hypothesis, none of which I found in the paper
"""