
RSchaeffer t1_jb26p98 wrote on March 5, 2023 at 9:33 PM

"""

The main reason the paper highlights is minibatch gradient variance (see screenshot).

This immediately calls for experiments that could validate or refute that hypothesis, and I found none in the paper.
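As one illustration of the kind of direct measurement such an experiment could start from (a minimal sketch, not anything taken from the paper), the variance of the minibatch gradient can be estimated empirically and compared across batch sizes. The toy linear-regression data, the helper names, and the batch sizes below are hypothetical choices. When batches are drawn with replacement, the variance of the minibatch-mean gradient equals the per-example gradient variance divided by the batch size B. An effect that is really driven by gradient variance would therefore be expected to shift in a predictable way as B is varied with everything else held fixed.

```python
import random
import statistics


def per_example_grads(w, xs, ys):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w, one value per example."""
    return [(w * x - y) * x for x, y in zip(xs, ys)]


def minibatch_grad_variance(grads, batch_size, num_draws, rng):
    """Empirical variance of the minibatch-mean gradient, using num_draws
    batches sampled with replacement from the per-example gradients."""
    estimates = []
    for _ in range(num_draws):
        batch = rng.choices(grads, k=batch_size)  # sampling with replacement
        estimates.append(sum(batch) / batch_size)
    return statistics.variance(estimates)


def main():
    rng = random.Random(0)  # fixed seed so the run is reproducible
    # Hypothetical toy data: y = 2x + Gaussian noise.
    xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    grads = per_example_grads(0.0, xs, ys)  # gradients evaluated at w = 0
    per_example_var = statistics.pvariance(grads)

    results = {}
    for b in (1, 4, 16, 64):
        v = minibatch_grad_variance(grads, b, 2000, rng)
        results[b] = v
        # The measured variance should sit close to per_example_var / b.
        print(f"B={b:3d}  measured var={v:.4f}  per-example var / B={per_example_var / b:.4f}")
    return per_example_var, results


if __name__ == "__main__":
    main()
```

The same measurement could in principle be repeated on the per-parameter gradients of a real model, but that extension is not shown here.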

"""