jrkirby t1_jdzx1ef wrote

I'm guessing the hard part is that you can't "untrain" a model. They weren't thinking "I want to benchmark on these problems later" when they started. Then they spent $20K+ of compute on training. Then they wanted to test it. You can easily find the stuff you want to test on in your training dataset, sure. But you can't so easily remove it and retrain everything from scratch.

7

jrkirby t1_ivx9xjl wrote

What happens when all the inputs to a ReLU neuron sum to exactly 0? The ReLU function's derivative is discontinuous at zero. I figure in most practical situations this doesn't matter, because the odds of many floating-point numbers summing to exactly 0.0 are negligible. But this paper raises the question of what happens in that case. Is the derivative of ReLU at 0.0 equal to NaN, 0, or 1?
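For what it's worth, autodiff frameworks don't return NaN there; they pick one valid subgradient by convention, and PyTorch and TensorFlow both define ReLU'(0) = 0. A minimal NumPy sketch of that convention (the function names here are illustrative, not from any framework):

```python
import numpy as np

def relu(x):
    # max(x, 0), elementwise
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Convention used by common frameworks: gradient is 1 for x > 0
    # and 0 for x <= 0, so the input x == 0.0 gets subgradient 0,
    # not NaN and not 1.
    return (x > 0).astype(x.dtype)

xs = np.array([-1.0, 0.0, 2.0])
print(relu(xs))       # [0. 0. 2.]
print(relu_grad(xs))  # [0. 0. 1.]
```

Mathematically any value in [0, 1] is a valid subgradient at 0; picking 0 just means a neuron sitting exactly at the kink receives no gradient for that input.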

37