float16

float16 t1_j4uqlls wrote

2 seems doable. Not everybody has to have a GPU, but I bet lots of people, including me, would rather spin up the GPU in their personal computer for a few seconds than manually specify where skippable segments are.

The one central server thing bugs me. I'd prefer something like "query your nearest neighbors and choose the one with the most recent data." No idea how to do that though; not a systems person.

1

float16 t1_j10sam3 wrote

OK guys, you can chill with the downvotes. They're just asking questions.

As mentioned elsewhere, Python does not do much work; the important parts are in CUDA. So if you used some other language such as C++, you still can't write loops, and you still have to use the framework's data structures.

5

float16 t1_it1mphe wrote

I looked at it. He misspoke and probably meant that batch normalization makes the preactivations closer to the domain of the activation function (tanh in this case) where the derivative is far from 0.

Also, "Gaussian" is often used to refer to the standard normal distribution.

Same kind of deal when lots of people say "convolution" when they mean "cross correlation."

To answer your question, no, it is not necessary, but good researchers often have a solid math foundation.

8