jrkirby t1_jdzx1ef wrote

I'm guessing the hard part is that you can't "untrain" a model. They weren't thinking "I want to benchmark on these problems later" when they started. Then they spent $20K+ of compute on training. Then they wanted to test it. You can easily find the stuff you want to test on in your training dataset, sure. But you can't so easily remove it and retrain everything from scratch.

7

jrkirby t1_ivx9xjl wrote

What happens when all the inputs to a ReLU neuron sum to exactly 0? The ReLU function's derivative is discontinuous at zero. I figure in most practical situations this doesn't matter, because the odds of many floating-point numbers summing to exactly 0.0 are negligible. But this paper raises the question of what happens in that case. Is the derivative of ReLU at 0.0 equal to NaN, 0, or 1?
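For what it's worth, autodiff frameworks don't return NaN there; they pick one valid subgradient by convention, and PyTorch and TensorFlow both define ReLU'(0) = 0. A minimal NumPy sketch of that convention (the function names here are illustrative, not from any framework):

```python
import numpy as np

def relu(x):
    # max(x, 0), elementwise
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Convention used by common frameworks: gradient is 1 for x > 0
    # and 0 for x <= 0, so the input x == 0.0 gets subgradient 0,
    # not NaN and not 1.
    return (x > 0).astype(x.dtype)

xs = np.array([-1.0, 0.0, 2.0])
print(relu(xs))       # [0. 0. 2.]
print(relu_grad(xs))  # [0. 0. 1.]
```

Mathematically any value in [0, 1] is a valid subgradient at 0; picking 0 just means a neuron sitting exactly at the kink receives no gradient for that input.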

37