float16 t1_j681kmd wrote
Reply to comment by PleasantBase6967 in [D] Laptop recommendations for ML by PleasantBase6967
Then you shouldn't be training on a laptop; you should train on a remote server, in which case your laptop doesn't matter.
float16 t1_j67wop0 wrote
Reply to comment by PleasantBase6967 in [D] Laptop recommendations for ML by PleasantBase6967
It looks like you don't care about throughput, so it doesn't matter.
float16 t1_j62agci wrote
Reply to [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
Isn't this just the result of using certain tokenizers? Using Chinese as an example, no reasonable tokenizer developed with Chinese in mind would give you 17 tokens. You'd have maybe 6 to 8:
- 你好
- ,
- 我
- 是
- 个
- 高个子
...depending on whether it thinks 你好 and 高个子 should be split.
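For a rough illustration, here's what a Chinese-aware segmenter produces. This is just a sketch using jieba as a stand-in for "a tokenizer developed with Chinese in mind"; the exact split depends on its dictionary. (The example sentence, 你好，我是个高个子, is roughly "Hello, I'm a tall person.")

```python
import jieba  # a widely used Chinese word segmenter

print(jieba.lcut("你好，我是个高个子"))
# Typically something like: ['你好', '，', '我', '是', '个', '高个子']
```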
float16 t1_j4uqlls wrote
Reply to comment by FastestLearner in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Option 2 seems doable. Not everybody has to have a GPU, but I bet lots of people, including me, would rather spin up the GPU in their personal computer for a few seconds than manually mark where the skippable segments are.
The one central server thing bugs me. I'd prefer something like "query your nearest neighbors and choose the one with the most recent data." No idea how to do that though; not a systems person.
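The "pick the freshest nearby peer" rule is at least easy to state, even if the networking isn't. A hypothetical Python sketch, with every name (Peer, latency_ms, pick_source) made up:

```python
from dataclasses import dataclass

@dataclass
class Peer:
    latency_ms: float     # stand-in for "nearness"
    last_updated: float   # timestamp of this peer's newest segment data

def pick_source(peers: list[Peer], k: int = 5) -> Peer:
    # Query the k nearest peers, then trust whichever has the freshest data.
    nearest = sorted(peers, key=lambda p: p.latency_ms)[:k]
    return max(nearest, key=lambda p: p.last_updated)
```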
float16 t1_j10sam3 wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
OK guys, you can chill with the downvotes. They're just asking questions.
As mentioned elsewhere, Python does not do much of the work; the important parts run in CUDA. So even if you used some other language such as C++, you still couldn't write your own hot loops and expect them to be fast; you'd still have to go through the framework's data structures and kernels.
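A quick way to see this in PyTorch, assuming a CUDA device is available: the heavy lifting below is one GPU kernel, and Python's only job is to launch it, so swapping the host language wouldn't speed it up.

```python
import time
import torch

# Assumes PyTorch with a CUDA GPU; on CPU the same point holds,
# with the work done by C++ kernels instead of CUDA ones.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4096, 4096, device=device)

start = time.perf_counter()
y = x @ x                      # the actual work: one matmul kernel
if device == "cuda":
    torch.cuda.synchronize()   # Python only launched the kernel; wait for it
print(f"matmul took {time.perf_counter() - start:.4f}s on {device}")
```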
float16 t1_iujwruw wrote
Reply to comment by Vanguard3K in How Russia Pays for War: Despite the current level of sanctions, trade with some nations has shrunk surprisingly little and positively boomed with others by UserNamesCantBeTooLo
Can't see it. Not beautiful.
float16 t1_it4y91c wrote
Not a fan of your color map. Because you used a diverging color map, the least saturated colors sit near 50%, which isn't an intuitive reference value such as 0, and they look too much like NA. You should have used a sequential color map.
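To illustrate with a hypothetical matplotlib sketch (made-up data; RdBu and viridis standing in for the two families of maps):

```python
import numpy as np
import matplotlib.pyplot as plt

values = np.random.rand(10, 10) * 100   # hypothetical 0-100% data
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# Diverging: the washed-out middle implies 50% is a meaningful midpoint.
ax1.imshow(values, cmap="RdBu", vmin=0, vmax=100)
ax1.set_title("diverging (RdBu)")

# Sequential: lightness tracks magnitude, with no false midpoint.
ax2.imshow(values, cmap="viridis", vmin=0, vmax=100)
ax2.set_title("sequential (viridis)")

plt.show()
```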
float16 t1_it1mphe wrote
Reply to [D] is a strong background in math/stats/cs in a necessary condition for becoming a renowned researcher in the ML community? *A passive rant* by [deleted]
I looked at it. He misspoke and probably meant that batch normalization keeps the preactivations in the part of the activation function's domain (tanh in this case) where the derivative is far from 0.
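Since tanh'(x) = 1 - tanh(x)^2, the derivative is near 0 whenever |x| is large. A quick PyTorch sketch of the point, with made-up scales:

```python
import torch

torch.manual_seed(0)
x = 5 * torch.randn(256, 64) + 3        # badly scaled preactivations

def saturated(t):
    # Fraction of units in tanh's flat tails, where the gradient vanishes.
    return torch.tanh(t).abs().gt(0.99).float().mean().item()

print(f"saturated before BN: {saturated(x):.1%}")
bn = torch.nn.BatchNorm1d(64)           # training mode: normalizes per batch
print(f"saturated after BN:  {saturated(bn(x)):.1%}")
```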
Also, "Gaussian" is often used to refer to the standard normal distribution.
Same kind of deal when lots of people say "convolution" when they mean "cross-correlation."
To answer your question, no, it is not necessary, but good researchers often have a solid math foundation.
float16 t1_jayqnmi wrote
Reply to comment by SobriquetHeart in Yield rate for Top 150 US Universities [OC] by Roughneck16
Brigham Young University might also be near the top in another kind of yield.