Competitive_Dog_6639
Competitive_Dog_6639 t1_j8swbzz wrote
Reply to comment by bernhard-lehner in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
EVolved sign momEntum (EVE) 🤣
Competitive_Dog_6639 t1_j8qa7em wrote
Reply to [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
ML acronyms are getting out of hand; just use any letter from any of the words, I guess...
Competitive_Dog_6639 t1_j42pzbq wrote
Reply to [D] Can someone point to research on determining usefulness of samples/datasets for training ML models? by HFSeven
Not exactly what you are describing with the A, B, C groups, but this recent paper examines data pruning by introducing a measure of how useful individual samples are: https://openreview.net/forum?id=UmvSlP-PyV
Competitive_Dog_6639 t1_j30hjmd wrote
The situation you are describing is not possible in theory. If a matrix is PSD and invertible, it must be positive definite, and the inverse of a positive definite matrix is also positive definite, which means it can only yield positive Mahalanobis distances (or zero if the vectors are identical). https://math.stackexchange.com/questions/2288067/inverse-of-a-symmetric-positive-definite-matrix
In practice, this can happen due to small eigenvalues and numerical error. The easiest fix, as others suggest, is to add the identity scaled by a small constant, like in ridge regression.
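A minimal NumPy sketch of that fix (`eps` is an illustrative constant; scale it to the magnitude of your covariance's eigenvalues):

```python
import numpy as np

def mahalanobis(x, y, cov, eps=1e-6):
    """Mahalanobis distance with a small ridge term for numerical stability."""
    # cov + eps*I is positive definite even if cov has tiny (or slightly
    # negative, due to round-off) eigenvalues
    cov_reg = cov + eps * np.eye(cov.shape[0])
    diff = np.asarray(x) - np.asarray(y)
    # solve rather than explicitly invert: better conditioned
    return np.sqrt(diff @ np.linalg.solve(cov_reg, diff))
```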
Competitive_Dog_6639 t1_j0jwrt1 wrote
Reply to comment by Ok-Teacher-22 in [R] Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow by Ok-Teacher-22
Still some annoying bug-ish things in torch for sure, like this shuffle error (the fix isn't very satisfying, and it's easy to overlook): https://github.com/pytorch/pytorch/issues/31771
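I'm not certain exactly which shuffle bug that issue tracks, but a well-known member of this class of silent bugs is NumPy's RNG state being duplicated across DataLoader workers. A minimal sketch of the usual workaround, with illustrative names:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class NoisyDataset(Dataset):
    """Toy dataset whose samples depend on NumPy's global RNG."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        # without per-worker reseeding, every worker process inherits the
        # same NumPy state and returns identical "random" values
        return np.random.rand()

def seed_worker(worker_id):
    # derive a distinct NumPy seed from torch's per-worker seed
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(NoisyDataset(), batch_size=4,
                    num_workers=2, worker_init_fn=seed_worker)
```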
Competitive_Dog_6639 t1_iz2oaue wrote
Reply to [R] The Forward-Forward Algorithm: Some Preliminary Investigations [Geoffrey Hinton] by shitboots
Hinton is awesome and I really enjoyed his NeurIPS talk. Naive question: are single-layer gradients biologically plausible? My understanding is that gradients back through multiple layers are not. The FF algorithm still uses gradients for single layers, though, right?
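For concreteness, here's a minimal sketch of what those single-layer gradients look like in FF, assuming the goodness function (sum of squared activations) and threshold loss described in the paper; sizes and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    """One Forward-Forward layer, trained with a purely local gradient."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold  # goodness threshold (illustrative value)
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # length-normalize the input so a layer can't simply pass along
        # the previous layer's goodness
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # goodness = sum of squared activations
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # push positive goodness above the threshold, negative below
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()  # gradient never leaves this layer
        self.opt.step()
        # detach outputs so the next layer sees no gradient from this one
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```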
Competitive_Dog_6639 t1_ivfrjh9 wrote
My take: the prior work mentioned doesn't undermine the main claim of the paper, which is that without retraining one can find permutations that map nets to the same basin.
The objection to this point is raised in part C) by the commenter, where an appeal is made to the commenter's own paper. I read that paper and didn't see explicit ideas related to connected modes. Plus, the commenter's paper retrains the nets, which goes against the main idea of git re-basin. The ideas of mode connectivity may be latent there, but they are not mentioned at all. Why is it the job of the git re-basin authors to dig so deeply into one of thousands of related papers to credit the commenter for an idea that isn't even explicitly discussed? I would also point out that the commenter's paper might be missing related references to things like SWA, so maybe nobody's perfect?
Even if many of the methods come from previous work, I don't see anything that undermines the central claim of git re-basin, and for me that's that: it's an original and important idea. Could it relate better to previous work? Sure.
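For concreteness, a minimal sketch of the permutation symmetry behind that claim (toy sizes and random weights; this illustrates the invariance itself, not the paper's actual weight-matching algorithm):

```python
import torch

# Permuting a hidden layer's units (and un-permuting the next layer's
# input columns) leaves the network's function unchanged, which is why
# two trained nets can potentially be mapped into the same loss basin
# without retraining.
torch.manual_seed(0)
d_in, d_hidden, d_out = 4, 8, 3
W1, b1 = torch.randn(d_hidden, d_in), torch.randn(d_hidden)
W2 = torch.randn(d_out, d_hidden)

perm = torch.randperm(d_hidden)
W1_p, b1_p = W1[perm], b1[perm]  # permute hidden units
W2_p = W2[:, perm]               # permute the matching input columns

x = torch.randn(5, d_in)
y = torch.relu(x @ W1.T + b1) @ W2.T
y_p = torch.relu(x @ W1_p.T + b1_p) @ W2_p.T
assert torch.allclose(y, y_p)  # identical outputs under the permutation
```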
Competitive_Dog_6639 t1_jcbhoi1 wrote
Reply to [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
NLP researchers are breathing a massive sigh of relief, because if GPT-4 is unpublished they don't need to include it in benchmarks for their new papers 😆