Ulfgardleo t1_irtdtoj wrote
This feels and sounds like an ad, but I could not figure out for what. Maybe you should make it clear which product I should definitely use.
Ulfgardleo t1_irm2bsq wrote
Reply to comment by csreid in [D] Giving Up on Staying Up to Date and Splitting the Field by beezlebub33
Yes. I think at this point it is important to realize that the moment you got hired by a company, your role changed.
You were the guy with a PhD straight from university who did top-notch research. Now you are the guy hired to make this project work.
If your job description does not include "active research" or "follow the most recent advances in ML research", then it is not your job to know what is up, especially if it is an advancement in a subfield of ML your project is not actively interested in.
Ulfgardleo t1_irg49u1 wrote
Reply to comment by csreid in [D] Giving Up on Staying Up to Date and Splitting the Field by beezlebub33
While text-guided image generation is the flavour of the month, I don't think it has broad enough impact to generate a consistently large enough stream of papers to sustain its own conference.
Ulfgardleo t1_irg3wty wrote
Reply to comment by csreid in [D] Giving Up on Staying Up to Date and Splitting the Field by beezlebub33
"This is a fairly new model and I do not know the details. If you seriously consider this, I can read up on the most recent work and then we have a meeting next week and discuss whether and how it could help us".
The awesome thing about solid basics is that you can do exactly this.
Ulfgardleo t1_ir9xy3t wrote
Reply to comment by Thorusss in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
You seem to be confused.
- Experiment 1 uses small 5x5 matrices, not block matrices. There they only count the number of multiplications. These are not faster than SIMD implementations of 5x5 matmuls, otherwise they would have shown it off proudly.
- Experiment 2 is about 4x4 block matrices. But even here, "10-20% faster than the COMMONLY used algorithms" is an overstatement of the results. On GPUs, their decomposition is only about 5% faster than their own default JAX implementation of Strassen, and the larger gap on TPU could just mean that the JAX compiler produces poor Strassen code for TPUs. (//Edit: by now I low-key assume that the 10-20% refers to standard cBLAS, because I do not get 20% relative to Strassen for any result in Figure 5; and how could they, when they never even get more than 20% improvement over cBLAS.) A minimal sketch of the kind of baseline timing this all hinges on is at the end of this comment.
- They do not cite any of the papers concerned with efficient implementations of Strassen, in particular the memory-efficient scheme from 1994: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.6887 It is unclear whether a GPU implementation of that would be faster, since they do not even discuss the GPU implementation of their own Strassen variant. They do not claim better asymptotic complexity, so we have to rely entirely on their implementation of Strassen being sensible.
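For reference, this is a minimal sketch (my own, not the paper's code) of how such a comparison could be timed in JAX; `my_block_decomposition_matmul` is a hypothetical stand-in for whichever Strassen variant or learned decomposition is being evaluated:

```python
import time

import jax
import jax.numpy as jnp


def bench(fn, A, B, reps=10):
    """Average wall-clock time of a jitted matmul variant on the current backend."""
    fn = jax.jit(fn)
    fn(A, B).block_until_ready()          # compile + warm up
    t0 = time.perf_counter()
    for _ in range(reps):
        out = fn(A, B)
    out.block_until_ready()               # wait for the async dispatches to finish
    return (time.perf_counter() - t0) / reps


kA, kB = jax.random.split(jax.random.PRNGKey(0))
A = jax.random.normal(kA, (4096, 4096), dtype=jnp.float32)
B = jax.random.normal(kB, (4096, 4096), dtype=jnp.float32)

baseline = bench(jnp.matmul, A, B)
print("library matmul:", baseline, "s per multiply")
# candidate = bench(my_block_decomposition_matmul, A, B)  # hypothetical variant under test
# print("speedup vs. library matmul:", baseline / candidate)
```

The quoted percentages depend entirely on which of these baselines you divide by, and on how carefully the Strassen baseline itself is written.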
Ulfgardleo t1_ir997hv wrote
Reply to comment by Ulfgardleo in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
The worst thing, however, is that they do not even cite the practically relevant memory-efficient implementation of Strassen (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.6887 ). One can argue that all matmul algorithms with better asymptotic complexity than Strassen are irrelevant due to their constants, but not even comparing against the best memory-efficient implementation is odd, especially as they do not show an improvement in asymptotic complexity.
Ulfgardleo t1_ir95y3t wrote
Reply to comment by master3243 in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
Yeah, they are not right. SOTA is the laser method.
They even missed the huge improvement from 1981...
https://ieeexplore.ieee.org/document/4568320
It is btw all behind the wiki link above.
Ulfgardleo t1_ir7m5md wrote
Reply to comment by mgostIH in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
Is it? I could not see from the paper whether they assume non-commutative multiplication in their small-matrix optimization.
//Edit: they do handle 4x4 block matrices, but the gains are less than 5% over the existing Strassen algorithm.
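For context, this is classic Strassen (not the paper's scheme): a decomposition carries over to block matrices, i.e. to non-commutative multiplication, exactly when every product multiplies a combination of A-blocks on the left by a combination of B-blocks on the right:

$$
\begin{aligned}
M_1 &= (A_{11}+A_{22})(B_{11}+B_{22}), & M_2 &= (A_{21}+A_{22})B_{11},\\
M_3 &= A_{11}(B_{12}-B_{22}), & M_4 &= A_{22}(B_{21}-B_{11}),\\
M_5 &= (A_{11}+A_{12})B_{22}, & M_6 &= (A_{21}-A_{11})(B_{11}+B_{12}),\\
M_7 &= (A_{12}-A_{22})(B_{21}+B_{22}), & &\\
C_{11} &= M_1+M_4-M_5+M_7, & C_{12} &= M_3+M_5,\\
C_{21} &= M_2+M_4, & C_{22} &= M_1-M_2+M_3+M_6.
\end{aligned}
$$

None of the seven products ever requires the two factors to commute, which is why the same identities lift from scalars to blocks.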
Ulfgardleo t1_ir7lytl wrote
Reply to comment by neanderthal_math in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
All standard unless the matrices are very large. ATLAS just picks different kernels that "only" change the order of operations to maximize CPU utilization.
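A toy sketch of what "only changing the order of operations" means (pure Python for readability, so only the access pattern is representative; ATLAS does this with tuned, blocked C/assembly kernels):

```python
def matmul_ijk(A, B, C):
    """Naive order: the inner loop strides down a column of B (cache-unfriendly)."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]


def matmul_ikj(A, B, C):
    """Reordered: the inner loop runs along rows of B and C (cache-friendly)."""
    n = len(A)
    for i in range(n):
        for k in range(n):
            a_ik = A[i][k]
            for j in range(n):
                C[i][j] += a_ik * B[k][j]
```

Both variants perform exactly the same multiplications and additions; they differ only in how they walk through memory.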
Ulfgardleo t1_ir7508n wrote
Reply to comment by ReginaldIII in [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
No, because these algorithms are terribly inefficient to implement with SIMD. They have nasty data-access patterns and need many more FLOPs once you also take additions into account (just the final steps of adding the intermediate terms into the result matrix already involve more than twice the additions of a standard matmul for the results shown here).
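As a rough, self-contained illustration of that addition overhead (a back-of-the-envelope count for the classic 2x2 schemes recursed all the way down to scalars, not the paper's 4x4 decomposition):

```python
def op_counts(n, block_mults, block_adds):
    """Scalar multiplications/additions of a 2x2 divide-and-conquer matmul
    applied recursively down to scalars (n must be a power of two)."""
    if n == 1:
        return 1, 0                           # one scalar multiply, no additions
    m_sub, a_sub = op_counts(n // 2, block_mults, block_adds)
    half_sq = (n // 2) ** 2                   # entries per sub-block
    return block_mults * m_sub, block_mults * a_sub + block_adds * half_sq


for n in (4, 64, 1024):
    std = op_counts(n, 8, 4)        # standard blocking: 8 block products, 4 block additions
    strassen = op_counts(n, 7, 18)  # Strassen: 7 block products, 18 block additions/subtractions
    print(f"n={n}: standard (mults, adds)={std}, Strassen (mults, adds)={strassen}")
```

Up to fairly large sizes the extra additions eat most or all of the multiplication savings, which is why practical implementations stop the recursion early and why counting multiplications alone says little about wall-clock speed.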
Ulfgardleo t1_ir72pix wrote
Reply to [R] Discovering Faster Matrix Multiplication Algorithms With Reinforcement Learning by EducationalCicada
Why is this a Nature paper?
- Strassen is already known not to be the fastest known algorithm in terms of floating-point multiplications: https://en.wikipedia.org/wiki/Computational_complexity_of_matrix_multiplication
- Strassen itself is barely used, because its implementation is inefficient except for the largest matrices. Indeed, Strassen is usually implemented with a standard matmul as the smallest blocks and only applied to very large matrices (see the sketch at the end of this comment).
- Measuring complexity in floating-point multiplications alone is a fairly meaningless metric if you pay for it with a multiple of the floating-point additions (see point 2).
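To make point 2 concrete, here is a minimal sketch (mine, not from any library; assumes square matrices with power-of-two size) of how Strassen is typically deployed: recurse only while the blocks are large, and hand everything below a cutoff to the ordinary BLAS-backed matmul.

```python
import numpy as np


def strassen(A, B, cutoff=256):
    """Strassen recursion that falls back to np.matmul (vendor BLAS) for small blocks."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                               # standard matmul as the smallest block
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C


A = np.random.rand(1024, 1024)
B = np.random.rand(1024, 1024)
print(np.allclose(strassen(A, B), A @ B))          # sanity check against plain matmul
```

All the temporaries (M1..M7 and the summed sub-blocks) are exactly the memory overhead that the memory-efficient Strassen schemes try to avoid.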
Ulfgardleo t1_irutjd4 wrote
Reply to [D] What are your thoughts about weak supervision? by ratatouille_artist
A hugely underappreciated issue is the computational difficulty of learning with weak labels. For example, if only coarse/group labels are available, multi-class linear classification immediately becomes NP-hard.
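Not a proof of that hardness claim, just a toy sketch of where the combinatorics come from (all numbers below are made up for illustration): if each group only reveals the set of classes it contains, a learner that wants a consistent instance-level labelling already faces a search space that grows exponentially with group size.

```python
import itertools

# Hypothetical weak supervision: each group reveals only the SET of classes it contains.
group_sizes = [4, 4, 4]
group_label_sets = [{0, 1}, {1, 2}, {0, 2}]


def consistent_labelings(label_set, size):
    """All instance-level labelings of one group that use exactly these classes."""
    for combo in itertools.product(sorted(label_set), repeat=size):
        if set(combo) == label_set:
            yield combo


# A naive consistent learner would have to search the product of these counts.
total = 1
for size, labels in zip(group_sizes, group_label_sets):
    total *= sum(1 for _ in consistent_labelings(labels, size))
print("candidate instance-level labelings:", total)   # 14 ** 3 = 2744 for this tiny example
```

This only shows the size of the search space, not NP-hardness itself, but it hints at why the problem stops being a simple convex fit.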