foreheadteeth t1_irbhdfb wrote
I'm a mathematician; one of my areas of research is linear algebra. To multiply 2x2 matrices, you do this:
```
[a b][e f]   [ae+bg af+bh]
[c d][g h] = [ce+dg cf+dh]
```
In the expression above, there are R = 8 products. You can apply the expression above recursively to compute the product of N by N matrices "blockwise", where N = 2^n, but then the overall running time is O(N^3), which is the same as naive matrix multiplication. That being said, recursive blockwise multiplication is much faster than a more naive implementation on real hardware, for reasons of cache locality (as much as a factor of 10 or more).
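To make the recursion concrete, here is a minimal sketch in Python/NumPy (my own illustration, not code from the paper; the function name and NumPy framing are mine):

```python
import numpy as np

def blockwise(A, B):
    """Recursive blockwise product of N x N matrices, N = 2^n.

    Still 8 recursive products per level, hence O(N^3) overall,
    but the recursion keeps working sets small, which is why it
    interacts well with CPU caches in practice.
    """
    N = A.shape[0]
    if N == 1:
        return A * B
    h = N // 2
    # Split each matrix into four h x h blocks.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Same formula as the 2x2 case above, applied to blocks: 8 products.
    C11 = blockwise(A11, B11) + blockwise(A12, B21)
    C12 = blockwise(A11, B12) + blockwise(A12, B22)
    C21 = blockwise(A21, B11) + blockwise(A22, B21)
    C22 = blockwise(A21, B12) + blockwise(A22, B22)
    return np.block([[C11, C12], [C21, C22]])
```

In a real implementation one stops the recursion at some base size (say 64 or 128) and hands the blocks to a tuned kernel, rather than recursing all the way down to 1x1.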
Strassen was able to multiply 2x2 matrices with only R = 7 products. If you apply Strassen's algorithm recursively to matrices of size N = 2^n, then the running time is O(N^{2.8074}), where 2.8074 is really log_2(7). I think nowadays these algorithms are considered to become useful when N > 1,000 or so.
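For reference, here is the standard textbook formulation of Strassen's scheme, in the same sketch style as above (again my own illustration, not code from the paper):

```python
import numpy as np

def strassen(A, B):
    """Strassen's 7-product recursion for N x N matrices, N = 2^n."""
    N = A.shape[0]
    if N == 1:
        return A * B
    h = N // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 7 recursive products instead of 8 ...
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    # ... recombined with block additions/subtractions (18 in total,
    # counting the ones inside the products above).
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])
```

The count of 18 additions/subtractions becomes relevant further down, when comparing with Winograd's variant.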
Everyone is pretty much convinced that for 2x2 matrices, R = 7 is best possible (this is in fact a theorem of Winograd). So then people tried k by k matrices, with k = 3, 4, etc., with the hope of using far fewer multiplications than the naive algorithm, which needs R = k^3.
If you can find a matrix multiplication algorithm for small k by k matrices that takes R < k^3 multiplications, then you can use it recursively to get ever more efficient block matrix multiplication algorithms for sizes N = k^n; but of course, if k is large, then N becomes insanely large.
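The exponent you get this way is log_k(R): the recurrence T(N) = R·T(N/k) + O(N^2) is dominated by the R^n products at the leaves whenever log_k(R) > 2. A quick sanity check (illustration only):

```python
import math

def exponent(k, R):
    """Exponent from recursing on a k x k scheme with R products.

    T(N) = R*T(N/k) + O(N^2) gives T(N) = O(N^omega)
    with omega = log_k(R), by the master theorem.
    """
    return math.log(R, k)

print(exponent(2, 8))  # naive 2x2 blocking: 3.0
print(exponent(2, 7))  # Strassen: 2.8073...
```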
For some large values of k, people have found algorithms with R much smaller than k^3, and as a result, at least theoretically, we know how to multiply matrices in O(N^{2.3728596}) time. However, this is purely theoretical, since the corresponding value of k is large, so the crossover N is astronomical.
In the present paper, the authors have investigated matrix multiplication algorithms for multiplying rectangular matrices A and B, with the following sizes:
A is p by q, and B is q by r, with p, q, r ∈ {2, 3, 4, 5}.
In most cases, the best value of R that they found matches what was already in the literature.
In the cases (p,q,r) ∈ {(3,4,5), (4,4,5), (4,5,5)}, this paper improved on the previously best known value of R.
There are some further improved values of R for (p,q,r) ∈ {(4,4,4), (5,5,5)}, but only in the case of mod-2 arithmetic.
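For readers unfamiliar with the term: mod-2 arithmetic means the matrix entries live in {0, 1} with addition and subtraction both acting as XOR, so an algorithm that is valid mod 2 need not be valid over the reals. A tiny illustration (my own, not from the paper):

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 0], [1, 1]])

# Ordinary integer product vs the same product reduced mod 2:
print(A @ B)        # [[2 1], [1 1]]
print((A @ B) % 2)  # [[0 1], [1 1]]  -- since 1 + 1 = 0 in GF(2)
```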
Although the authors did not improve R in most cases, they claim to have also made improvements of a different kind, which are easiest to explain by comparing Strassen's and Winograd's algorithms (link is above).
As previously mentioned, Strassen's algorithm multiplies 2x2 matrices with R = 7 products, but it uses 18 additions or subtractions. Winograd's algorithm also multiplies 2x2 matrices with R = 7 products, but uses a mere 15 additions and subtractions. Thus, for finite values of N, recursive Winograd is slightly faster than recursive Strassen.
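For the curious, here is one standard statement of Winograd's variant, in the same sketch style as above (my transcription of the textbook scheme, worth checking against a reference; not code from the paper):

```python
import numpy as np

def winograd(A, B):
    """Winograd's variant: 7 products, 15 additions/subtractions."""
    N = A.shape[0]
    if N == 1:
        return A * B
    h = N // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # 8 "pre-additions" on the inputs ...
    S1 = A21 + A22;  S2 = S1 - A11;  S3 = A11 - A21;  S4 = A12 - S2
    T1 = B12 - B11;  T2 = B22 - T1;  T3 = B22 - B12;  T4 = T2 - B21
    # ... 7 recursive products ...
    P1 = winograd(A11, B11)
    P2 = winograd(A12, B21)
    P3 = winograd(S4, B22)
    P4 = winograd(A22, T4)
    P5 = winograd(S1, T1)
    P6 = winograd(S2, T2)
    P7 = winograd(S3, T3)
    # ... and 7 "post-additions", for 15 additions in total (vs 18).
    U1 = P1 + P6
    U2 = U1 + P7
    U3 = U1 + P5
    C11 = P1 + P2
    C12 = U3 + P3
    C21 = U2 - P4
    C22 = U2 + P5
    return np.block([[C11, C12], [C21, C22]])
```

The savings come from reusing the intermediate sums U1, U2, U3 across several output blocks.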
In the present paper, the authors also claim to have found new algorithms that require fewer additions and subtractions, even when the value of R is not improved compared to what was already known. As a result, recursive versions of these new algorithms are slightly faster for finite values of N, even if the big-O performance is not improved.
Aloekine t1_ircx86k wrote
Thanks, this was a helpful post. If I could ask a question:
Leaving aside the point about this being discovered with DRL (which is obviously astounding and cool), I’m trying to get a better sense of how widely applicable these new improvements are. Another poster in the thread is much more pessimistic about this being particularly applicable or the benchmarks being particularly relevant; what’s your perspective?
foreheadteeth t1_irdomjk wrote
The matrix multiplication exponent currently sits at ω = 2.3728596, and I don't think that's been changed by this paper. I don't think that any of the papers improving ω were published in Nature, even though those are quite important papers, so I would advise people working in this field to send their work to Nature in the future. That being said, I suspect Nature is not genuinely interested in improvements to ω; I suspect the attraction for Nature is mainly the "discovered with DRL" part, as you say.
In Fig. 5, they claim speedups over standard implementations of up to 23.9% for matrix sizes N > 8192. They are also 1-4% faster than Strassen. That's not bad!
They also mentioned a curious application of their DRL: finding fast multiplication algorithms for certain classes of matrices (e.g., skew-symmetric matrices). I don't think they've touched ω in that case either. If their algorithms are genuinely fast at practical scales, that might be interesting, but again, I didn't notice any benchmarks for reasonable matrix sizes in that setting.
(Edit: I had failed to notice the benchmark in Fig 5).