That_Violinist_18

That_Violinist_18 t1_j88ilse wrote

I keep hearing this argument, but I also keep hearing that models reach 60%+ of peak GPU throughput once optimizations like FlashAttention are applied.

So how much room is there for alternative architectures when current hardware leaves at most 40% of its peak performance on the table?
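
To put a rough number on that, here is a back-of-envelope sketch. It assumes "utilization" means the fraction of peak throughput actually achieved, and that the only gain on offer is closing the gap to 100% of peak. The function name is made up for illustration, and the 60% figure is just the one quoted above, not something I measured.

```python
def max_speedup_from_utilization(utilization: float) -> float:
    """Upper bound on speedup from raising hardware utilization to 100% of peak.

    `utilization` is the fraction of peak throughput currently achieved (0 < u <= 1).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # Running at a fraction u of peak means perfect utilization is 1/u times faster.
    return 1.0 / utilization


if __name__ == "__main__":
    for u in (0.4, 0.6, 0.8):
        bound = max_speedup_from_utilization(u)
        print(f"{u:.0%} of peak: at most {bound:.2f}x from better utilization")
```

At 60% of peak the bound works out to about 1.67x. Under these assumptions, anything beyond that would have to come from an architecture that needs less compute for the same result, not from squeezing more out of the hardware.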

4