I keep hearing this argument, but I also keep hearing that models are hitting 60%+ of peak GPU throughput once optimizations like FlashAttention are factored in.
So how much room is there for alternative architectures when current hardware leaves at most 40% of its peak performance on the table?
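For context, the 60% figure usually refers to model FLOPs utilization (MFU): achieved FLOPs per second divided by the hardware's peak. A minimal sketch of that arithmetic, using the common ~6 × params FLOPs-per-token estimate for transformer training and assumed (not measured) numbers:

```python
# Back-of-envelope model FLOPs utilization (MFU).
# All concrete numbers below are illustrative assumptions, not measurements.

def mfu(tokens_per_sec: float, n_params: float, peak_flops: float) -> float:
    """Approximate MFU using the rough 6 * params FLOPs-per-token rule
    for transformer training."""
    achieved = 6 * n_params * tokens_per_sec  # FLOPs actually performed per second
    return achieved / peak_flops

# Hypothetical example: a 7B-parameter model training at 4,000 tokens/s
# on a GPU with an assumed 312 TFLOPS dense BF16 peak (A100 spec sheet).
util = mfu(tokens_per_sec=4_000, n_params=7e9, peak_flops=312e12)
print(f"MFU = {util:.0%}")  # roughly half of peak in this made-up scenario
```

The point of the exercise: whatever gap remains between achieved MFU and 100% is the ceiling on what a more specialized architecture could recover on the same workload.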
That_Violinist_18 t1_j8ed3j9 wrote
Reply to comment by currentscurrents in The Inference Cost Of Search Disruption – Large Language Model Cost Analysis [D] by norcalnatv
So should we expect much higher peak throughput numbers from more specialized hardware?
I have yet to hear of any startup in the ML hardware space advertising numbers like that.