Viewing a single comment thread. View all comments

GamerMinion t1_jddlqit wrote

Yes, theory is one thing, but you can't build ASICs for everything due to the cost involved.

Did you look into sparsity at latency-equivalent scales? i.e. same latency, bigger but sparser model.

I would be very interested to see results like that, especially for GPU-like accelerators (e.g. Nvidia's AGX computers use their ampere GPU architecture), as latency is a primary focus in high-value computer vision applications such as in autonomous driving.

2