Jeffy29 t1_jeg33to wrote

That's one of the most boring names I've ever seen for a paper lol, though I skimmed it and it looks quite good and is surprisingly readable. That said, I don't think this method will be adopted anytime soon. From the description it sounds quite heavy on inference, and given how much compute current (and rapidly growing) demand already requires, you don't want to add to that when you can just train a better model.

The current field really reminds me of the early semiconductor era. Everyone knew there were lots of gains to be had by making transistors in smarter ways, but there wasn't the need when node shrinking was progressing so rapidly and the gains were so large. It wasn't until the late 2000s and 2010s that the industry really started chasing those other gains. There are plenty of them, but they aren't nearly as cheap or fast to get as in the good ol' days of transistor shrinking. Still, it's good to know that even if LLM performance gains inexplicably stopped completely tomorrow, we would still have lots of methods (like this one and others) to improve their performance.