Raphaelll_ t1_j45u38j wrote 2 years ago

Reply to comment by PassingTumbleweed in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont

Did this ever get any traction?

PassingTumbleweed t1_j46sco1 wrote 2 years ago

That depends on what you mean. I don't think any of the LLMs use it, but it has some citations and follow-up literature.