visarga t1_j46b2po wrote
Reply to comment by Chemont in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
No, but if you use a decoder (autoregressive) model, you can generate more tokens for the same task depending on its difficulty. Chain-of-thought prompting exploits exactly this trick: the intermediate reasoning tokens are extra forward passes, so harder problems naturally get more compute. A quick sketch below.
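Here's a minimal illustration of the idea (assuming the Hugging Face `transformers` library, with GPT-2 as a stand-in decoder model; the "Let's think step by step" suffix is the zero-shot chain-of-thought prompt from Kojima et al. 2022). The point isn't the quality of GPT-2's answer, just that the CoT prompt lets the model emit intermediate tokens, and each extra token is an extra forward pass of compute, before committing to a final answer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Direct prompt: the model has to commit to an answer almost immediately.
direct = "Q: What is 17 * 24? A:"

# Chain-of-thought prompt: invites intermediate reasoning tokens,
# i.e. more decoding steps (more compute) before the final answer.
cot = "Q: What is 17 * 24? A: Let's think step by step."

for prompt in (direct, cot):
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(out[0], skip_special_tokens=True))
    print("-" * 40)
```

Same weights, same per-token cost; the only thing that changes is how many tokens the model spends on the problem, which is the adaptive-compute behavior the question is asking about.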