[R] Is there any research on allowing Transformers to spend more compute on more-difficult-to-predict tokens? Submitted by Chemont t3_109z8om on January 12, 2023 at 1:07 PM in MachineLearning 16 comments 19
icecubeinanicecube t1_j415q2o wrote on January 12, 2023 at 1:38 PM How would you even do that? Once you have run inference through all the layers, you cannot just pull additional layers out of thin air, can you? 0
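(One common answer in the adaptive-computation literature is to flip the question around: rather than adding layers for hard tokens, you *skip* remaining layers for easy ones, i.e. early exiting. The toy sketch below is an illustration of that idea, not something posted in this thread; the layer functions, the linear "classifier", and the confidence threshold are all made-up stand-ins for a real model's components.)

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_predict(hidden, layers, classifier, threshold=0.9):
    """Run layers one at a time; stop as soon as an intermediate
    classifier head is confident enough about the next token.
    Returns (predicted_class, number_of_layers_actually_used)."""
    for i, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = softmax(classifier(hidden))
        if max(probs) >= threshold:
            # Easy token: remaining layers are skipped entirely.
            return probs.index(max(probs)), i
    # Hard token: all layers were needed.
    return probs.index(max(probs)), len(layers)

# Toy stand-ins: each "layer" just doubles the hidden state, and the
# "classifier" reads the hidden state directly as logits.
layers = [lambda h: [x * 2 for x in h]] * 6
classifier = lambda h: h

# An "easy" token (strong initial evidence) exits after 2 of 6 layers;
# a "hard" token (weak evidence) needs 5 layers before it is confident.
print(early_exit_predict([1.0, 0.0], layers, classifier))  # (0, 2)
print(early_exit_predict([0.1, 0.0], layers, classifier))  # (0, 5)
```

In a real Transformer the per-layer classifier is typically the shared output head applied to intermediate hidden states, and the halting rule can itself be learned (as in Adaptive Computation Time) rather than a fixed confidence threshold.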