tdgros t1_j41fn3f wrote
Reply to comment by rehrev in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
At train time, you plug decoders in at several levels, all trained with the same objective, so you can find out whether some tokens can be decoded earlier; an additional small network outputs a sort of confidence score. At inference time, you run the layers one by one and stop as soon as the confidence is high enough, which lets you skip the remaining computation. (This is probably a simplistic description, feel free to correct me.)
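Roughly, the idea looks like this in code. This is a minimal toy sketch of the early-exit scheme described above, not any specific paper's implementation; all names, dimensions, and the fixed confidence threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Toy early-exit transformer: a decoder and a confidence head are
    attached after every layer. At train time each decoder is trained with
    the same objective; at inference a confidence threshold decides when
    to stop. (All hyperparameters here are illustrative.)"""

    def __init__(self, d_model=64, n_layers=4, vocab=100, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers))
        # One decoder head and one confidence head per layer
        self.decoders = nn.ModuleList(
            nn.Linear(d_model, vocab) for _ in range(n_layers))
        self.confidence = nn.ModuleList(
            nn.Linear(d_model, 1) for _ in range(n_layers))
        self.threshold = threshold

    def forward(self, x):
        # Train time: return logits from every level, so the same loss
        # can be applied to each intermediate decoder.
        logits = []
        for layer, dec in zip(self.layers, self.decoders):
            x = layer(x)
            logits.append(dec(x))
        return logits

    @torch.no_grad()
    def infer(self, x):
        # Inference: run layers one by one, stop once confidence is high.
        for layer, dec, conf in zip(self.layers, self.decoders, self.confidence):
            x = layer(x)
            if torch.sigmoid(conf(x)).mean() > self.threshold:
                return dec(x), True  # exited early, skipping later layers
        return self.decoders[-1](x), False

model = EarlyExitEncoder()
tokens = torch.randn(1, 8, 64)   # (batch, seq, d_model) dummy input
logits, exited_early = model.infer(tokens)
```

With untrained weights the exit point is arbitrary, but the control flow shows the trade-off: the confidence heads are cheap, and every early exit saves the full cost of the remaining layers.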