Submitted by JClub t3_10fh79i in MachineLearning
Ouitos t1_j50cm0i wrote
Hi, thanks for the explanation!
Two comments :
> 1. Make "New probs" equal to "Initial probs" to initialize.
Shouldn't it be the opposite? Make the initial probs equal to the first occurrence of the new probs? I mean, equality is transitive, but as written it suggests we change the new probs to be equal to the initial probs, which contradicts the diagram, where the new probs are always the output of our LM.
> loss = min(ratio * R, clip(ratio, 0.8, 1.2) * R)
Isn't the min operation redundant with the clip? How is that different from min(ratio * R, 1.2 * R)? Does the 0.8 have any influence at all?
JClub OP t1_j51h8up wrote
> Shouldn't it be the opposite ?
Yes, that makes more sense. Will change!
> How is that different from min(ratio * R, 1.2 * R) ? Does 0.8 have any influence at all ?
Maybe I did not explain properly what the clip is doing: if ratio = 0.6, it becomes 0.8, and if ratio > 1.2, it becomes 1.2.
Does that make more sense? Regarding the min operation, it's just a heuristic to choose the smaller update, tbh.
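A quick sketch of what the clip alone does here (using numpy; 0.8/1.2 are the bounds from the loss above):

```python
import numpy as np

# clip(ratio, 0.8, 1.2) caps the probability ratio on both sides
for ratio in (0.6, 1.0, 1.5):
    clipped = np.clip(ratio, 0.8, 1.2)
    print(ratio, "->", clipped)
# 0.6 -> 0.8
# 1.0 -> 1.0
# 1.5 -> 1.2
```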
Ouitos t1_j54nh7v wrote
Yes, but if you have a ratio of 0.6, you then take the min of 0.6 * R and 0.8 * R, which is ratio * R. In the end, the clip is only effective one way, and the 0.8 lower limit is never used. Or maybe R has a particular property that makes this not as straightforward?
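To make that concrete, here is a minimal numeric sketch (assuming R is a positive scalar reward, which is the case my reasoning relies on):

```python
import numpy as np

R = 1.0      # assumed positive reward/advantage
ratio = 0.6  # new_prob / initial_prob, below the lower bound

unclipped = ratio * R                   # 0.6
clipped = np.clip(ratio, 0.8, 1.2) * R  # 0.8
loss = min(unclipped, clipped)          # min(0.6, 0.8) = 0.6 = ratio * R
# -> the min discards the clipped value, so the 0.8 bound never
#    takes effect as long as R > 0
```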
JClub OP t1_j57rrn6 wrote
Ah yes, you're right. I actually don't know why; you can check the implementation and ask about it on GitHub.