Submitted by _Arsenie_Boca_ t3_10vwm8k in MachineLearning
PassingTumbleweed t1_j7mlwls wrote
Reply to comment by _Arsenie_Boca_ in [D] Papers that inject embeddings into LMs by _Arsenie_Boca_
I'm not aware of any comparison. Maybe it doesn't matter that much?
PaLI feeds embeddings from the Vision Transformer to the LM after a linear projection layer. This allows backpropagation through the ViT's weights, so the image encoding can be learned for the task. The ability to tune the embeddings in an end-to-end fashion might be an important consideration.
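A minimal PyTorch sketch of that fusion pattern, in case it helps. This is not PaLI's actual code; module names, dimensions, and the prefix-token usage are my own assumptions. The point is just that a single `nn.Linear` bridges the two embedding spaces, and gradients flow back into the ViT output:

```python
import torch
import torch.nn as nn

class VisionToLMProjection(nn.Module):
    """Hypothetical bridge module: maps ViT patch embeddings into the
    LM's embedding space with one linear layer (names are illustrative)."""

    def __init__(self, vit_dim: int = 768, lm_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(vit_dim, lm_dim)

    def forward(self, vit_embeddings: torch.Tensor) -> torch.Tensor:
        # vit_embeddings: (batch, num_patches, vit_dim). Because this is a
        # plain differentiable layer, the task loss backpropagates through
        # it into the vision encoder's weights.
        return self.proj(vit_embeddings)  # (batch, num_patches, lm_dim)

# Usage sketch: projected image tokens are treated as a prefix of "soft
# tokens" and concatenated with the LM's text token embeddings, so the
# whole stack can train end to end.
vit_out = torch.randn(2, 196, 768, requires_grad=True)  # stand-in for ViT output
text_emb = torch.randn(2, 32, 1024)                     # stand-in for LM text embeddings
image_tokens = VisionToLMProjection()(vit_out)
lm_input = torch.cat([image_tokens, text_emb], dim=1)   # (2, 228, 1024)
```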
_Arsenie_Boca_ OP t1_j7ommq8 wrote
Yes, seamless joint training is definitely one of the perks. I'll keep looking to see if I can find anything on the effectiveness of different injection/fusion mechanisms.