
PassingTumbleweed t1_j7mlwls wrote

I'm not aware of any comparison. Maybe it doesn't matter that much?

PaLI feeds embeddings from the Vision Transformer into the LM after a linear projection layer. This allows backpropagation through the ViT's weights, so the image encoding can be learned for the task. The ability to tune the embeddings in an end-to-end fashion might be an important consideration.
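To make the idea concrete, here's a minimal PyTorch-style sketch of that kind of fusion (not PaLI's actual code): the ViT outputs are passed through a linear projection into the LM's embedding space and prepended to the text embeddings, so the LM loss backpropagates through both the projection and the ViT. Names like `VisualPrefixLM`, `vit_dim`, and the `inputs_embeds` keyword are assumptions about the surrounding model API.

```python
import torch
import torch.nn as nn

class VisualPrefixLM(nn.Module):
    """Sketch: project ViT patch embeddings into the LM embedding space
    and prepend them to the text token embeddings. Both the ViT and the
    projection remain trainable, so gradients from the LM loss flow back
    into the image encoder (end-to-end tuning)."""

    def __init__(self, vit, lm, vit_dim, lm_dim):
        super().__init__()
        self.vit = vit                            # returns (B, N_patches, vit_dim)
        self.lm = lm                              # decoder LM accepting precomputed embeddings
        self.proj = nn.Linear(vit_dim, lm_dim)    # the linear projection layer

    def forward(self, images, text_embeds):
        vis = self.vit(images)                    # (B, N, vit_dim); gradients flow through here
        vis = self.proj(vis)                      # (B, N, lm_dim)
        fused = torch.cat([vis, text_embeds], dim=1)   # image tokens prefixed to text tokens
        # Assumes an LM that accepts precomputed input embeddings
        # (e.g. a HuggingFace-style `inputs_embeds` argument).
        return self.lm(inputs_embeds=fused)
```

Because the ViT is not frozen here, the image representation itself adapts to the downstream objective, which is the main contrast with approaches that bolt a frozen encoder onto the LM.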

3

_Arsenie_Boca_ OP t1_j7ommq8 wrote

Yes, seamless joint training is definitely one of the perks. I'll look further to see whether I can find anything on the effectiveness of different injection/fusion mechanisms.

1