[R] Microsoft introduces Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot)
Submitted by MysteryInc152 on February 28, 2023 at 12:30 PM in r/MachineLearning · 40 comments · 330 points
ReasonablyBadass wrote on February 28, 2023 at 8:16 PM: Can't read the paper right now, can someone summarize: is it a new model, or "just" standard transformers applied to multimodal data? If it is new, what are the structural changes? Permalink · 6 points
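For readers skimming the thread, the general pattern the post title describes — a decoder-only language model that consumes image embeddings alongside text tokens and is trained with next-token prediction — can be sketched roughly as follows. This is a minimal illustration assuming a frozen vision encoder supplies precomputed image features; the class name, dimensions, and projection scheme are invented for the example and are not the actual Kosmos-1 architecture.

```python
# Illustrative sketch of a multimodal decoder-only LM: image features are
# projected into the text embedding space, prepended to the token sequence,
# and a causally masked Transformer predicts the next text token.
# All names and sizes below are assumptions for the example, not Kosmos-1's.
import torch
import torch.nn as nn


class ToyMultimodalLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_heads=8,
                 image_feat_dim=768):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        # Project image features (e.g. from a frozen vision encoder) into the
        # same embedding space as the text tokens.
        self.img_proj = nn.Linear(image_feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, image_feats):
        # text_ids: (B, T_text) token ids; image_feats: (B, T_img, image_feat_dim)
        txt = self.tok_embed(text_ids)        # (B, T_text, d_model)
        img = self.img_proj(image_feats)      # (B, T_img, d_model)
        seq = torch.cat([img, txt], dim=1)    # image "tokens" prepended to text
        # Causal mask: True entries are positions a token may NOT attend to,
        # so each position only sees earlier positions in the sequence.
        T = seq.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.decoder(seq, mask=mask)
        return self.lm_head(h)                # next-token logits over the text vocab


# Usage: next-token logits for a 10-token caption conditioned on 4 image patches.
model = ToyMultimodalLM()
logits = model(torch.randint(0, 32000, (1, 10)), torch.randn(1, 4, 768))
print(logits.shape)  # torch.Size([1, 14, 32000])
```

The design choice worth noting is that nothing modality-specific happens inside the Transformer itself: once image features are mapped into the embedding space, the same causal attention and next-token objective handle text and images uniformly, which is what makes few-shot prompting with interleaved images and text possible.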