currentscurrents t1_jaetyg1 wrote
Reply to comment by dancingnightly in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Can't the reward model be discarded at inference time? I thought it was only used for fine-tuning.