wyrdwulf t1_jdikhuo wrote on March 24, 2023 at 5:21 PM

Reply to comment by BullockHouse in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

They had another model do that already.

OpenAI: We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play

BullockHouse t1_jdil2ok wrote on March 24, 2023 at 5:25 PM

I'm familiar! I'm curious, though, whether it can generalize well enough to play semi-competently without specialized training. That has implications for multi-modal models and robotics.