FuturologyBot t1_ixupqhu wrote

The following submission statement was provided by /u/Soupjoe5:


Article:

Online videos are a vast and untapped source of training data, and OpenAI says it has a new way to use them.

OpenAI has built the best Minecraft-playing bot yet by making it watch 70,000 hours of video of people playing the popular computer game. It showcases a powerful new technique that could be used to train machines to carry out a wide range of tasks by binging on sites like YouTube, a vast and untapped source of training data.

The Minecraft AI learned to perform complicated sequences of key presses and mouse clicks to complete tasks in the game, such as chopping down trees and crafting tools. It’s the first bot that can craft so-called diamond tools, a task that typically takes good human players 20 minutes of high-speed clicking, or around 24,000 actions.
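
As a quick sanity check on the two figures quoted above, the implied pace can be worked out directly. This is only arithmetic on the article's numbers; the variable names are made up for illustration.

```python
# Figures quoted in the article
minutes = 20          # time a good human player needs
actions = 24_000      # approximate number of actions in that time

# Implied average pace in actions per second
rate = actions / (minutes * 60)
print(rate)  # 20.0
```

So "high-speed clicking" here means roughly 20 actions every second, sustained for the whole 20 minutes.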

The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained to perform tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars or navigate webpages.

There is a vast amount of video online showing people doing different tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. “In the last few years we’ve seen the rise of this GPT-3 paradigm where we see amazing capabilities come from big models trained on enormous swathes of the internet,” says Bowen Baker at OpenAI, one of the team behind the new Minecraft bot. “A large part of that is because we’re modeling what humans do when they go online.”

The problem with existing approaches to imitation learning is that video demonstrations need to be labeled at each step: doing this action makes this happen, doing that action makes that happen, and so on. Annotating by hand in this way is a lot of work, and so such datasets tend to be small. Baker and his colleagues wanted to find a way to turn the millions of videos that are available online into a new dataset.

The team’s approach, called Video Pre-Training (VPT), gets around the bottleneck in imitation learning by training another neural network to label videos automatically. They first hired crowdworkers to play Minecraft, and recorded their key presses and mouse clicks alongside the video from their screens. This gave the researchers 2,000 hours of annotated Minecraft play, which they used to train a model to match actions to onscreen outcomes. Clicking a mouse button in a certain situation makes the character swing its axe, for example.

The next step was to use this model to generate action labels for 70,000 hours of unlabeled video taken from the internet and then train the Minecraft bot on this larger dataset.
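
To make the three steps concrete (train a labeler on a small annotated set, use it to label a large unannotated set, then train the bot on the result), here is a minimal toy sketch. It is not OpenAI's code and has nothing to do with Minecraft itself. The "game" is a one-dimensional walk, a "frame" is a noisy position, and a nearest-centroid rule and a lookup table stand in for the neural networks. Every name in it (`record`, `fit_labeler`, `pseudo_label`, `clone_policy`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
MOVES = np.array([-1, 0, 1])  # toy action set: step left, stay, step right

def expert_action(pos):
    """Hypothetical 'human' player: always walks toward position 0."""
    return int(np.sign(-pos)) + 1  # index into MOVES

def record(start, steps, noise=0.05):
    """Play one episode; return noisy 'frames' and the true action indices."""
    positions, actions = [start], []
    for _ in range(steps):
        a = expert_action(positions[-1])
        actions.append(a)
        positions.append(positions[-1] + MOVES[a])
    frames = np.array(positions, dtype=float)
    frames += rng.normal(0.0, noise, size=frames.shape)
    return frames, np.array(actions)

def fit_labeler(episodes):
    """Step 1: learn which frame-to-frame change each action produces."""
    deltas = np.concatenate([np.diff(f) for f, _ in episodes])
    actions = np.concatenate([a for _, a in episodes])
    return np.array([deltas[actions == k].mean() for k in range(len(MOVES))])

def pseudo_label(centroids, frames):
    """Step 2: guess the action behind each transition in unlabeled video."""
    deltas = np.diff(frames)
    return np.abs(deltas[:, None] - centroids[None, :]).argmin(axis=1)

def clone_policy(frames_list, labels_list, lo=-10, hi=10):
    """Step 3: behavior cloning as a majority vote per (rounded) state."""
    counts = np.zeros((hi - lo + 1, len(MOVES)), dtype=int)
    for frames, labels in zip(frames_list, labels_list):
        states = np.rint(frames[:-1]).astype(int) - lo
        np.add.at(counts, (states, labels), 1)
    table = counts.argmax(axis=1)
    return lambda pos: int(table[pos - lo])

# Small annotated set: frames recorded together with the true actions
labeled = [record(s, 8) for s in (-5, 5)]
centroids = fit_labeler(labeled)

# Large "internet video" set: the true actions are thrown away
unlabeled = [record(int(s), 12) for s in rng.integers(-8, 9, size=200)]
videos = [f for f, _ in unlabeled]
labels = [pseudo_label(centroids, f) for f in videos]

# How often the labeler recovers the hidden actions, then the cloned policy
truth = np.concatenate([a for _, a in unlabeled])
accuracy = float((np.concatenate(labels) == truth).mean())
policy = clone_policy(videos, labels)
print(accuracy, policy(3), policy(-4), policy(0))
```

The point of the sketch is the data flow: only two short episodes carry real labels, yet the policy is trained on 200 episodes because the labeler fills in the missing actions. In this toy the labeler looks at one pair of consecutive frames; the article does not say how much context the real labeling model sees.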


Please reply to OP's comment here: https://old.reddit.com/r/Futurology/comments/z58i6a/a_bot_that_watched_70000_hours_of_minecraft_could/ixulnyj/
