
Mortal-Region t1_j3i1gg7 wrote

This is roughly the idea behind reinforcement learning, a way of training so-called intelligent agents: AIs that interact with their (usually simulated) environments. It's basically a carrot-and-stick approach. Actions that lead to good outcomes are reinforced so the agent is more likely to take the same kind of action in the future, while actions that lead to bad outcomes are discouraged so the agent becomes less likely to repeat them.
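Here's a minimal Python sketch of the carrot-and-stick idea. It uses a gradient-bandit-style preference update, which is one standard choice among many. The function names, learning rate, baseline, and reward values are all illustrative assumptions. A reward above the baseline raises the chosen action's probability, and a reward below it lowers that probability.

```python
import math

def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)  # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(prefs, action, reward, baseline=0.0, lr=0.5):
    """Hypothetical helper: one gradient-bandit-style update.

    Good outcome (reward > baseline): the chosen action's preference goes up
    and the others go down. Bad outcome: the reverse.
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    new_prefs = []
    for a, (h, p) in enumerate(zip(prefs, probs)):
        if a == action:
            new_prefs.append(h + lr * advantage * (1 - p))
        else:
            new_prefs.append(h - lr * advantage * p)
    return new_prefs

# Three actions, initially equally likely.
prefs = [0.0, 0.0, 0.0]
prefs = reinforce(prefs, action=0, reward=+1.0)  # carrot for action 0
prefs = reinforce(prefs, action=1, reward=-1.0)  # stick for action 1
probs = softmax(prefs)
print([round(p, 3) for p in probs])
```

After these two updates, action 0 is the most likely, action 1 is the least likely, and the untouched action 2 sits in between.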

Doing this well means maintaining a balance between "exploration" and "exploitation". Imagine an enormous room filled with billions of slot machines that pay out at different rates. "Exploration" consists of wandering around the room trying out different machines to see how well they pay. "Exploitation" means playing the best machine you've found so far over and over again.
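Here's a small epsilon-greedy sketch of the slot-machine room, scaled down to three hypothetical machines with made-up payout rates. Epsilon-greedy is just one simple way to strike the balance. With probability `epsilon` it explores a random machine. Otherwise it exploits whichever machine has the best running-average payout so far.

```python
import random

def epsilon_greedy(payout_rates, pulls=5000, epsilon=0.1, seed=0):
    """Play simulated slot machines; payout_rates are assumed win probabilities."""
    rng = random.Random(seed)
    n_arms = len(payout_rates)
    counts = [0] * n_arms        # times each machine was played
    estimates = [0.0] * n_arms   # running average payout per machine
    total_reward = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random machine
        else:
            # exploit: play the best machine found so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < payout_rates[arm] else 0
        counts[arm] += 1
        # incremental update of the running average for this machine
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(e, 2) for e in estimates], total)
```

Raising `epsilon` spends more pulls on exploration. Setting it to 0 can leave the agent stuck on whichever machine it happened to try first.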


AndromedaAnimated t1_j3iyaxh wrote

What we still need to add is transfer to a non-simulated environment, plus a "metronome" that automatically prompts the agent to "ask/search/move".
