Mortal-Region t1_j3i1gg7 wrote on January 8, 2023 at 6:52 PM

This is roughly the idea behind reinforcement learning, a method for training so-called intelligent agents: AIs that interact with their (usually simulated) environments. It's basically a carrot-and-stick approach -- actions that lead to good outcomes are reinforced so that the agent is more likely to take the same kind of action in the future, while actions that lead to bad outcomes are discouraged so that the agent is less likely to repeat them.

Doing this well means maintaining a balance between "exploration" and "exploitation". Imagine an enormous room filled with billions of slot machines that pay out at different rates. "Exploration" consists of wandering around the room trying out different machines to see how well they pay. "Exploitation" means playing the best machine you've found so far over and over again.
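
To make the slot-machine picture concrete, here is a rough sketch of one common way to balance the two, usually called epsilon-greedy: with a small probability `epsilon` you try a random machine (exploration), otherwise you play the machine with the best average payout so far (exploitation). The function name `run_bandit`, the payout rates, and the parameter values are all made up for illustration, and real RL systems use more elaborate strategies than this.

```python
import random


def run_bandit(payout_rates, steps=10_000, epsilon=0.1, seed=0):
    """Play a row of slot machines with an epsilon-greedy strategy.

    payout_rates: hypothetical win probability of each machine.
    epsilon: fraction of plays spent exploring a random machine.
    """
    rng = random.Random(seed)
    n = len(payout_rates)
    pulls = [0] * n          # how often each machine was played
    estimates = [0.0] * n    # running average payout per machine
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: wander over to a random machine.
            arm = rng.randrange(n)
        else:
            # Exploitation: play the best machine found so far
            # (ties go to the lowest-numbered machine).
            arm = max(range(n), key=lambda i: estimates[i])

        # The machine pays 1 with its (hidden) payout rate, else 0.
        reward = 1 if rng.random() < payout_rates[arm] else 0

        # Carrot and stick: nudge the estimate toward what we just saw.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward


if __name__ == "__main__":
    pulls, estimates, total = run_bandit([0.2, 0.5, 0.8])
    print("pulls per machine:", pulls)
    print("estimated payout rates:", [round(e, 2) for e in estimates])
    print("total reward:", total)
```

Setting `epsilon=0` in this sketch shows why exploration matters: the agent never leaves the first machine, because it never gathers evidence that another one pays better.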

AndromedaAnimated t1_j3iyaxh wrote on January 8, 2023 at 10:11 PM

What we still need to add to that is the transfer to a non-simulated environment and a "metronome" for automatic "ask/search/move" prompting.

Mortal-Region t1_j3j1if6 wrote on January 8, 2023 at 10:31 PM

Yeah, that's the cutting edge in robotics right now -- training robots in simulation, then deploying them in the real world. Still a lot of work to do on that.

AndromedaAnimated t1_j3j3dk1 wrote on January 8, 2023 at 10:43 PM

Absolutely agree. Thank you for the great link!