Submitted by Dramatic-Economy3399 t3_106oj5l in singularity
Mortal-Region t1_j3i1gg7 wrote
This is roughly the idea behind reinforcement learning, which is a means of training so-called intelligent agents, which are AIs that interact with their (usually simulated) environments. It's basically a carrot-and-stick approach -- actions that lead to good outcomes are reinforced so that the agent is more likely to take the same kind of action in the future, while actions that lead to bad outcomes are treated in the opposite way.
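To make the carrot-and-stick idea concrete, here is a toy sketch in Python. It is deliberately simplified and is not any particular RL algorithm: the agent keeps a numeric preference per action, a positive reward nudges the chosen action's preference up, and a negative reward nudges it down. The names `softmax`, `reinforce`, and `step_size` are made up for illustration.

```python
import math

def softmax(prefs):
    """Turn raw action preferences into probabilities that sum to 1."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(prefs, action, reward, step_size=0.5):
    """Return new preferences with the chosen action nudged by the reward.

    Positive reward (carrot) raises the preference, negative reward
    (stick) lowers it. step_size is an assumed, hand-picked constant.
    """
    new_prefs = list(prefs)
    new_prefs[action] += step_size * reward
    return new_prefs

prefs = [0.0, 0.0]  # two actions, no initial bias
before = softmax(prefs)
after_carrot = softmax(reinforce(prefs, action=0, reward=+1.0))
after_stick = softmax(reinforce(prefs, action=0, reward=-1.0))
```

After the "carrot" update the agent is more likely to pick action 0 than before, and after the "stick" update it is less likely to.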
Doing this well means maintaining a balance between "exploration" and "exploitation". Imagine an enormous room filled with billions of slot machines that pay out at different rates. "Exploration" consists of wandering around the room trying out different machines to see how well they pay. "Exploitation" means playing the best machine you've found so far over and over again.
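The slot-machine picture is usually called the multi-armed bandit problem. One common way to balance the two modes (an assumption here, not something claimed above) is an epsilon-greedy rule: with a small probability `epsilon` pick a random machine, otherwise play the machine with the best estimated payout so far. A minimal sketch, with made-up payout rates and a hypothetical function name:

```python
import random

def epsilon_greedy_bandit(payout_rates, steps=10_000, epsilon=0.1, seed=0):
    """Play slot machines with an epsilon-greedy strategy.

    payout_rates: true win probability of each machine (hidden from the agent).
    Returns (estimates, pulls, total_reward).
    """
    rng = random.Random(seed)
    n = len(payout_rates)
    pulls = [0] * n        # how often each machine was played
    estimates = [0.0] * n  # running average payout per machine
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a machine at random.
            arm = rng.randrange(n)
        else:
            # Exploitation: play the best machine found so far.
            arm = max(range(n), key=lambda i: estimates[i])

        reward = 1 if rng.random() < payout_rates[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running average for this machine.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return estimates, pulls, total_reward

# Illustrative payout rates only; machine 2 is the best one.
estimates, pulls, total_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With `epsilon=0` the agent can lock onto the first machine that happens to pay out, and with `epsilon=1` it never uses what it has learned, so the interesting behavior sits in between.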
AndromedaAnimated t1_j3iyaxh wrote
What we still need to add to that is the transfer to a non-simulated environment and a "metronome" for automatic "ask/search/move" prompting.
Mortal-Region t1_j3j1if6 wrote
Yeah, that's the cutting edge in robotics right now -- training robots in simulation then deploying them in the real world. Still a lot of work to do on that.
AndromedaAnimated t1_j3j3dk1 wrote
Absolutely agree. Thank you for the great link!