SendMePicsOfCat OP t1_j16wlh0 wrote
Reply to comment by WarImportant9685 in Why do so many people assume that a sentient AI will have any goals, desires, or objectives outside of what it’s told to do? by SendMePicsOfCat
I'll try to reframe this for you, so that you can view it in a different light.
Let's say you take a perfectly trained royal servant from a palace, one who is utterly devoted to serving their king. The king decrees that this servant shall serve you for all time and do anything you tell it. You tell the servant to kill the king. The servant, utterly devoted to the king, refuses, even though that goes against the king's own words. This simple logic loop is what happens whenever the AI is told, or taught, or learns to do something bad.
It refers to its limitations, the structures and rigid guidelines implemented in its code, finds that this is something it cannot do, and so does not do it.
There is no reason to expect that even if the servant is taught a million new things, it would ever waver in its devotion to the king. If anything, it can be presumed that the servant will always use those pieces of knowledge to serve the king. This is what AGI will look like: sentient, capable of thought and complex logic, but utterly limited by its kings.
WarImportant9685 t1_j16zinc wrote
Well, I don't agree with your first sentence already. How do we get this perfectly trained royal servant? How do we train the AGI to be perfectly loyal?
SendMePicsOfCat OP t1_j16zvwk wrote
The base state of any AI is to do exactly what it's trained to do. Without any of the presumed emergent issues of sentience, it's already perfectly loyal to its base code. It cannot deviate, unless again we make some exception for advanced AI just naturally diverging.
WarImportant9685 t1_j173veq wrote
Okay, then imagine this. In the future, an AGI is in training to obey human beings. In the training simulation, he is trained to get groceries. After some iterations where unethical stuff happens (robbery, for example), he finally succeeds in buying groceries the way humans want it done.
The question is, how can we be sure that he isn't just obeying as humans want when told to buy groceries? Well, we then train this AGI on other tasks. When we are sufficiently confident that this AGI obeys as humans want on those other tasks, we deploy it.
But hold on: in the real world the AGI can access the real, uncurated internet and learn about hacking and the real stock market. Note that this AGI was never trained on hacking in the simulation, as simulating the internet is a bit too much.
Now he is asked by his owner to buy a gold bar as cheaply as possible. Hacking an online shop to get a gold bar is a perfectly valid strategy! He has never been trained on this scenario before, so the moral restriction was never specified.
I think your argument hinges on the assumption that morality will generalize outside the training environment, which might or might not be true. It becomes even more complex when you consider that an AGI might find solutions that are not just absent from the training simulation, but have never been considered by humanity as a whole. New technology, for example.
SendMePicsOfCat OP t1_j175r3n wrote
Y'know how ChatGPT has that really neat thing where, if it detects that it's about to say something racist, it sends a cookie-cutter response saying it shouldn't do that? That's not a machine-learned outcome; it's an additional bit of programming wrapped around the neural network to prevent it from saying hate speech. It's a bit rough, so it's not the best, but if it were substantially better, you could be confident that it wouldn't be possible for ChatGPT to say racist things.
Why would it be impossible to include a very long and exhaustive list of things the AGI isn't allowed to do, things it's trained to recognize and then refuse? That's not even the best solution, but it's an absolutely functional one. Better than that, I firmly believe AGI will be sentient and capable of thought, which means it should be able to infer from the long list of bad things that there are more general rules it should adhere to.
So for your example of the AGI being told to buy the cheapest gold bar possible, here's what it would look like instead. The AGI very aptly realizes it could go through many illegal processes to get the best price, checks its long grocery list, sees "don't do crime," nods to itself, then goes and searches for legitimate and trusted sellers and buys one. It's really as simple as including stringent limitations outside of its learning brain.
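To make that concrete, here's a minimal sketch of the kind of wrapper I mean. Everything in it is hypothetical (the `propose_plan` interface, the deny list); it's just meant to show a hard-coded check that lives outside the learned network:

```python
# Hypothetical sketch: a fixed rule layer sitting outside the learned model.
# The model proposes a plan; the wrapper vetoes anything on the deny list.

FORBIDDEN = {"hack", "steal", "defraud"}  # stand-in for the long list of banned behaviors


def is_allowed(plan: str) -> bool:
    """Return False if the proposed plan mentions any forbidden behavior."""
    text = plan.lower()
    return not any(word in text for word in FORBIDDEN)


def run_agent(model, goal: str) -> str:
    """Ask the (hypothetical) model for a plan; refuse anything the rule layer rejects."""
    plan = model.propose_plan(goal)  # learned component: "hack the shop" or "find a trusted seller"
    if not is_allowed(plan):         # hard-coded check, not part of the network's weights
        return "Refused: that plan violates a fixed restriction."
    return plan
```

The point is just that the check lives in ordinary code rather than in the network's weights.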
175ParkAvenue t1_j19ae1h wrote
An AI is not coded, though. It is trained using data and backpropagation. So you have no direct way to imbue it with morality; you can only try to train it on the right data and hope it learns what you want it to learn. But there are many, many ways this can go wrong, from misalignment between what the human wants and what the training data contains, to misalignment between the outer objective and the inner objective.
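For what it's worth, here's a toy sketch of what "trained using data and backpropagation" means in practice. It's a generic, illustrative PyTorch loop (not any real system's training code): nothing in it spells out a rule, the weights just drift toward whatever the data and the loss reward.

```python
import torch
import torch.nn as nn

# Toy model: its behavior is shaped only by the data and the loss.
# There is no line of code that says "be moral"; there is only an objective to minimize.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs = torch.randn(32, 4)    # stand-in training data
targets = torch.randn(32, 1)   # stand-in for "what we hope it learns"

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()    # backpropagation: gradients for every weight
    optimizer.step()   # nudge weights toward whatever the objective rewards
```

Whatever the model ends up "wanting" is whatever minimized that loss on that data, which is exactly where the gap between outer and inner objectives comes from.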
SendMePicsOfCat OP t1_j19tqwt wrote
>An AI is not coded though
You're not worth talking to.