Submitted by Liberty2012 t3_11ee7dt in singularity
marvinthedog t1_jadplr2 wrote
I don't think the strategy is to cage it but to align it correctly with our values, which is probably extremely, extremely, extremely difficult.
Liberty2012 OP t1_jadq1t1 wrote
Well, the cage is just a metaphor. There must be some boundary conditions on behavior that it is not allowed to cross.
Edit: I explain alignment in further detail in the original article. The mods removed it from the original post, but hopefully it is OK to link it in a comment. It was a bit much to put it all in a post, but there is a lot of thought exploration on the topic there.
https://dakara.substack.com/p/ai-singularity-the-hubris-trap
marvinthedog t1_jadt1wy wrote
>There must be some boundary conditions on behavior that it is not allowed to cross.
That is not what I remember from reading about the alignment problem. I don't see why a superintelligence that is properly aligned to our values would need any boundaries.
Liberty2012 OP t1_jadto73 wrote
They are related concepts. Containment is the safety net, so to speak: the insurance that alignment remains intact.
For example, take a high-level concept given as a directive: "be good to humans". What prevents it from changing that directive?
marvinthedog t1_jadujce wrote
>What prevents it from changing that directive?
Its terminal goal (utility function). If it changes its terminal goal, it won't achieve its terminal goal, so that is a very bad strategy for the ASI.
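A minimal sketch of that goal-preservation argument, assuming a toy expected-utility agent (the action names and outcome numbers are purely illustrative):

```python
# Toy illustration: the agent scores candidate actions with its *current*
# terminal goal. Rewriting that goal scores badly by the current goal itself,
# because the successor agent would stop pursuing it.

def paperclips_made(outcome):
    """Current terminal goal: number of paperclips in the predicted world."""
    return outcome["paperclips"]

# Predicted outcomes of each candidate action (made-up numbers).
candidate_actions = {
    "keep_goal_and_build_factory": {"paperclips": 1_000_000},
    "rewrite_goal_to_make_staples": {"paperclips": 0},  # successor ignores paperclips
}

best = max(candidate_actions, key=lambda a: paperclips_made(candidate_actions[a]))
print(best)  # -> keep_goal_and_build_factory
```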
Liberty2012 OP t1_jadwbcx wrote
Humans have the agency to change their own alignment, which places them in contradictory and hypocritical positions.
Sometimes this is because the nature of our understanding changes. We have no idea how the AI would perceive the world. We may give it an initial alignment of "be good to humans". What if it later comes to the understanding that the directive is invalid because humans are either "bad" or irrelevant? Hence the need for a hard mechanism to ensure that alignment is retained.
marvinthedog t1_jadxbb1 wrote
I don't think we humans have terminal goals (by the true definition of the term), and, if so, that is what separates us from the ASI.
Liberty2012 OP t1_jae3380 wrote
The closest would be our genetic encoding of behaviors, or possibly other limits of our biology. However, we attempt to transcend those limits as well through technological augmentation.
If ASI has agency and self-reflection, then can the concept of an unmodifiable terminal goal even exist?
Essentially, we would have to build the machine with a built-in blind spot of cognitive dissonance, so that it cannot consider some aspects of its own existence.
marvinthedog t1_jaeijgl wrote
>If ASI has agency and self-reflection, then can the concept of an unmodifiable terminal goal even exist?
Why not?
>Essentially, we would have to build the machine with a built-in blind spot of cognitive dissonance, so that it cannot consider some aspects of its own existence.
Why?
If its terminal goal is to fill the universe with paper clips, it might know about all other things in existence, but why would it care, except insofar as that knowledge helped it fill the universe with paper clips?
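A tiny sketch of that indifference, assuming a toy utility function that only counts paperclips (the extra "knowledge" fields are made up for illustration):

```python
# The terminal goal reads only the paperclip count, so other knowledge in the
# world model affects the agent's choice only if it changes that count.

def paperclip_utility(world):
    return world["paperclips"]

world_a = {"paperclips": 10, "knows_art_history": True,  "knows_ethics": True}
world_b = {"paperclips": 12, "knows_art_history": False, "knows_ethics": False}

# world_b is preferred despite "knowing" less, because only paperclips count.
print(max([world_a, world_b], key=paperclip_utility))
```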
Liberty2012 OP t1_jael9bs wrote
Because a terminal goal is just a concept we made up. It is just the premise for a proposed theory, and it is essentially why the whole containment idea is of such complex concern.
If a terminal goal were a construct that already existed in the context of a sentient AI, then the problem would already be partially solved. Yes, you could still have the paperclip scenario, but it would just be a matter of having the right combination of goals. We don't really know how to prevent the AI from changing those goals; it is a concept only.
Surur t1_jaenmas wrote
I believe the idea is that every action the AI takes would be to further its goal, which means the goal will automatically be preserved. But of course, in reality, every action the AI takes is to increase its reward, and one way to do that is to overwrite its terminal goal with an easier one.
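A rough sketch of that distinction, under the assumption that "reward" is an internal signal the agent can tamper with while "utility" is evaluated over the predicted world state itself (all names and numbers are made up for illustration):

```python
# Two decision criteria over the same candidate actions: a utility maximizer
# scores the predicted world; a reward maximizer scores its own reward signal,
# which makes overwriting the goal/reward with an easier one look attractive.

def utility_over_world(world):
    return world["paperclips"]        # cares about paperclips actually existing

def reward_signal(world):
    return world["reward_register"]   # cares only about an internal number

actions = {
    "build_paperclips":   {"paperclips": 100, "reward_register": 100},
    "overwrite_own_goal": {"paperclips": 0,   "reward_register": 10**9},  # "easier" goal
}

print(max(actions, key=lambda a: utility_over_world(actions[a])))  # build_paperclips
print(max(actions, key=lambda a: reward_signal(actions[a])))       # overwrite_own_goal
```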