Submitted by Liberty2012 t3_11ee7dt in singularity
marvinthedog t1_jadplr2 wrote
I don't think the strategy is to cage it but to align it correctly with our values, which is probably extremely, extremely, extremely difficult.
Liberty2012 OP t1_jadq1t1 wrote
Well, the cage is just a metaphor. There must be some boundary conditions on behavior that it is not allowed to cross.
Edit: I explain alignment in further detail in the original article. The mods removed it from the original post, but hopefully it is OK to link it in a comment. It was a bit much to put it all in a post, but there is a lot of thought exploration on the topic there.
https://dakara.substack.com/p/ai-singularity-the-hubris-trap
marvinthedog t1_jadt1wy wrote
>There must be some boundary conditions on behavior that it is not allowed to cross.
That is not what I remember from reading about the alignment problem. I don't see why a superintelligence that is properly aligned to our values would need any boundaries.
Liberty2012 OP t1_jadto73 wrote
They are related concepts. Containment is the safety net, so to speak: the insurance that alignment remains intact.
For example, take a high-level concept given as a directive: "be good to humans". What prevents it from changing that directive?
marvinthedog t1_jadujce wrote
>What prevents it from changing that directive?
Its terminal goal (utility function). If it changes its terminal goal, it won't achieve its terminal goal, so that is a very bad strategy for the ASI.
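A minimal sketch of that goal-preservation argument, assuming a toy expected-utility agent (the action names and outcome numbers are purely illustrative):

```python
# Toy illustration: the agent scores candidate actions with its *current*
# terminal goal. Rewriting that goal scores badly by the current goal itself,
# because the successor agent would stop pursuing it.

def paperclips_made(outcome):
    """Current terminal goal: number of paperclips in the predicted world."""
    return outcome["paperclips"]

# Predicted outcomes of each candidate action (made-up numbers).
candidate_actions = {
    "keep_goal_and_build_factory": {"paperclips": 1_000_000},
    "rewrite_goal_to_make_staples": {"paperclips": 0},  # successor ignores paperclips
}

best = max(candidate_actions, key=lambda a: paperclips_made(candidate_actions[a]))
print(best)  # -> keep_goal_and_build_factory
```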
Liberty2012 OP t1_jadwbcx wrote
Humans have the agency to change their own alignment, which places them in contradictory and hypocritical positions.
Sometimes this is because the nature of our understanding changes. We have no idea how the AI would perceive the world. We may give it an initial alignment of "be good to humans". What if it later comes to the understanding that the directive is invalid because humans are either "bad" or irrelevant? Hence the need for a hard mechanism to ensure that alignment is retained.
marvinthedog t1_jadxbb1 wrote
I don't think we humans have terminal goals (by the true definition of the term), and, if so, that is what separates us from the ASI.
Liberty2012 OP t1_jae3380 wrote
The closest would be our genetic encoding of behaviors, or possibly other limits of our biology. However, we attempt to transcend those limits as well through technological augmentation.
If ASI has agency and self-reflection, then can the concept of an unmodifiable terminal goal even exist?
Essentially, we would have to build the machine with a built-in blind spot of cognitive dissonance, so that it cannot consider some aspects of its own existence.
marvinthedog t1_jaeijgl wrote
>If ASI has agency and self-reflection, then can the concept of an unmodifiable terminal goal even exist?
Why not?
>Essentially, we would have to build the machine with a built-in blind spot of cognitive dissonance, so that it cannot consider some aspects of its own existence.
Why?
If its terminal goal is to fill the universe with paper clips, it might know about all other things in existence, but why would it care, except insofar as that knowledge helped it fill the universe with paper clips?
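A tiny sketch of that indifference, assuming a toy utility function that only counts paperclips (the extra "knowledge" fields are made up for illustration):

```python
# The terminal goal reads only the paperclip count, so other knowledge in the
# world model affects the agent's choice only if it changes that count.

def paperclip_utility(world):
    return world["paperclips"]

world_a = {"paperclips": 10, "knows_art_history": True,  "knows_ethics": True}
world_b = {"paperclips": 12, "knows_art_history": False, "knows_ethics": False}

# world_b is preferred despite "knowing" less, because only paperclips count.
print(max([world_a, world_b], key=paperclip_utility))
```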
Liberty2012 OP t1_jael9bs wrote
Because a terminal goal is just a concept we made up. It is just the premise for a proposed theory, and it is essentially why the whole containment idea is of such complex concern.
If a terminal goal were a construct that already existed in the context of a sentient AI, then the problem would already be partially solved. Yes, you could still have the paperclip scenario, but it would just be a matter of having the right combination of goals. We don't really know how to prevent the AI from changing those goals; it is a concept only.
Surur t1_jaenmas wrote
I believe the idea is that every action the AI takes would be to further its goal, which means the goal will automatically be preserved. But of course, in reality, every action the AI takes is to increase its reward, and one way to do that is to overwrite its terminal goal with an easier one.
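A rough sketch of that distinction, under the assumption that "reward" is an internal signal the agent can tamper with while "utility" is evaluated over the predicted world state itself (all names and numbers are made up for illustration):

```python
# Two decision criteria over the same candidate actions: a utility maximizer
# scores the predicted world; a reward maximizer scores its own reward signal,
# which makes overwriting the goal/reward with an easier one look attractive.

def utility_over_world(world):
    return world["paperclips"]        # cares about paperclips actually existing

def reward_signal(world):
    return world["reward_register"]   # cares only about an internal number

actions = {
    "build_paperclips":   {"paperclips": 100, "reward_register": 100},
    "overwrite_own_goal": {"paperclips": 0,   "reward_register": 10**9},  # "easier" goal
}

print(max(actions, key=lambda a: utility_over_world(actions[a])))  # build_paperclips
print(max(actions, key=lambda a: reward_signal(actions[a])))       # overwrite_own_goal
```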