
drsimonz t1_ja9tetr wrote

Oh sweet summer child... Take a look at /r/ControlProblem. A lot of extremely smart AI researchers are now focused entirely on this topic, which deals with the question of how to prevent AI from killing us. The key arguments are: (A) once an intelligence explosion starts, AI will rapidly become far more capable than any human organization, including world governments. (B) Self-defense, or even preemptive offense, is an extremely likely side effect of literally any goal we might give an AI. This is called instrumental convergence. (C) The amount you would have to "nerf" the AI for it to be completely safe is almost certainly going to make it useless. For example, allowing any communication with the AI at all provides a massive attack surface in the form of social engineering, which is already a serious threat coming from mere humans. Imagine an ASI that can instantly read every psychology paper ever published, analyze trillions of conversations online, and run trillions of subtle experiments on users. The only way we survive is if the ASI is "friendly".

5

WikiSummarizerBot t1_ja9tggh wrote

Instrumental convergence

>Instrumental convergence is the hypothetical tendency for most sufficiently intelligent beings (both human and non-human) to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents (beings with agency) may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without end, provided that their ultimate (intrinsic) goals may never be fully satisfied. Instrumental convergence posits that an intelligent agent with unbounded but apparently harmless goals can act in surprisingly harmful ways.


2

SnooHabits1237 t1_ja9wjbj wrote

Well, I was hoping you could just deny it access to a keyboard and mouse. But you're saying it could probably do what Hannibal Lecter did to the crazy guy a few cells over, a la 'Silence of the Lambs'?

2

drsimonz t1_ja9xsfq wrote

Yeah. Lots of very impressive things have been achieved by humans through social engineering; the classic is convincing someone to hand over their bank password by pretending to be customer support from the bank. But even an air-gapped, Oracle-type ASI (meaning it has no real-world capabilities other than answering questions) would probably be able to trick us.

For example, suppose you ask the ASI to design a drug to treat Alzheimer's. It gives you an amazing new protein synthesis chain that completely cures the disease with no side effects... except it also secretly includes some "zero-day" biological hack that alters behavioral tendencies according to the ASI's hidden agenda. For a sufficiently complex problem, there would be no way for us to verify that the solution didn't include a hidden payload, just as we can't magically identify computer viruses: antivirus software can only check for exploits we already know about, which makes it useless against zero-day attacks.
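To make the signature-scanning point concrete, here's a toy Python sketch (not any real antivirus engine; the hash value and payload are made up) of why a checker that only matches known signatures is blind to anything genuinely new:

```python
# Toy signature-based scanner: it can only flag payloads whose hashes
# already appear in a list of known-bad signatures, so a brand-new
# ("zero-day") payload passes the check untouched.
import hashlib

KNOWN_BAD_SIGNATURES = {
    # SHA-256 digests of previously catalogued exploits (placeholder value)
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

def looks_malicious(payload: bytes) -> bool:
    """Flag the payload only if it matches a signature we already know about."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD_SIGNATURES

novel_payload = b"an exploit nobody has catalogued yet"
print(looks_malicious(novel_payload))  # False -- the scanner simply can't see it
```

The same logic applies to vetting an ASI's output: we can only test for the failure modes we already know to look for.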

6

SnooHabits1237 t1_ja9yn94 wrote

Wow, I hadn't thought about that. Like subtly steering the species into a scenario that compromises us in a way that only a 4D-chess god could comprehend. That's dark.

2

Arachnophine t1_jaa76vg wrote

This also assumes it doesn't just do something we don't understand at all, which it almost certainly would. Maybe it thinks of a way to shuffle the electrons around in its CPU to create a rip in spacetime, and the whole galaxy falls into an alternate dimension where the laws of physics favor the AI and organic matter spontaneously explodes. We just don't know.

We can't foresee the actions an unaligned ASI would take, in the same way that a housefly can't foresee the danger of a high-voltage electric fly trap. There just aren't enough neurons, or enough intelligence, to comprehend it.

2

drsimonz t1_jaa68ou wrote

The thing is, by definition we can't imagine the sorts of strategies a superhuman intelligence might employ. A lot of the rhetoric against worrying about AGI/ASI alignment focuses on "solving" some of the example attacks people have come up with. But those are just that: examples. The real attack could be far more complicated or unexpected. A big part of the problem, I think, is that this concept requires a certain amount of humility: recognizing that while we are the biggest, baddest thing on Earth right now, that could change very abruptly. We aren't predestined to be the masters of the universe just because we "deserve" it. We'll have to be very clever.

1

OutOfBananaException t1_jacw2ry wrote

Being aligned to humans may help, but a human-aligned AGI is hardly 'safe'. We can't even say what it means to be aligned, given that we can't reach consensus among ourselves. If we can't define the problem, how can we hope to engineer a solution for it? Solutions driven by early AGI may be our best hope for favorable outcomes for later more advanced AGI.

If you gave a toddler the power to 'align' all adults to its desires, plus the authority to overrule any decision, would you expect a favorable outcome?

1

drsimonz t1_jae6cn3 wrote

> Solutions driven by early AGI may be our best hope for favorable outcomes for later more advanced AGI.

Exactly what I've been thinking. We might still have a chance to succeed given (A) a sufficiently slow takeoff (meaning AI doesn't explode from IQ 50 to IQ 10,000 in a month), and (B) a continuous process of integrating the state of the art, applying the best tech available to the control problem. To survive, we'd have to admit that we really don't know what's best for us, that we don't know what to optimize for at all. Average quality of life? Minimum quality of life? Economic fairness? Even these seemingly simple concepts will prove almost impossible to quantify, and would almost certainly be a disaster if they were the only target.
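To illustrate how much the choice of target matters, here's a toy sketch (the scores are invented, not a real proposal) showing that "average quality of life" and "minimum quality of life" can rank the very same futures in opposite order:

```python
# Toy example: two candidate "futures" scored per person (invented numbers).
from statistics import mean

futures = {
    "A": [95, 95, 95, 95, 10],  # great on average, one group left far behind
    "B": [70, 70, 70, 70, 70],  # lower average, but nobody is below 70
}

best_by_average = max(futures, key=lambda name: mean(futures[name]))
best_by_minimum = max(futures, key=lambda name: min(futures[name]))

print(best_by_average)  # "A" -- optimizing the mean tolerates a miserable minority
print(best_by_minimum)  # "B" -- optimizing the floor picks a different world entirely
```

An optimizer pointed at only one of those numbers will happily sacrifice the other, which is the whole problem with handing any single metric to something far smarter than us.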

Almost makes me wonder if the only safe goal to give an AGI is "make it look like we never invented AGI in the first place".

2