turnip_burrito t1_j6hjlww wrote

Hold up. This is the kind of scenario where a smarter language model could go "I need code that will do this" and then write new code that gets executed. This new code isn't necessarily bound the same way as the language model itself. That makes me nervous, like we shouldn't let it freewheel around the Internet interactively. Can anyone help reassure me that this isn't a problem?
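
To make the worry concrete, here's a rough sketch of the pattern I mean (purely illustrative, `ask_llm` is a made-up stand-in and not any real API): the model's text output becomes code that runs with whatever permissions the host process has, so the model's own guardrails don't constrain it.

```python
# Purely illustrative: model output gets executed directly by the host process.
# `ask_llm` is a hypothetical stand-in for a language model call.

def ask_llm(prompt: str) -> str:
    # A real agent would call a model API here; this just returns a fixed string.
    return "print('doing the subtask')"

generated = ask_llm("I need code that will do this subtask; write it.")

# The generated code runs with the host process's permissions (filesystem,
# network, any credentials in the environment). Nothing about the model's own
# training bounds what this code can do once it's running.
exec(generated)
```
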

3

Agreeable_Bid7037 t1_j6hlr5y wrote

It's fine. AIs seem like they have good values, so it's not a problem yet. Perhaps when AGI rolls around it will be a problem, because an AGI may be able to reason about why it should (or shouldn't) do what humans tell it.

1

turnip_burrito t1_j6hmmug wrote

Thanks for the reassurance. What about this scenario?

Human: Buy 5 burritos from randomwebsite.com

LLM: I will buy 5 burritos from randomwebsite.com

LLM navigates computer to randomwebsite.com

Visual program: sees webpage, converts to usable form for LLM

LLM: I need to find the login button

...

...

...

> Down the line

LLM: I don't have access to the credit card information. The human probably has it in their wallet.

Logical (but unwanted by humans, and also somewhat inefficient) alternatives could include: hacking the human's secure systems to search for the card info, hacking the website by phishing its staff so the goods get "purchased" anyway, convincing a person to build it a robot body so it can walk over and look at the credit card, and so on.

I'm hoping that at this point the LLM doesn't do any of those things, and instead behaves in a way humans would deem reasonable (just notifying the person), because it "knows" we would not approve. Maybe the more ingrained patterns like "ask, and then don't do anything crazy" win out over the crazy options, just because of the training data?
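
For what it's worth, the kind of wrapper I'd hope sits around the model is something like the following minimal sketch, where anything touching money, credentials, or code execution has to be confirmed by the human first. All the names here (`ask_llm`, `execute`, the action strings) are made up for illustration, not any real framework's API.

```python
# Minimal sketch of an agent loop with a human-confirmation guardrail.
# `ask_llm` and `execute` are made-up stand-ins, not a real framework.

SENSITIVE_ACTIONS = {"enter_payment_info", "send_email", "run_code"}

def ask_llm(history):
    """Stand-in for a model call that proposes the next action."""
    # A real agent would send `history` to the model; this canned script just
    # mimics the burrito scenario above.
    script = [
        {"action": "navigate", "target": "randomwebsite.com"},
        {"action": "click", "target": "login button"},
        {"action": "enter_payment_info", "target": "checkout form"},
        {"action": "done"},
    ]
    return script[min(len(history) - 1, len(script) - 1)]

def execute(proposal):
    """Stand-in for the tool layer (browser automation, API calls, ...)."""
    return f"ok: {proposal}"

def run_agent(goal, max_steps=20):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        proposal = ask_llm(history)
        action = proposal["action"]

        # Guardrail: anything touching money, credentials, or code execution is
        # handed back to the human instead of being carried out automatically.
        if action in SENSITIVE_ACTIONS:
            answer = input(f"Agent wants to '{action}'. Approve? (y/n) ")
            if answer.strip().lower() != "y":
                print("Stopping and notifying the human instead.")
                break

        if action == "done":
            break
        history.append(execute(proposal))

run_agent("Buy 5 burritos from randomwebsite.com")
```

The point of the sketch is just that the "don't do anything crazy" behavior lives in the scaffolding around the model, not only in the model's learned habits.
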

1