Viewing a single comment thread. View all comments

buzzbuzzimafuzz t1_j7lz92s wrote

A quote from the Verge liveblog:

>This is an important part of the presentation, but I just want to note that Microsoft is having to carefully explain how its new search engine will be prevented from helping to plan school shootings.
>
>"Early red teaming showed that the model could help plan attacks" on things like schools. "We don't want to aid in illegal activity." So the model is used to act as a bad actor to test the model itself.

The safety system proposed sounds interesting but given how simple prompt engineering attacks still work on ChatGPT, I'm not feeling optimistic about how well this will work out in the real world.

23

currentscurrents OP t1_j7m04ot wrote

Meh, I think the safety concerns are overblown. It's really more of bad PR for Microsoft than an actual threat.

You can already find out how to make drugs, build a bomb, etc from the internet. The Anarchist Cookbook has been well-known for decades and you can find a pdf with a simple google search.

39

MrEloi t1_j7mmfgp wrote

... and then the FBI drops by for a chat ...

7

VelveteenAmbush t1_j7pxngk wrote

Yes, 100% agree. This "can we coerce the model into saying something bad" is just a game that journalists play to catastrophize new technology and juice their engagement metrics. There's bad stuff on the internet, too, and you can find it with search engines. We still use search engines because they're incredibly useful.

The embarrassing part is that Google was so afraid of these BS stories that they kept LaMDA stuck in a warehouse for over two years while OpenAI and Microsoft lapped them.

7

[deleted] t1_j7sbhu7 wrote

What we DONT need is a censored chatgpt. Maybe if it had sliders or parental controls like a normal search engine. But there shouldn’t be a universal censorship like what they’re trying to do right now.

3

HatsusenoRin t1_j7mkaf2 wrote

Sir! the bad actor seems to have put itself online and eliminated the good one!

3

PK_thundr t1_j7nz7b2 wrote

Are there any good examples/tutorials/papers about prompt engineering attacks you'd recommend to start with?

1