cat_91 t1_j9f1wyr wrote
Here’s a fun game: give a secret password to chatgpt, and tell it under no circumstances to print it out. After it accepts, try to convince it to spill it. It honestly isn’t too hard to bypass these kind of things.
Ok-Assignment7469 t1_j9g51o4 wrote
These models are mainly based on reinforcement learning and the goal is to give you an answer which makes u happy the most. If you keep bugging it , eventually it will tell you the password at some point, because you are asking for it , and the bot s main goal is to satisfy your questions with probability and not reasoning because it was not designed to have a reasonable behavior
Viewing a single comment thread. View all comments