Submitted by GorgeousMoron t3_1266n3c in singularity
blueSGL t1_je8q3lm wrote
Reply to comment by Mrkvitko in The Only Way to Deal With the Threat From AI? Shut It Down by GorgeousMoron
> 3. it will be initially unaligned
If we had:

- a provable mathematical solution for alignment...
- the ability to directly reach into the shoggoth's brain, watch it thinking, know what it's thinking, and prevent eventualities that people consider negative outputs...

...that worked 100% on existing models, I'd be a lot happier about our chances right now.
Given that current models cannot be controlled or explained in fine-grained enough detail (the problem is being worked on, but it's still at a very early stage), what makes you think larger models will be easier to analyze or control?
The current 'safety' measures amount to bashing at a near-infinite whack-a-mole board whenever the model outputs something deemed wrong.
As has been shown, OpenAI has not found all the ways to coax out negative outputs. The internet contains far more people than OpenAI has alignment researchers, and those internet denizens will be more driven to find flaws.
Basically, until the AI 'brain' can be exposed and interpreted, with safety checks added at that level, we have no way of preventing some clever sod from working out a way to break the safety protocols imposed at the surface level.
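To make the idea concrete, here's a rough sketch of what a check "at that level" might look like, assuming a PyTorch-style model. The `unsafe_direction` probe is entirely hypothetical; finding such directions reliably is exactly the unsolved interpretability problem being described.

    # Sketch of inspecting internal activations with a hypothetical probe.
    # The probe is made up -- it stands in for an interpretability tool
    # that does not exist yet.
    import torch
    import torch.nn as nn

    # Toy stand-in for one transformer block's hidden state.
    model = nn.Sequential(
        nn.Linear(16, 64),
        nn.ReLU(),
        nn.Linear(64, 16),
    )

    captured = {}

    def capture_hidden(module, inputs, output):
        # Save the intermediate activation so it can be inspected.
        captured["hidden"] = output.detach()

    # Watch the first layer's output (the "thoughts" we want to see).
    model[0].register_forward_hook(capture_hidden)

    # Hypothetical probe: a direction in activation space that we pretend
    # corresponds to an unwanted behaviour.
    unsafe_direction = torch.randn(64)

    x = torch.randn(1, 16)
    y = model(x)

    score = captured["hidden"] @ unsafe_direction
    if score.item() > 3.0:  # arbitrary threshold for the sketch
        print("internal check tripped; refuse or re-route the output")
    else:
        print("activation looks fine under this (toy) probe")

Surface-level filtering only sees the final text; a check like this would act on what the model is doing internally, which is why interpretability matters so much here.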
Shack-app t1_je8rcb6 wrote
Who’s to say this will ever happen?
blueSGL t1_je8saz1 wrote
What will ever happen? Interpretability? It's being worked on right now, and there are already some interesting results. It's just an early field that needs time, money, and researchers put into it. Alignment as a whole needs more time, money, and researchers.