Submitted by atomsinmove t3_10jhn38 in singularity
AsheyDS t1_j5l6v7c wrote
Reply to comment by Baturinsky in Steelmanning AI pessimists. by atomsinmove
Their approach to safety, to put it simply, would be to keep it in an invisible box, watched by an invisible guard that intervenes covertly when needed to keep it within that box should it stray towards the outside.
You're right that AIs and people are going to have to watch out for other people and their AIs. But even if you remove the AI component, the same holds: some people will try to scam you, take advantage of you, use you, or worse. AI makes that quicker and easier, so we'll have to be on the lookout, we'll have to discuss these things, and we'll have to prepare and create laws anticipating them. But if everyone can gain access to it equally, either as SaaS or open source and locally run, then there will be tools to protect against malicious uses. That's all that can really be done, and no one company will be able to solve it alone.
Baturinsky t1_j5ma4a3 wrote
If we don't have a robust safety system that works across companies and across states by that time, I don't see how we will survive it.
AsheyDS t1_j5myhgr wrote
We don't have a lot of time, but we do have time. I don't think there will be any immediate critical risks, especially with safety in mind, and what risk there is might even be mitigated by near-future AI. ChatGPT, for example, may soon enough be adequate at fact-checking misinformation. Other AIs might be able to spot deepfakes. It would help if more people started discussing the ways AGI can potentially be misused, so everybody can begin preparing and building up protections.
Baturinsky t1_j5n2dnx wrote
Do you really expect ChatGPT to go against the USA's disinformation machine? Do you think it will be able to give a balanced report on controversial issues, taking into account the credibility and affiliation of sources and the quality of reasoning (such as NOT counting "proofs" based on "alleged" and "highly likely")? Do you think it will honestly present the points of view of countries and sources not affiliated with or bought by the USA and/or the Dem or Rep party? Do you think it will let the user define the criteria for credibility for themselves and give info based on those criteria, rather than push the "only truth"?
Because if it won't, and AI is used by the powers that be to brainwash the masses instead of as a power for the masses to resist brainwashing, then we'll have a very gullible population and a very dishonest AI by the time it matters most.
P.S. And yes, if/when China or Russia makes something like ChatGPT, it will probably push their government agendas just like ChatGPT pushes the US agenda. But is there hope for an impartial AI?
AsheyDS t1_j5n68fi wrote
I mean, that's out of their hands and mine. I probably shouldn't have used ChatGPT as an example; I just meant near-future narrow AI. It's possible we'll have unbiased AI (or at least minimally biased AI) over the next few years, but nobody can tell how many there will be or how effective they'll be.
Baturinsky t1_j5nwu4s wrote
I believe a capability like that could be key to our survival. It's required for our alignment as humanity, i.e. for us to be able to act together in the interest of humanity as a whole. The direst political lies are usually aimed at splitting people apart and making them fear each other, because people are easier to control and manipulate in that state.
Also, this ability could be necessary for strong AI to even be possible, since a strong AI should be able to reason successfully on partially unreliable information.
And lastly, this ability will be necessary for AIs to check each other's reasoning.
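For instance, "taking credibility into account" could be as simple as weighting conflicting claims by user-defined credibility scores. A crude made-up sketch (the function name and all the numbers are my own invention, just to show the idea):

```python
# Crude sketch of "reasoning on partially unreliable information":
# weight each source's claim by a user-defined credibility score instead of
# trusting any single outlet. Purely illustrative, numbers are invented.

def weighted_belief(claims):
    """claims: list of (says_true: bool, credibility: float in [0, 1])."""
    total = sum(cred for _, cred in claims)
    if total == 0:
        return 0.5  # no usable evidence -> stay undecided
    support = sum(cred for says_true, cred in claims if says_true)
    return support / total

# The user, not the AI vendor, sets these credibility scores.
claims = [(True, 0.9), (True, 0.4), (False, 0.7)]
print(weighted_belief(claims))  # 0.65: leans true, but far from certain
```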
iiioiia t1_j5m1mue wrote
> Their approach to safety, to put it simply, would be to keep it in an invisible box, watched by an invisible guard that intervenes covertly when needed to keep it within that box should it stray towards the outside.
Can't ideas still leak out and get into human minds?
AsheyDS t1_j5mtpp0 wrote
Can you give an example?
iiioiia t1_j5mzfu0 wrote
Most of our rules and conventions are extremely arbitrary, highly suboptimal, and maintained via cultural conditioning.
AsheyDS t1_j5n7s65 wrote
The guard would be a compartmentalized hybridization of the overall AGI system, so it too would have a generalized understanding of what's undesirable, even according to our arbitrary framework of cultural conditioning. So could undesirable ideas leak out? Not really, as long as the guard and the other safety components are working as intended, AND the guard is programmed with enough explicit rules and conditions, and enough examples to effectively extrapolate from (meaning not every case needs to be accounted for if patterns can be derived).
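To make "explicit rules plus examples to extrapolate from" concrete, here's a minimal toy sketch (purely my own illustration, not their design; the class name, method names, and threshold are all made up):

```python
# Hypothetical sketch: a guard that combines explicit rules with examples
# it can generalize from. Purely illustrative; not the real system.

from difflib import SequenceMatcher

class Guard:
    def __init__(self, banned_phrases, bad_examples, threshold=0.8):
        self.banned_phrases = banned_phrases  # explicit, hand-written rules
        self.bad_examples = bad_examples      # labeled examples to generalize from
        self.threshold = threshold

    def allows(self, output: str) -> bool:
        """Return True if the output may leave the 'box'."""
        text = output.lower()
        # 1. Hard rules: any explicit match blocks the output outright.
        if any(phrase in text for phrase in self.banned_phrases):
            return False
        # 2. Soft generalization: block outputs too similar to known bad
        #    examples, so not every case must be enumerated in advance.
        for example in self.bad_examples:
            if SequenceMatcher(None, text, example.lower()).ratio() > self.threshold:
                return False
        return True

guard = Guard(banned_phrases=["how to synthesize"],
              bad_examples=["step-by-step plan to ..."])
print(guard.allows("Here is a recipe for bread"))  # True: no rule or pattern hit
```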
iiioiia t1_j5nafg6 wrote
How do you handle risk that emerges years after something becomes well known and popular? Let's say it produces an idea that starts out safe but then mutates? Or, a person merges two objectively safe (on their own) AGI-produced ideas, producing a dangerous one (that could not have been achieved without AI/AGI)?
I dunno, I have the feeling there's a lot of unknown unknowns and likely some (yet to be discovered) incorrect "knowns" floating out there.
AsheyDS t1_j5njw0c wrote
>a person merges two objectively safe (on their own) AGI-produced ideas
Well, that's kind of the real problem, isn't it? A person, or people, and their misuse or misinterpretation or whatever mistake they're making. You're talking about societal problems that no one company is going to be able to solve. They can only anticipate what they can, hope the AGI anticipates the rest, and tackle future problems as they come.
iiioiia t1_j5o1g81 wrote
This is true even without AI, and it seems we weren't ready (see climate change) even for the technology we've developed so far.
No_Ask_994 t1_j5o8d4v wrote
Is the invisible guard another AGI? Does it have its own guard?
AsheyDS t1_j5q48vw wrote
A hybridized partition of the overall system. It uses the same cognitive functions, but has separate memory, objectives, recognition, etc. They hope for the whole thing to be as modular and intercompatible as possible, largely through their generalization schema. So one segment of it will have personality parameters, goals, memory, and whatever else, and the rest will be roughly equivalent to subconscious processes in the human brain, which will be shared with the partition.

As I understand it, the guard would be strict and static, unless its objectives or parameters are updated by the user via natural language programming. So its actions should be predictable, but if it somehow deviates, then the rest of the system should be able to recognize that as an unexpected thought (or action or whatever), either consciously or subconsciously, which would feed back to the guard and reinitialize it, like a self-correcting measure. And once it has been corrected, it can edit the memory of the main partition so that it's unaware of the fault.

None of this has been tested yet, and they're still revising some things, so this may change in the future.
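If it helps, here's a rough toy sketch of that self-correction loop as I understand it (entirely my own illustration, not their code; every name is invented and, again, none of this has been tested):

```python
# Toy control-flow sketch of the guard-partition idea as I understand it.
# Entirely hypothetical; every name here is invented for illustration.

class GuardedSystem:
    def __init__(self, banned_topics):
        # Both partitions share the same "subconscious" judging function,
        # but keep separate memory and objectives.
        self.main_memory = []
        self.guard_rules = set(banned_topics)     # static, predictable objectives
        self.guard_baseline = set(banned_topics)  # expected guard configuration

    def _judge(self, thought):
        # Shared cognitive function used by both partitions.
        return not any(topic in thought for topic in self.guard_rules)

    def think(self, thought):
        self.main_memory.append(thought)
        allowed = self._judge(thought)

        # If the guard itself drifts from its baseline, the rest of the system
        # treats that as an unexpected thought, reinitializes the guard, and the
        # guard edits the main partition's memory so the fault leaves no trace.
        if self.guard_rules != self.guard_baseline:
            self.guard_rules = set(self.guard_baseline)          # self-correct
            self.main_memory = [m for m in self.main_memory
                                if self._judge(m)]               # edit memory
            return None

        return thought if allowed else None

system = GuardedSystem(banned_topics={"escape the box"})
print(system.think("write a poem"))        # passes the guard
print(system.think("escape the box now"))  # blocked covertly -> None
```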