Submitted by scarynut t3_10ijzi2 in MachineLearning

So LLMs like GPT-3 have understandably raised concerns about the disruptiveness of faked texts, faked images and video, faked speech and so on. While this may well change soon, as of now OpenAI controls the most accessible and competent LLM. And OpenAI's agenda is, in their own words, to benefit mankind.

If so, wouldn't it make sense to add a sort of watermark to the output? A watermark built into the model parameters so that it could not easily be removed, but would still be detectable with some key or some other model. While it may not matter in the long run, it would set a precedent for further development and demonstrate some kind of responsibility for the disruptive nature of LLMs/GPTs.

Would it not be technically possible, and would it make sense?

150

Comments

adt t1_j5erdiz wrote

Already in the works (Scott Aaronson is a scientist with OpenAI):

>we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT.
>Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.
https://scottaaronson.blog/?p=6823

177

Appropriate_Ant_4629 t1_j5gb1kw wrote

Stable Diffusion already includes one by default:

In particular, the reference txt2img script embeds an invisible watermark in every generated image by default (via the invisible-watermark library).

Of course with open source software and models, you'd be free to create a fork that doesn't include one, or uses a different one.
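For reference, a rough sketch of how such an embedded watermark can be written and read back with the invisible-watermark package; the payload bytes, filenames, and the DWT-DCT method here are illustrative, not necessarily what any given fork ships with:

```python
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

# Embed a short byte-string watermark into an image using the DWT-DCT method.
img = cv2.imread("generated.png")          # placeholder path; OpenCV loads as BGR
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", b"SDV1")    # illustrative 4-byte payload
marked = encoder.encode(img, "dwtDct")
cv2.imwrite("generated_marked.png", marked)

# Later: try to recover the payload from a (possibly re-saved) image.
decoder = WatermarkDecoder("bytes", 32)    # payload length in bits
recovered = decoder.decode(cv2.imread("generated_marked.png"), "dwtDct")
print(recovered)                           # b'SDV1' if the mark survived
```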

30

ThisIsNotAnAlias t1_j5gjblv wrote

Last I checked image watermarks were super weak against rotations, seems to still be the case - but the better methods could cope with cropping way better than these.

9

Appropriate_Ant_4629 t1_j5gy6e3 wrote

> Last I checked image watermarks were super weak against rotations

Obviously depends on the technique. The old-school popular technique of "slap a signature in the painting" like Dürer's stylized A/D logo is very robust to rotations, but not robust to cropping from the bottom in that case.

> seems to still be the case - but the better methods could cope with cropping way better than these.

It's near impossible to have a watermark technology that's robust to all transformations, at least if you reveal what watermark algorithm you used.

One easy attack that works on most techniques would be to just re-encode the content, writing your own watermark over the original using the same watermarking algorithm.

7

Freonr2 t1_j5hx9s5 wrote

Yeah, but it's trivial to remove that when you run the source yourself.

2

marr75 t1_j5joben wrote

Watermarks are a great way to ensure I use GPT-NeoX and allies instead of Da Vinci and allies.

1

eigenman t1_j5fgmc8 wrote

Ahh of course I should have checked Scott's blog first.

7

Fabulous-Possible758 t1_j5gtzlk wrote

On phone so can’t read the blog yet: does it say how well it handles false positives? ie, flagging stuff not written by GPT as being written by GPT?

I could see a really shitty world coming about where the filter is effectively useless because everyone will have to make sure their content passes the watermark detector.

3

franciscrot t1_j5g8uq1 wrote

Would anyone like to explain to me like I'm five how it can be robust against edits like that?

2

careless25 t1_j5gu7tn wrote

Very simple explanation

Give each word a unique number. Add all the numbers up. That's your unique identifier for the GPT output.

If you switch some words around, the sum won't change very much.
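A toy illustration of that idea in Python, purely to show why reordering words leaves the signal untouched and swapping one word only nudges it (the hash-based word numbers are made up for the example; the real scheme works over n-grams of tokens):

```python
import hashlib

def word_number(word: str) -> int:
    # Deterministic "unique number" per word, derived from a hash.
    return int(hashlib.sha256(word.lower().encode()).hexdigest(), 16) % 1000

def signal(text: str) -> int:
    # "Add all the numbers up" - the toy identifier for a piece of text.
    return sum(word_number(w) for w in text.split())

original = "the quick brown fox jumps over the lazy dog"
reordered = "over the lazy dog the quick brown fox jumps"
swapped = "the fast brown fox jumps over the lazy dog"   # one word replaced

print(signal(original) == signal(reordered))   # True: word order doesn't matter
print(abs(signal(original) - signal(swapped))) # only one word's contribution changed
```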

8

BitterAd9531 t1_j5erse4 wrote

Won't work in the long term. OpenAI might have been the first one to release, but we know other companies have better LLMs and others will catch up soon. When that happens, models without watermarks will be released and people who want output without a watermark will use that model.

And even if you somehow force all of them to implement a watermark, it would be trivial to combine outputs of different models to circumvent it. Not to mention that slight rewrites by a human would probably break most watermarks, the same way they break the current GPT detectors.

158

[deleted] t1_j5f1x9y wrote

[removed]

−38

BitterAd9531 t1_j5f2nk5 wrote

I know about OpenAI's research into watermarking. It doesn't contradict anything I said. It's only a matter of time before more models appear and the researchers themselves talk about how it's defeatable by both humans and other models through combinations and rewriting.

30

[deleted] t1_j5f3k9f wrote

[removed]

−18

BitterAd9531 t1_j5f5olr wrote

I think you are misunderstanding how these watermarks work. The watermark is encoded in the tokens used, so combining or rewriting will weaken the watermark to the point where it can no longer be used for accurate detection. Robust means a few tokens may be changed, but changing enough tokens will have an impact eventually.

The semantics don't change because in language, there are multiple ways to describe the same thing without using the same (order of) words. That's literally what "rewriting" means.

21

[deleted] t1_j5f9weh wrote

[removed]

−17

BitterAd9531 t1_j5fcby3 wrote

>If you think you can take two watermarked LLMs and "trivially" combine their output as you stated, explain in detail how you do that in an automated way.

No thank you, I'm not going to write an LLM from scratch for a Reddit argument. And FWIW, I suspect that even if I did, you'd find some way to convince yourself that you're not wrong. You not understanding how this works doesn't impact me nearly enough to care that much. Have a good one.

18

[deleted] t1_j5fdv9z wrote

[removed]

−16

hey_look_its_shiny t1_j5fnyt6 wrote

I'm not OP, but the words "won't work in the long term" from their original statement are not synonymous with "useless".

Your original comment was disrespectful, and while you have raised some valid points along the way, they're collectively misaligned with the original statement you were responding to. You've been fighting a strawman, and it shows in how the community received your comments.

16

[deleted] t1_j5fqkim wrote

[removed]

−2

its_ya_boi_Santa t1_j5gdo2x wrote

Wrong. I didn't read anything past your 2nd wall of text on this post, but maybe you should use a LLM to argue with people online, it looks like it could save you hours of time and effort.

6

[deleted] t1_j5gdsob wrote

[removed]

−2

its_ya_boi_Santa t1_j5geex3 wrote

Wrong. Mentions LLM.

5

[deleted] t1_j5gf9jo wrote

[removed]

−2

its_ya_boi_Santa t1_j5gh0s1 wrote

Wrong.

4

hellrail t1_j5x5u8o wrote

And what exactly is wrong about the statement?

1

its_ya_boi_Santa t1_j5xqz3k wrote

The guy was just being obnoxious, hence all the deleted comments and starting his replies with "wrong." before writing huge walls of text.

1

hellrail t1_j5xsq46 wrote

Ah, so there was nothing wrong in my statement but you just wanted to be obnoxious?

Good that you admit it.

PS: I'm the guy haha

1

its_ya_boi_Santa t1_j5xworg wrote

I have no idea what you wrote; it didn't bother me enough to remember it all this time later. If you made a new account just to come back to your old arguments, that's really sad dude. I hope you can better yourself and have a good life.

1

hellrail t1_j5xwq72 wrote

Wrong. Its the very same account.

And in your previous answer, when you thought I was somebody different, you already explained why you did it; now you claim to not remember. Hahaha. You are contradictory and nonsensical as usual.

1

its_ya_boi_Santa t1_j5xwuc9 wrote

The sentiment still stands, I hope you get out of this rut you're in. "This too shall pass" as they say.

1

hellrail t1_j5y0ok5 wrote

Wrong, I am not in any rut.

New accounts, being in a rut, saying "wrong" just for the sake of saying something even though nothing was wrong....

If I look at your behaviour, it clearly shows that you are fighting your own inner demons instead of really replying to what somebody has said (otherwise you wouldn't put so many self-fantasized allegations in your posting).

I hope this kind of self-therapy works out for you, but I doubt it helps with anything.

1

hey_look_its_shiny t1_j5fu53q wrote

You don't need to implement a full-scale LLM in order to degrade watermarks at scale or even mix-and-match watermarked inputs. People who aren't even trying get halfway there now with crappy synonym engines.

And before you ask, no, I'm not going to technically spec it for you. Instead I suggest using the upvote pattern from this expert community to run backprop on your beliefs. ;)

5

[deleted] t1_j5fv06l wrote

[removed]

−2

BitterAd9531 t1_j5g52os wrote

>Besides that, OP stated that he wants to use a llm for this, not me.

Actually I didn't. If you read my comment you'd understand I would need the LLM to demonstrate the model that does the actual combining (which obviously wouldn't be an LLM). Seeing as there are currently no models that have watermarking, I'd have to write one myself to test the actual model that does the combining to circumvent the watermark. Either you didn't understand this, or you're once again taking single sentences out of context and making semi-valid points that don't have any relevancy to the original discussion.

But honestly I feel like this is completely besides the point. I've given you a high-level explanation of how these watermarks can be defeated and you seem to be the only one who does not understand how they work.

4

hey_look_its_shiny t1_j5htrp4 wrote

> Besides that, OP stated that he wants to use a llm for this, not me.

Actually, you introduced that concept first when you said:

> If u want some AI to alter the text for you, you again need a LLM.

OP had not mentioned applying an LLM to the case prior to that. It was explicit in their original comment, and implicit in all comments thereafter, that a watermark-free LLM was only one of the ways in which this problem could be tackled.

Meanwhile:

> Synonym engines wouldn't change an n-gram watermark significantly enough, as a synonym is the same type of word, so there are token patterns persisting.

Right. Hence why I said they "get halfway there". Halfway is clearly not "all the way", and thus not "significantly enough".

And finally:

> Rules for r/MachineLearning
> 1. Be nice: no offensive behavior, insults or attacks

In light of your recent description of an interlocutor's "limited capacity brain", you seem to be catastrophically failing at (1) understanding the problem space being discussed, (2) understanding the deficiencies in your own arguments, and (3) understanding basic norms and rules of interpersonal decency....

Just my two cents, but this forum probably isn't the right space for you until you level up a bit.

2

Historical-Coat5318 t1_j5f88m7 wrote

It seems to me ethically imperative to be able to discern human text from AI text, so it's really concerning when people just hand-wave it away immediately as obviously futile, like Altman did in a recent interview. Obviously these detection methods would have to be more robust than just a cryptographic key that can be easily circumvented just by changing a few words, but this is the most pressing ethical issue in AI safety today and no one seems to be even considering dealing with it in a serious way.

One idea: Couldn't you just train the AI to identify minor changes to the text, to the point where rewriting it would be too much of a hassle? Also, open the server history as an anonymized (for privacy concerns) database so that everyone has access to all GPT (and other LLM) output, and couple that with the cryptographic key Scott Aaronson introduced plus adversarial solutions for re-worded text. This, with other additional safety features, would make it too much of a hassle for anyone to try to bypass it, maybe with an additional infinitesimal cost on every GPT output to counteract spam, etc. A lot of regulation is needed for something so potentially disruptive.

−39

BitterAd9531 t1_j5fal5s wrote

>no one seems to be even considering dealing with it in a serious way

Everyone has considered dealing with it, but everyone who understands the technology behind them also knows that it's futile in the long term. The whole point of these LLMs is to mimic human writing as closely as possible, and the more they succeed, the more difficult detection becomes. They can be used to output both more precise and more varied text.

Countermeasures like watermarks will be trivial to circumvent while at the same time restricting the capabilities and performance of these models. And that's ignoring the elephant in the room, which is that once open-source models come out, it won't matter at all.

>this is the most pressing ethical issue in AI safety today

Why? It's been long known that the difference between AI and human capabilities will diminish over time. This is simply the direction we're going. Maybe it's time to adapt instead of trying to fight something inevitable. Fighting technological progress has never worked before.

People banking on being able to distinguish between AI and humans will be in for a bad time the coming few years.

42

Historical-Coat5318 t1_j5fbhj5 wrote

If by fighting technological progress you mean controlling it to make sure it serves humanity in the safest, most optimal way, then yes, we've been doing this forever; when cars were first introduced, traffic police didn't exist. There is nothing retrograde or luddite in thinking this way, it's what we've always done.

Obviously watermarking is futile but there are other methods that need to be considered which no one even entertains, for example the ones I mentioned in my first comment.

Also it should be trivially obvious that AI should never be open-source. That's the worst possible idea.

−5

TonyTalksBackPodcast t1_j5iblmx wrote

I think the worst possible idea is allowing a single person or handful of people to have near-total control over the future of AI, which will be the future of humanity. The process should be democratized as much as can be. Open source is one way to accomplish that, though it brings its own dangers as well

11

KvanteKat t1_j5j3ewk wrote

>I think the worst possible idea is allowing a single person or handful of people to have near-total control over the future of AI

I'm not sure regulation is the biggest threat to the field of AI being open. We already live in a world where a small handful of people (i.e. decision makers at Alphabet, OpenAI, etc.) have an outsized influence on the development of the field, because training large models is so capital-intensive that very few organizations can really compete with them (researchers at universities sure as hell can't). Neither compute (on the scale necessary to train a state-of-the-art model) nor well-curated large training datasets are cheap.

Since it is in the business interest of incumbents in this space to minimize competition (nobody likes to be disrupted), and since incumbents in this space already have an outsized influence, some degree of regulation to keep them in check may well be beneficial rather than detrimental to the development of AI and derived technologies and their integration into wider society (at least I believe so, although I'm open to other perspectives in this matter).

2

Historical-Coat5318 t1_j5jw8o8 wrote

I just can't even begin to comprehend this view. Of course, democratizing something sounds good, but if AI has mass-destructive potential it is obviously safer if a handful of people have that power than if eight billion have it. Even if AI isn't mass-destructive, which it obviously isn't yet, it is already extremely socially disruptive and if any given person has that power our governing bodies have basically no hope of steering it in the right direction through regulation, (which they would try to since it would serve their best interests as individuals). The common person would still have a say in these regulations through the vote.

−1

GinoAcknowledges t1_j5kb95p wrote

A vast amount of technological knowledge (e.g. how to create poisons, manufacture bombs) has mass destructive potential if it can be scaled. The difficulty, just like with AI, is scaling, and this mostly self-regulates (with help from the government).

For example, you can build dangerous explosive devices in your garage. That knowledge is widely available (google "Anarchists Handbook"). If you try and build thousands of them (enough to cause mass destruction) the government will notice, and most likely, you aren't going to have enough money and time to do it.

The exact same thing will happen for "dangerous uses of AI". The only actors which have the hardware and capital to cause mass destruction with AI are the big tech firms developing AI. Try running inference at scale on even a 30B parameter model right now. It's extremely difficult unless you have access to multiple server-grade GPUs which are very expensive and hard to get ahold of even if you had the money.

3

BitterAd9531 t1_j5idapl wrote

>trivially obvious that AI should never be open-source

Wow. Trivially obvious? I'd very much like to know how that statement is trivially obvious, because it goes against what pretty much every single expert in this field advocates.

Obviously open-source AI brings problems, but what is the alternative? A single entity controlling one of the most disrupting technologies ever? And ignoring for a second the obvious problems with that, how would you enforce it? Criminalize open-sourcing of software? Can't say I'm a fan of this line of thinking.

5

Historical-Coat5318 t1_j5juhb7 wrote

AI in my view should be controlled by very few institutions, and these institutions should be carefully managed by experts and very intelligent people, which is the case for companies like Google or OpenAI. If AI must exist, and it must, I would much rather it were in the hands of people like Sam Altman and Scott Aaronson than literally everyone with an internet connection.

Obviously terms like "open-source" and "democratised" sound good, but if you think about the repercussions of this you will surely realise that it would be totally disastrous for society. Looking back in history we can see that nuclear weapons were actually quite judiciously managed when you consider all of the economic and political tensions of the time, now imagine if anyone could have bought a nuke at Walmart, human extinction would have been assured. Open-source AI is basically democratized mass-destruction, and if weapons of mass-destruction must exist (including AI), then it should be in as few hands as possible.

Even ignoring existential risk, which is obviously still very speculative, even LLMs should never be open-source because that makes any regulation impossible. In that world evidence (video, images and text), not to mention human creativity, would cease to exist and the internet would basically be unnavigable as the chasm between people's political conception of the world and the world itself only widens. Only a few companies should be allowed to have this technology, and they should be heavily regulated. I admit I don't know how this could be implemented, I just know that it should be.

This is basically Nick Bostrom's Vulnerable World Hypothesis. Bostrom should be read as a prerequisite for everyone involved in AI, in my opinion.

−2

Throwaway00000000028 t1_j5k1aqv wrote

Just curious, why do you think it's "ethically imperative to be able to discern human text from AI text"? Would it really be so bad if you were talking to a computer?

6

Historical-Coat5318 t1_j5k3k1o wrote

I think so, yes. In that world the dead internet theory would become true and people will become only more dissociated from reality and society, especially so when AI can generate video and audio. The political repercussions are disastrous.

Also, I really love literature (and art in general) and a future where one cannot differentiate a human writer from AI is, frankly, suicidally bleak to me. I can see a future where publishers use AI to read the market and write the right books for maximum profit completely cutting out human authors from the process. I am an aspiring novelist myself and, while the act of writing is intrinsically motivating there is also a massive social component in terms of having a career and having others read your work that would be completely excised from creativity, so there is also a personal component I suppose. Sharing in the creativity of other humans is the main thing that gives life meaning to me personally and to many others, and to have that stripped from life is extremely depressing.

While this is all very speculative, I just can't see the rapid advances in AI leading anywhere except a lonelier, more isolated and chaotic world if it isn't seriously regulated. But all of this could be fixed if we could just identify AI text. Then nothing would change in terms of the place of human creativity in the world; it would be basically like chess, where people still devote their lives to it and the community thrives, but only because we can discern AI chess playing from human chess playing. Imagine if there were no anti-cheating policies in chess tournaments, no one would ever play chess seriously ever again.

If we could just identify AI output we would get all of the benefits of LLMs without any of the disastrous drawbacks. To me it is the most important issue right now, but people don't even consider it and are outright hostile to the idea, just see the downvotes to my original reply.

−1

EmmyNoetherRing t1_j5er3xp wrote

I’d heard they had added one, actually. Or were planning to— the concern they listed was they didn’t want the model accidentally training on its own output, as more of its output shows up online.

I have to imagine this is a situation where security by obscurity is unavoidable though, so if they do have a watermark we might not hear much about it. Otherwise malicious users would just clean it back out again.

We may end up with a situation where only a few people internal to OpenAI know how the watermark works, and they occasionally answer questions for law enforcement with the proper paperwork.

51

artsybashev t1_j5fgnjm wrote

So they are effectively polluting the public space with their AI, which only they have the tool to detect. Smells like anti-competition to me. This potentially makes the competing teams' models worse, since they will be eating the shit that GPT-3 pushes out.

0

dineNshine t1_j5esx4f wrote

Why would you want to do this? We can fake text without GPT, and we also have the means to prove authenticity by digital signatures. By limiting the technology artificially, you will end up limiting the end user, while organizations with more resources will still be able to circumvent these limitations by training their own models.

To avoid limiting usability, desired limitations should be applied on top of the base model by the end user, not to the base model itself.

The sole consequence of attempts like the one OP suggests is further centralization of the technology, which is the worst imaginable result.

25

Historical-Coat5318 t1_j5f8ruz wrote

> prove authenticity by digital signatures

Could you expand on this?

4

twiztidsoulz t1_j5fa57q wrote

Ever used DocuSign or ssl?

4

Historical-Coat5318 t1_j5fh33h wrote

Say a novelist in the future wants to prove to the world that they wrote their book themselves and not an AI, how could that be done with DocuSign? Or SSL? That's the kind of use case I'm thinking of.

1

mje-nz t1_j5gor00 wrote

Could you expand on how DocuSign and SSL relate to a chatbot?

1

twiztidsoulz t1_j5gqj6q wrote

I think you're taking my response a bit too literally. My response was regarding digital signature as a watermark. Where I was thinking it could possibly go, is every model is digitally signed (in the same way SSL certificates are signed by a trusted CA).

1

mje-nz t1_j5gw3n5 wrote

Are you talking about the model? We’re talking about the output. If you’re talking about signing the model, what does that achieve? If you’re talking about signing the output, how do you sign a chat transcript?

2

mirrorcoloured t1_j5gol1a wrote

How would extra embedded information limit the end user, specifically?

1

hiptobecubic t1_j5jbgh1 wrote

One thing I can imagine is that the AI refuses to output text that doesn't trigger the watermark detector.

1

dineNshine t1_j5vm4r5 wrote

By definition. If you force the model to embed a watermark, you can only generate watermarked content. Since OP proposed to embed it into model parameters, it would also likely degrade performance.

Limiting the end user this way is bad, for reasons I have stated above. The right approach is to train a model that fits the data well, and then condition it using the input prompt. Putting arbitrary limits on the model itself to prevent misuse is misguided at best, making sure that only people in power will be able to utilize the technology to its fullest. This would also give people a false sense of security, since they might think that content generated with a model lacking a watermark is "genuine".

If AI advances to a point where the content it generates is indistinguishable from human-generated content and fake recordings become a problem, the only sensible thing we can really do is use signatures. This is a simple method that works perfectly well. For any piece of virtual content, you can quickly check if it came from an entity known to you by checking against their public key.
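As a minimal sketch of that signature flow, assuming the Python cryptography package and Ed25519 keys (key distribution and binding a key to an identity are the hard parts and are glossed over here):

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# An author (or publisher) generates a keypair once and publishes the public key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

document = "I, a human, wrote this paragraph myself.".encode("utf-8")
signature = private_key.sign(document)

# Anyone holding the public key can check that the document is unmodified
# and was signed by the holder of the matching private key.
try:
    public_key.verify(signature, document)
    print("signature valid")
except InvalidSignature:
    print("signature invalid")
```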

1

mirrorcoloured t1_j5wwhn5 wrote

While I agree with your concluding sentiments against centralization and recommending use of signatures, I don't believe your initial premise holds.

Consider steganography in digital images, where extra information can be included without any noticeable loss in signal quality.

One could argue that any bits not used for the primary signal are 'limiting usability', but this seems pedantic to me. It seems perfectly reasonable that watermarking could be implemented with no noticable impacts, given the already massive amount of computing power required and dense information output.
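A minimal least-significant-bit sketch of that kind of hiding with NumPy, just to show that a payload can ride along with essentially no perceptual cost; note this is plain steganography, not a robust watermark, and a single re-encode or resize destroys it:

```python
import numpy as np

def embed_lsb(image: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide payload bits in the least significant bit of each pixel value."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = image.flatten().copy()
    if bits.size > flat.size:
        raise ValueError("image too small for payload")
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return flat.reshape(image.shape)

def extract_lsb(image: np.ndarray, n_bytes: int) -> bytes:
    bits = image.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
stego = embed_lsb(img, b"watermark")
print(extract_lsb(stego, 9))                       # b'watermark'
print(int(np.abs(stego.astype(int) - img).max()))  # each value changes by at most 1
```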

1

dineNshine t1_j5xeqvi wrote

Embedding watermarks into images directly is one thing. OP suggested changing model parameters such that the model produces watermarked images, which is different. Editing model parameters in a functionally meaningful way would be hard without affecting performance. It seems like you are referring to a postprocessing approach, which is along the lines of what I recommended in general for curating model outputs. In this instance, this kind of solution wouldn't perform the function OP intended, which is preventing users from generating images without the watermark (since postprocessing is not an integral part of the model and is easy to remove from the generation process).

It is conceivable that the parameters could be edited in an otherwise non-disruptive way, although unlikely imo. I don't like this kind of approach in general though. The community seems to channel a lot of energy into making these models worse to "protect people from themselves". I despise this kind of intellectual condescension.

1

mirrorcoloured t1_j6e1ckl wrote

Yes I wasn't clear on the comparison, but I meant more by analogy that it's possible to hide information in images without noticeable impact to humans. In this space, I just have my anecdotal experience that I can use textual inversion embeddings that use up 10-20 tokens with no reduction in quality that I can notice. I'm not sure how much a quality 'watermark' would require, but based on this experience and the fact that models are getting more capable over time, it seems reasonable to me that we could spare some 'ability' and not notice.

I also agree with the philosophy of 'do one thing and do it well' where limitations are avoided and modularity is embraced. Protecting people from themselves is unfortunately necessary, as our flaws are well understood and fairly reliable at scale, even though we can all be rational at times. As a society I think we're better off if our pill bottles have child safe caps, our guns have safeties, and our products have warning labels. Even if these things marginally reduce my ability to use them (or increase their cost), it feels selfish for me to argue against them when I understand the benefits they bring to others (and myself when I'm less hubristic). To say that, for example, 'child-safe caps should be optionally bought separately only by those with children and pets' ignores the reality that not everyone would do that, friends and family can visit, people forget things in places they don't belong, etc. The magnitude of the negative impacts would be far larger than the positive, and often experienced by different people.

1

dineNshine t1_j6gikpr wrote

Children and pets are not the same as adults. Guns are also different from language models and image generators. A gun is a weapon, but a language model isn't.

Adding certain protections might be necessary for objects that can otherwise cause bodily harm to the user (e.g. gun safeties), but if you think that people must be prevented from accessing information because they are too stupid to properly evaluate it, then you might as well abolish democracy.

I am not doubting that people can evaluate information incorrectly. The issue is that nobody can do it in an unbiased way. The people doing the censorship don't know all that much better and often don't have the right intentions, as is often demonstrated.

It has been shown that ChatGPT has strong political biases as a result of the tampering applied to make it "safe". I find this concerning.

1

e-rexter t1_j5i4jfg wrote

Signed authenticity for news and other high-quality human content needs to scale. I know some news orgs have been working on this for years. It is time to roll it out at scale.

1

Advanced-Hedgehog-95 t1_j5es27r wrote

Watermark may be useful in academic context but why should a company care?

I also wonder what happens to hundreds of writing support apps that are built upon these GPT models.

18

EmmyNoetherRing t1_j5ffiqv wrote

The company wants to be able to identify their own output when they see it in the wild, so they can filter it out when they’re grabbing training data. You don’t want the thing talking to itself.

22

artsybashev t1_j5fh1m8 wrote

I wonder if in 50 years LLMs will be able to produce "viruses" that cause problems in competing models. Like AI hacking the other AI by injecting disruptive training data into the enemy's training procedure.

18

EmmyNoetherRing t1_j5fhpjq wrote

There’s almost an analogy here to malicious influence attacks aimed at radicalizing people. You have to inundate them with a web of targeted information/logic to gradually change their worldview.

13

ardula99 t1_j5fsdsw wrote

That is what adversarial data points are -- people have discovered that it is possible to "confuse" image models using attacks such as these. Take a normal picture of a cat, feed it into an image model and it'll label it correctly and say - hey, I'm 97% sure there's a cat in this. Change a small number of pixels using some algorithm (say <1% of the entire image) - to a human it will still look like a cat, but an image model now thinks it's a stop sign (or something equally unlikely) with 90%+ probability.
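For the curious, a rough sketch of one such attack (FGSM, the fast gradient sign method) in PyTorch; the classifier, the epsilon value, and the assumption that pixel values live in [0, 1] are all illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, eps=0.01):
    """Nudge each pixel by +/-eps in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + eps * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Usage sketch (assumes a pretrained classifier and a batched cat image in [0, 1]):
# model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
# adv = fgsm_attack(model, cat_batch, cat_label)
# model(adv).softmax(dim=-1)  # often confidently wrong despite looking unchanged
```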

2

EmmyNoetherRing t1_j5g8ogy wrote

So, not quite. You’re describing funny cases that a trained classifier will misclassify.

We’re talking about what happens if you can intentionally inject bias into an AI’s training data (since it’s pulling that data from the web, if you know where it’s pulling from you can theoretically influence how it’s trained). That would potentially cause it to misclassify many cases (or have other more complex issues). It starts to be weirdly slightly feasible if you think about a future where a lot of online content is generated by AI— but we have at least two competing companies/governments supplying those AI.

Say we’ve got two AI’s, A & B. A can use secret proprietary watermarks to recognize its own text online and avoid using that text in its training data (it wants to train on human data). And of course AI B can do the same thing, to recognize its own text. But since each AI is using its own secret watermarks, there’s no good way to prevent A from accidentally training on B’s output. And vice versa.

The AI’s are supposed to only train on human data, to be more like humans. But maybe there will be a point where they unavoidably start training on each other. And then if there’s a malicious actor, they might intentionally use their AI to flood a popular public text data source with content that, if the other AI ingest it, will cause them to behave in a way that the actor wants (biased against their targets, or biased positively for the actor).

Effectively, at some point we may have to deal with people secretly using AI to advertise to, radicalize, or scam other AI. Unless we get some fairly global regulations up in time. Should be interesting.

I wonder to what extent we’ll manage to get science fiction out about these things before we start seeing them in practice.

7

ISvengali t1_j5hlozp wrote

> I wonder to what extent we’ll manage to get science fiction out about these things before we start seeing them in practice.

It's not an exact match, but reminds me quite a lot of Snow Crash.

3

e-rexter t1_j5i49g4 wrote

Great book. Required reading back in the mid 90s when I worked at WIRED.

2

e-rexter t1_j5i42p1 wrote

Reminds me of the movie Multiplicity, in which each copy gets dumber.

2

watchsnob t1_j5fl99s wrote

couldn't they just log all of their own outputs and check against it?

3

ardula99 t1_j5fsj2m wrote

Not scalable long term, especially once they start selling to clients. Clients will have privacy and security issues with OpenAI having full access to (and logging the history of) all their previous queries.

1

Acceptable-Cress-374 t1_j5ivlhe wrote

> You don’t want the thing talking to itself.

Heh, I was thinking about this the other day. Do you think there's a world where LLMs can become better by "self-play" a la AlphaZero? Would it converge to understandable language or would it diverge into babble-speak?

1

conchoso t1_j5ez9w0 wrote

For comparison, the Stable Diffusion-based AI models that generate prompted images DO have an invisible but detectable watermark embedded by default in those hundreds of dreamed-up images that get posted to reddit every day ... but they included an option to turn it off. Steganography is far further along in digital images than plain text though...

17

VelveteenAmbush t1_j5fp3jp wrote

> but they included an option to turn it off

They did not include an option to turn it off. They released it as open-source software, which lets people modify it themselves, including by turning it off.

11

artsybashev t1_j5fhioy wrote

Yeah, it is easier to modify the color of a pixel than characters in a text in a way that humans do not detect. Something can be done through typos, weird choice of words, or calculating a checksum of word choices, but those methods can easily sound unnatural to human readers.

9

Username912773 t1_j5j7vci wrote

How does that watermark work without altering image quality significantly across many prompts and color schemes?

1

MysteryInc152 t1_j5eyfnm wrote

Any watermark that couldn't easily be bypassed (paraphrasing, switching out every nth word, etc.) would cripple the output of the model. In fact, even the simple watermarks could have weird effects on output.

5

andreichiffa t1_j5fd581 wrote

They kinda exist (e.g. the GPT-2 output detector from Hugging Face), based on the data they were trained on (which is the limiting factor). However, ultimately every model can be modified (fine-tuned) to evade them. Even for large models (>7B parameters), it can be done reasonably fast on commodity hardware these days.

3

e-rexter t1_j5i4rs8 wrote

I used a detector called GPTZero and it did pretty good, but completely missed something written as a tweet or in the style of…

2

andreichiffa t1_j5il76n wrote

Yup. And then they will also detect human texts that start in the same way as the MS COCO dataset as GPT-generated.

1

TheTerrasque t1_j5irq12 wrote

On a side note, 7b isn't large these days.

GPT3 and BLOOMZ are around 175b parameters.

1

andreichiffa t1_j5ivwgc wrote

or OPT-175B.

However 7B is more than large enough to do a lot of shady stuff that 175B models can do. Even 1.5B ones are already starting to do a good job with a minimally competent user.

1

HatsusenoRin t1_j5gnb27 wrote

An interesting SF plot would be that researchers start discovering watermarks in ancient text and geoglyphs while sanitizing training data...

3

perspectiveiskey t1_j5hxld7 wrote

Or in human speech. Very faint but there, so that no matter how anonymously you post, you write a sentence of enough words and your identity leaks.

Kinda one of the premises of the book Dodge by Neal Stephenson.

1

hiptobecubic t1_j5jb8db wrote

This is the current state of things. With sufficient training data we can identify a lot about who wrote some text from the statistical properties of the text itself and voice has a long history of being used for identification at this point.

1

abriec t1_j5eqmbt wrote

I believe there’s ongoing work related to this at OpenAI and similar research in generative models in general, like this submission currently under review.

2

link0007 t1_j5fdzd6 wrote

I wonder if advancements in text watermarking will actually help create watermarks for e.g. sensitive or classified governmental/corporate documents. Would allow for instant identification who leaked a certain document if you could trace the watermark back to user accounts.

2

JackandFred t1_j5epyi0 wrote

It wouldn’t necessarily be easy. But you say you want one detectable by some “key or other model” you can already design or use a model to detect if it was generated by Gpt, so it wouldn’t really need to use a watermark if you’re using a model. And if you’re using a more traditional watermark for digital pictures it could be very easily removed.

1

hiptobecubic t1_j5jcexd wrote

This will be a permanent arms race of training the generative model and the detector to defeat one another, with each iteration making it harder and harder for humans to do so unaided. Training these models is expensive and only the big corporate players are currently able to do so.

1

new_name_who_dis_ t1_j5ern6t wrote

How do you detect text produced by GPT? Is there like open source code?

0

e-rexter t1_j5i55p1 wrote

Check out GPTZero as one example. It uses perplexity and other characteristic differences between human and AI-generated text. Not perfect, but works on longer text passages. Unfortunately, one can train an AI to have more variation, thus defeating the detector.
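Roughly what such a detector computes, sketched with Hugging Face Transformers and GPT-2; the interpretation is only a heuristic here, and real detectors combine perplexity with other statistics such as burstiness:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Average per-token surprise under GPT-2; lower means more 'model-like'."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean negative log-likelihood
    return float(torch.exp(loss))

sample = "The quick brown fox jumps over the lazy dog."
print(perplexity(sample))
# Heuristic only: very low perplexity suggests machine-generated text,
# but short or formulaic human text (tweets, boilerplate) also scores low.
```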

1

morebikesthanbrains t1_j5fa9uq wrote

This is like adding a watermark to early calculators. You can't stop evolution; the cat's out the bag. People are going to move on from critical thinking for better or worse, and humans will evolve to a higher level of masturbation.

Don't check this post for a watermark.

1

GalaxyGoldMiner t1_j5flc0v wrote

Thinking out loud: instead of watermarking you could just look at each token's conditional probability of being sampled given the prior tokens; if the probabilities are high in aggregate, the text is likely to have come from low-temperature GPT. This assumes that transformer models trained by different companies (on presumably overlapping data) will have different enough predictions over long sequences.

1

Ok-Debt7712 t1_j5gr3mq wrote

Just use a good text spinner and don't worry about it.

1

fraktall t1_j5gtq5j wrote

It will be reverse engineered in no time. The output of the model is just text.

1

KBM_KBM t1_j5hnoke wrote

Maybe, say, if we can download our answer, then some watermark is encoded in the file we get with the answer.

1

Kbig22 t1_j5if78n wrote

Just create an algorithm that adds some unusual space characters which acts like a public key.
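A quick sketch of what that kind of invisible-character scheme could look like, and of why it is fragile (any pipeline that strips unusual Unicode removes it); the specific zero-width characters chosen here are just an example:

```python
# Encode a bit string as zero-width Unicode characters appended to the text.
ZERO = "\u200b"  # zero-width space      -> bit 0
ONE = "\u200c"   # zero-width non-joiner -> bit 1

def add_mark(text: str, bits: str) -> str:
    return text + "".join(ONE if b == "1" else ZERO for b in bits)

def read_mark(text: str) -> str:
    return "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))

marked = add_mark("This looks like ordinary text.", "101101")
print(marked == "This looks like ordinary text.")  # False, but visually identical
print(read_mark(marked))                           # "101101"
print(read_mark(marked.replace(ZERO, "").replace(ONE, "")))  # "" - trivially stripped
```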

1

Cherubin0 t1_j5iwn3e wrote

I don't see a value in that. Humans can just lie as much and write things that are not true or pretend to be someone else. The only difference is that you can write much more with such tools.

1

aero_oliver t1_j5iz5d7 wrote

I'm curious, but there's absolutely no way a watermark can't be worked around. I think this is going to turn out to be a game of cat and mouse, with each side having to constantly develop.

1

shadeofmyheart t1_j5mhq5a wrote

You mean like ChatGPT being able to recognize the tokens in its own output?

1

john_the_jedi t1_j646qh1 wrote

Hey everyone, I'm the first author of this preprint paper
"A Watermark For Large Language Models": https://arxiv.org/abs/2301.10226
I thought I'd jump in with a few relevant comments about some questions in this thread, especially relating to our approach.

  1. Our watermark is mathematically constructed to minimize false positives (accusing human text of being machine generated), even if it costs us a few detections of actual machine-generated text. At any sufficient length of text, say 100-200 words, there is a near-zero chance of a false positive. This is obviously the type of error we'd all like to avoid as much as possible. (See the toy detection sketch after this list.)
  2. We are not anti-LLMs in any general way, these are amazing tools for everyone to use! Rather, we think that it's much better to have a new tool, watermarks, embedded in these models sooner rather than later. A world in which we have limited (currently zero really) ways of distinguishing AI and human generated content is likely to have some difficult to wrestle with consequences. We're concerned with bot farms and accidentally retraining "GPT-10" on tons of old GPT-3 outputs by accident.
  3. On removing the watermark: we don't claim it is not removable, we have just constructed the watermark procedure so that removal is difficult and comes with a cost to the quality of the output. The fact that many people suggest that they'll just use another LM to paraphrase the output, or that they'll just paraphrase it themselves, gets at a philosophical point we couldn't spend too much time talking about in the paper (though we do run some attack experiments trying to remove the watermark). A la the Ship of Theseus, if you sufficiently re-write the watermark out of the text, well, it's no longer the original text anyway, even though it feels conceptually similar. Rewriting and rephrasing a paragraph from a textbook in your own words and then putting it in a term paper has always been a way to try to pass off the thoughts and ideas of others as your own. This fact of the world is unchanged.
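For readers wondering what the detection side of such a scheme looks like, here is a toy sketch of a green-list test in the spirit of the paper; the hash-based green-list rule and the GAMMA value are stand-ins, not the authors' actual implementation:

```python
import hashlib
import math

GAMMA = 0.5  # fraction of the vocabulary on the "green list" at each step

def is_green(prev_token: str, token: str) -> bool:
    # Toy rule: hash (previous token, candidate token) and keep the GAMMA fraction.
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return (int(h, 16) % 1000) < GAMMA * 1000

def detect(tokens: list[str]) -> float:
    """z-score for 'more green tokens than chance'; a large z suggests watermarking."""
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# A watermarked generator would have preferred green tokens while sampling,
# so its output scores several standard deviations above zero; ordinary
# human text hovers around z ~ 0.
print(detect("the cat sat on the mat and looked out of the window".split()))
```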
1

WikiSummarizerBot t1_j646sbr wrote

Ship of Theseus

>The Ship of Theseus is a thought experiment about whether an object that has had all of its original components replaced remains the same object. According to legend, Theseus, the mythical Greek founder-king of Athens, had rescued the children of Athens from King Minos after slaying the minotaur and then escaped on a ship to Delos. Every year, the Athenians commemorated this legend by taking the ship on a pilgrimage to Delos to honor Apollo. The question was raised by ancient philosophers: After several centuries of maintenance, if every part of the Ship of Theseus had been replaced, one at a time, was it still the same ship?

1

lucellent t1_j5feo8g wrote

I think you're a bit late on the news. OpenAI have already said they will add a watermark to their ChatGPT responses.

0

londons_explorer t1_j5fa3eq wrote

OpenAI keep a big database of all the output.

That in itself serves the same purpose of a watermark.

OpenAI can take any bit of text, and search their database and see if it came from their service.

−1

hiptobecubic t1_j5jchks wrote

This would be trivially defeated by making tiny changes to the text. Also, it is wildly impractical and won't scale up to widespread usage.

1