kevindamm t1_j6q2lwy wrote

1000-character minimum and a 26% success rate, but it's good that they're working on it

73

9-11GaveMe5G t1_j6qk8ne wrote

> 26% success rate

If all it's doing is saying "yes, this is AI-generated" or "no, it's not", then 26% is considerably worse than chance.
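
A quick sanity check of the "worse than chance" claim, assuming the 26% really were binary accuracy (a reply further down disputes that reading): a coin-flip baseline already scores about 50% on a yes/no label.

```python
import random

# A classifier that guesses "AI-generated" or "human" at random
# gets ~50% of binary labels right in expectation, so 26% (if it
# were binary accuracy) would indeed be worse than chance.
random.seed(0)
labels = [random.choice([True, False]) for _ in range(100_000)]
guesses = [random.choice([True, False]) for _ in range(100_000)]
accuracy = sum(g == l for g, l in zip(guesses, labels)) / len(labels)
print(f"coin-flip accuracy: {accuracy:.3f}")  # ~0.500
```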

45

RoboNyaa t1_j6qoeym wrote

It'll never be 100%, because the whole point of ChatGPT is to write like a human.

At the same time, anything less than 100% means it can't be used to validate academic integrity.

21

manowtf t1_j6txasq wrote

They could just store the answers they generate and then, if you want to check, let you compare how close what you submit is to what they generated.
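
A minimal sketch of that idea: the stored generations are hypothetical, and the stdlib `difflib` matcher stands in for whatever similarity search a real service would use at scale.

```python
import difflib

# Hypothetical store of previously generated answers.
generated_store = [
    "The mitochondria is the powerhouse of the cell because ...",
    "In summary, the French Revolution was driven by ...",
]

def closest_match(submission: str) -> tuple[float, str]:
    """Return the highest similarity ratio (0..1) between the
    submission and any stored generation, plus that generation."""
    best = max(
        generated_store,
        key=lambda g: difflib.SequenceMatcher(None, submission, g).ratio(),
    )
    score = difflib.SequenceMatcher(None, submission, best).ratio()
    return score, best

score, match = closest_match("In summary the French Revolution was driven by ...")
print(f"similarity: {score:.2f}")  # a high score suggests a near-verbatim copy
```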

−2

kevindamm t1_j6qmixr wrote

There are four buckets (of unequal size), but I don't know whether success was measured as landing in the "correct" bucket, as landing in the highest p(AI-gen) bucket counting as a true positive, or as landing in either extreme bucket. I only read the journalistic article, not the original research, so idk. The 1000-character minimum worries me more; there's quite a lot of text shorter than that (like this comment).

5

9-11GaveMe5G t1_j6qy4i3 wrote

I understand the bigger the sample the better, but is it possibly limited to longer text so its usefulness is kept academic? You mentioned your comment, and I think going to a platform like Reddit and being able to offer a report on comment genuineness would be valuable.

1

PhoneAcc2 t1_j6qvw8j wrote

The article misrepresents this.

They use five categories, from "very unlikely AI-generated" up to "very likely".

The 26% refers to the share of the AI-generated part of their validation set that was labeled "very likely AI-generated".
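
For concreteness, a sketch of how such a five-way labeling could work. The band names follow this thread and the cutoffs are invented, not OpenAI's published ones; the 26% figure would then be the fraction of known-AI validation texts whose score reaches the top band.

```python
# Hypothetical probability cutoffs for the five bands, highest first.
BANDS = [
    (0.98, "very likely AI-generated"),
    (0.90, "possibly AI-generated"),
    (0.45, "unclear if it is AI-generated"),
    (0.10, "unlikely AI-generated"),
    (0.00, "very unlikely AI-generated"),
]

def label(p_ai: float) -> str:
    """Map a classifier score to the first band whose cutoff it reaches."""
    for cutoff, name in BANDS:
        if p_ai >= cutoff:
            return name
    return BANDS[-1][1]

# Illustrative scores for texts known to be AI-written; the reported
# metric counts how many land in the top band.
ai_scores = [0.99, 0.72, 0.95, 0.99, 0.30]
top = sum(label(s) == "very likely AI-generated" for s in ai_scores) / len(ai_scores)
print(f"labeled 'very likely': {top:.0%}")
```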

4

IKetoth t1_j6rc6i2 wrote

Which is a 26% success rate, so how is that being misrepresented? The fact that it's "somewhat confident" on other samples means nothing. If this were to be used for validating writing anywhere like competitions or academia, you'd want that "very confident" number in the high 90s, or at the very least the false positive rate to be incredibly low.

3

PhoneAcc2 t1_j6recky wrote

The article suggests there is a single "success" metric in OpenAI's publication, which there is not, and deliberately so.

Labeling text as AI-generated will always be fuzzy (active watermarking aside) and will only get harder as models improve and get bigger. There is simply a region where human- and AI-written texts overlap.
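
A toy illustration of that overlap, assuming human and AI texts produce detector scores drawn from two overlapping normal distributions (made-up parameters): any threshold then trades false positives against false negatives, and neither can be driven to zero.

```python
import random

random.seed(0)
# Hypothetical detector scores: human texts centred at 0.35, AI texts
# at 0.65, with enough spread that the two distributions overlap.
human = [random.gauss(0.35, 0.15) for _ in range(50_000)]
ai = [random.gauss(0.65, 0.15) for _ in range(50_000)]

threshold = 0.5  # flag anything above this as AI-generated
false_pos = sum(s > threshold for s in human) / len(human)
false_neg = sum(s <= threshold for s in ai) / len(ai)
print(f"false positives: {false_pos:.1%}, false negatives: {false_neg:.1%}")
# Raising the threshold lowers false positives but misses more AI text;
# no cutoff eliminates both while the distributions overlap.
```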

Have a look at the FAQ on their page if you're interested in the details: https://platform.openai.com/ai-text-classifier

4

IKetoth t1_j6rfyq3 wrote

No need; I don't see a point to this. Given 3-5 years of adversarial training, if left unregulated, I expect they'll be completely impossible to tell apart to any degree where there'd be a point to it. We need to learn to adapt to the fact that AI writing is poised to replace human writing in anything not requiring logical reasoning.

Edit: I'd add that we need to start thinking as a species about the fact that we've reached the point where human labour need not apply. There are now automatic ways to do nearly everything; the only things stopping us are the will to use them and resources being concentrated rather than distributed. Assuming plentiful resources, nearly everything CAN be done without human intervention.

3

[deleted] t1_j6rtbsi wrote

[deleted]

6

Gathorall t1_j6w9xpb wrote

This comes down to the concepts of sensitivity and specificity.

The specificity of this test would have to be very near 100%: when, as you said, most people aren't cheats, being even a few percentage points short on specificity can mean that a large share of the people "caught" are actually innocent.
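
The base-rate arithmetic behind that point, with made-up numbers: even a test that sounds accurate flags mostly innocent people when cheats are rare.

```python
# Hypothetical cohort: 1,000 students, 2% of whom actually cheated.
students, prevalence = 1_000, 0.02
sensitivity = 0.90   # fraction of actual cheats the test catches
specificity = 0.97   # fraction of honest students it correctly clears

cheats = students * prevalence
honest = students - cheats
true_pos = cheats * sensitivity          # 18 cheats caught
false_pos = honest * (1 - specificity)   # ~29 honest students flagged
precision = true_pos / (true_pos + false_pos)
print(f"flagged: {true_pos + false_pos:.0f}, "
      f"actually cheats: {precision:.0%}")  # well under half
```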

2