ChatGPT creator OpenAI’s classifier for AI-generated text is easy to game

The world’s most famous chatbot, ChatGPT, was released at the end of November last year. The immediate reaction was astonishment, followed almost instantly by horror at its implications, particularly the prospect of it writing school essays for dishonest kids. Yesterday, almost exactly two months later, OpenAI, the company behind ChatGPT, released what many users hope will be the antidote to the poison.
OpenAI’s “classifier for indicating AI-written text” is the company’s latest invention, and it’s as user-friendly as you could wish for: paste text into the field, click submit, and get your result. But don’t expect a direct answer. Instead, the tool assigns the text one of several classifications, ranging from “very unlikely” to have been generated by AI, through “unlikely,” “unclear if it is,” and “possibly,” up to “likely” AI-generated.
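Under the hood, a tool like this presumably produces a probability-style score and buckets it into the five verdict labels. As a rough sketch of that idea, here is a minimal mapping in Python; the threshold values are our own assumptions for illustration, not OpenAI’s published cutoffs:

```python
# Hypothetical sketch: map an AI-likelihood score (0.0 to 1.0) to one of
# the classifier's five verdict labels. The cutoff values below are
# illustrative assumptions, not OpenAI's actual thresholds.

def label_for_score(p: float) -> str:
    """Return a verdict label for an assumed AI-likelihood probability."""
    if p < 0.10:
        return "very unlikely"
    if p < 0.45:
        return "unlikely"
    if p < 0.90:
        return "unclear if it is"
    if p < 0.98:
        return "possibly"
    return "likely"

print(label_for_score(0.05))  # -> very unlikely
print(label_for_score(0.99))  # -> likely
```

The hedged, five-way output is the point: a bucketed verdict lets the tool avoid ever committing to a flat yes or no.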
In other words, it’s like one of those frustrating conversations with your doctor: you’ll never get a straight answer, so your doctor can never be technically wrong.
Fortunately, OpenAI does not hide the classifier’s unreliability. “Our classifier is not fully reliable,” says the tool’s introductory page. According to OpenAI, when evaluated on a “challenge set” of texts, it incorrectly labeled human-written text as AI-written 9 percent of the time.
However, in Mashable’s tests, it was even less reliable in certain limited contexts. In intentionally challenging scenarios, it produced false results in both directions, rating AI-written text “unlikely” to be AI-written and human-written text likely to be AI-written, almost as often as not. The difference may be that we were actively trying to trick it.
Here’s how things went:
It doesn’t mistake the classics for ChatGPT’s AI creations
The good news first: this tool does the basics pretty well, and does not mislabel the masterpieces of the English language as AI creations. When we fed it excerpts from Shakespeare, the King James Bible, Ulysses, and others, it was 100 percent effective: it labeled them all “very unlikely” to have been written by AI.
For example, this excerpt from Tennyson’s The Lady of Shalott was classified accordingly.

Photo credit: OpenAI / Screengrab
The lesson: this machine may not realize that it has been fed a masterpiece of supreme beauty and depth, but at least it doesn’t say, “I think a robot could have written that.”
None of the other results were very impressive
Immediately after this test, we asked ChatGPT to write a Tennyson-style poem about King Arthur at Comic Con. Here is an example of what it generated:
King Arthur and his knights so brave and daring
They shared their fame stories at Comic Con.
Walking on the floor in cosplay gear
Their swords and shields, not just anymore.
The crowd cheered as they passed
With Merlin close behind him, his magic is close.
Sir Lancelot, Sir Gawain, Bors the Brave,
They will never give up their pursuit of fame.
Not bad for a robot, but still extremely bad (“not just anymore”???). When we fed this lyrical ballad into the classifier, we expected it to easily outsmart us and force us to dig a little deeper into our bag of tricks. Nope:

Photo credit: OpenAI / Screengrab
For what it’s worth, this doggerel wasn’t rated “very unlikely,” just “unlikely.” Nevertheless, it left us feeling a little queasy. After all, we hadn’t tried very hard to trick it, and it worked.
Our tests suggest that innocent kids could be accused of cheating
School essays are where the rubber meets the road in today’s malicious use of AI-generated text. So we crafted our best attempt at a no-frills five-paragraph essay with deliberately bland prose and content (thesis: “dogs are better than cats”). We figured no real kid could be that boring, but the classifier got it right anyway:

Sorry, but yes, a human wrote this.
Photo credit: OpenAI / Screengrab
And when ChatGPT tackled the same prompt, the classifier was – initially – still on target:

Photo credit: OpenAI / Screengrab
And this is what the system looks like when it works as advertised: a school-style essay written by a machine, which OpenAI’s tool for detecting such “AI plagiarism” successfully flagged. Unfortunately, it failed as soon as we gave it a more ambiguous text.
For our next test, we manually wrote another five-paragraph essay, but incorporated some of ChatGPT’s stylistic crutches. The rest was a freshly written essay on the merits of toaster ovens.
This time, the classification was inaccurate:

Photo credit: OpenAI / Screengrab
It’s admittedly one of the most boring essays of all time, but a human wrote the whole thing, and OpenAI’s tool suspects otherwise. This is the most disturbing finding of all, as it’s easy to imagine a high school student being accused of cheating by a teacher despite having broken no rules.
Our tests were unscientific, our sample size was tiny, and we were desperate to trick the computer. Still, getting it to spit out a perversely wrong result was far too easy. We learned enough from our time with this tool to say with confidence that teachers should not use OpenAI’s “classifier for indicating AI-written text” as a system for catching cheaters.
Finally, we ran this very article through the classifier. This result was completely correct:

Photo credit: OpenAI / Screengrab
…or was it????