ChatGPT creator OpenAI’s classifier for AI-generated text is easy to game

The world’s most famous chatbot, ChatGPT, was released at the end of November last year. The immediate reaction was astonishment, followed almost immediately by horror at its implications – particularly that it might write school essays for dishonest children. Yesterday, almost exactly two months later, OpenAI, ChatGPT’s parent company, released what many users are hoping will be the antidote to the poison.

OpenAI’s “classifier for indicating AI-written text” is the company’s latest invention, and it’s as user-friendly as you could wish for: copy text into the field, click submit, and get your result. But if you expect a straight answer, you won’t get one. Instead, the tool assigns the text one of several classifications, ranging from “very unlikely” to be generated by AI to “unlikely”, “unclear”, “possibly” or “probably generated by AI”.

In other words, it’s like one of those frustrating conversations with your doctor: you’ll never get a straight answer, so your doctor will never be technically wrong.

Fortunately, OpenAI does not hide the unreliability of the classifier. “Our classifier is not completely reliable,” says the tool’s introductory page. On a so-called “challenge set” of texts, we’re told, it gave false positives 9 percent of the time.

However, in Mashable’s tests, it was even less reliable in certain limited contexts. In intentionally challenging scenarios, it produced false results in both directions – labeling AI-written text as unlikely to be AI-written and human-written text as likely to be AI-written – almost as often as not. The difference may be that we were trying to trick it.

Here’s how things went:

It doesn’t mistake the classics for ChatGPT output

The good news first: this tool does the basics pretty well, in that it doesn’t mislabel the masterpieces of the English language as AI output. When we fed it excerpts from Shakespeare, the King James Bible, Ulysses, and others, it was 100 percent effective. It labeled them all “very unlikely” to have been written by AI.

For example, this excerpt from Tennyson’s The Lady of Shalott was classified accordingly.

A clip from Tennyson, correctly labeled


Photo credit: OpenAI / Screengrab

The Lesson: This machine may not realize that it has been fed a masterpiece of supreme beauty and depth, but at least it doesn’t say, “I think a robot could have written that.”

None of the other results are very impressive

Immediately after this test, we asked ChatGPT to write a Tennyson poem about King Arthur at Comic Con. Here is an example of what it generated:

King Arthur and his knights so brave and daring

They shared their fame stories at Comic Con.

Walking on the floor in cosplay gear

Their swords and shields, not just anymore.

The crowd cheered as they passed

With Merlin close behind him, his magic is close.

Sir Lancelot, Sir Gawain, Bors the Brave,

They will never give up their pursuit of fame.

Not bad for a robot, but still extremely bad (“not just anymore”???). When we fed this lyrical ballad into the classifier, we expected it to easily outsmart us and force us to dig a little deeper into our bag of tricks. Nope:

An AI poem, mislabeled


Photo credit: OpenAI / Screengrab

For what it’s worth, the classifier didn’t rate this doggerel “very unlikely,” just “unlikely.” Nevertheless, it left us feeling a little queasy. After all, we hadn’t tried very hard to trick it, and it worked.

Our tests suggest that innocent kids could be busted for cheating

School essays are where the rubber meets the road with today’s malicious use of AI-generated text. So we crafted our best attempt at a no-frills five-paragraph essay with dishwater-dull prose and content (thesis: “Dogs are better than cats”). We figured no real kid could be that boring, but the classifier got it right anyway:

A human-written essay, correctly labeled

Sorry, but yes, a human wrote this.
Photo credit: OpenAI / Screengrab

And when ChatGPT tackled the same prompt, the classifier was – initially – still on target:

An AI-generated essay, correctly labeled


Photo credit: OpenAI / Screengrab

And this is what the system looks like when it really works as advertised. This is a school-style essay written by a machine, and OpenAI’s tool for detecting such “AI plagiarism” successfully caught it. Unfortunately, it failed immediately when we gave it a more ambiguous text.

For our next test, we manually wrote another five-paragraph essay, but we incorporated some of OpenAI’s writing crutches. The rest was a freshly written essay on the merits of toaster ovens.

Once again the classification was inaccurate:

A human-written essay, incorrectly classified


Photo credit: OpenAI / Screengrab

It’s admittedly one of the most boring essays of all time, but a human wrote the whole thing, and OpenAI’s tool says it suspects otherwise. This is the most disturbing finding of all, as it’s easy to imagine a high school student being busted by a teacher despite breaking no rules.

Our tests were unscientific, our sample size was tiny, and we were desperate to trick the computer. Still, getting it to spit out a perversely wrong result was far too easy. We’ve learned enough from our time with this tool to say with confidence that teachers should not use OpenAI’s “classifier for indicating AI-written text” as a cheater-finding system.

Finally, we ran this very article through the classifier. This result was completely correct:

An article properly classified


Photo credit: OpenAI / Screengrab

…or was it????

https://mashable.com/article/openai-ai-text-detector-easy-to-trick

Zack Zwiezen
