AI chatbot ChatGPT can't create convincing scientific papers… yet

A man wearing glasses with computer code reflected in the glass — Researchers have developed a computer learning program that can spot fake scientific papers generated by AI. (Image credit: Shutterstock)

The artificial intelligence (AI) chatbot ChatGPT may be a decent mimic of human workers in several fields, but scientific research is not one of them, according to a new study that used a computer program to spot fake studies generated by the chatbot. But the AI is still capable of fooling some humans with its science writing, previous research shows.

Since bursting onto the scene in November 2022, ChatGPT has become a hugely popular tool for writing reports, sending emails, filling in documents, translating languages and writing computer code. But the chatbot has also been criticized for plagiarism and its lack of accuracy, while also sparking fears that it could help spread "fake news" and replace some human workers.

In the new study, published June 7 in the journal Cell Reports Physical Science, researchers created a new computer learning program to tell the difference between real scientific papers and fake examples written by ChatGPT. The scientists trained the program to identify key differences between 64 real studies published in the journal Science and 128 papers created by ChatGPT using the same 64 papers as a prompt.

The team then tested how well their model could distinguish real papers from ChatGPT-generated ones in a separate test set, which included 60 real papers from the journal Science and 120 AI-generated counterfeits. The program flagged the AI-written papers more than 99% of the time and could correctly tell the difference between human-written and chatbot-written paragraphs 92% of the time.

A phone screen with the Science journal website displayed — Researchers used scientific papers from the journal Science to create fake ones with ChatGPT. (Image credit: Shutterstock)

ChatGPT-generated papers differed from human text in four key ways: paragraph complexity, sentence-level diversity in length, punctuation marks and "popular words." For example, human authors write longer and more complex paragraphs, while the AI papers used punctuation that is not found in real papers, such as exclamation marks.
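As a rough illustration of the four kinds of signals described above, the following Python sketch computes a few simple style measurements for one paragraph. It is not the researchers' program: the function names, the specific measurements and the short word list are all assumptions made for this example.

```python
import re
import statistics

# Hypothetical word list for illustration only; the study's actual
# "popular words" are not given in this article.
MARKER_WORDS = ("however", "although", "because", "but")


def split_sentences(paragraph):
    """Naively split a paragraph after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def stylometric_features(paragraph):
    """Return simple style measurements for a single paragraph."""
    sentences = split_sentences(paragraph)
    lengths = [len(s.split()) for s in sentences]  # words per sentence
    words = re.findall(r"[a-z']+", paragraph.lower())
    return {
        # Paragraph complexity: how much text the paragraph holds.
        "sentence_count": len(sentences),
        "word_count": sum(lengths),
        # Sentence-level diversity in length (population standard deviation).
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        # Punctuation marks, including the exclamation marks noted above.
        "exclamation_marks": paragraph.count("!"),
        "semicolons": paragraph.count(";"),
        # Occurrences of the (assumed) marker words.
        "marker_words": sum(words.count(w) for w in MARKER_WORDS),
    }


if __name__ == "__main__":
    sample = "We measured the samples. However, results varied widely; the effect was small! Why?"
    print(stylometric_features(sample))
```

Measurements like these could then be passed to any standard classifier trained on labeled human-written and AI-written paragraphs. The article does not say which model the study used.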

The researchers' program also spotted lots of glaring factual errors in the AI papers.

"One of the biggest problems is that it [ChatGPT] assembles text from many sources and there isn't any kind of accuracy check," study lead author Heather Desaire, an analytical chemist at the University of Kansas, said in a statement. As a result, reading through ChatGPT-generated writing can be like "playing a game of two truths and a lie," she added.

Creating computer programs to differentiate between real and AI-generated papers is important because previous studies have hinted that humans may not be as good at spotting the differences.

In December 2022, another research group uploaded a study to the preprint server bioRxiv, which revealed that journal reviewers could identify AI-generated study abstracts (the summary paragraphs found at the start of a scientific paper) only around 68% of the time, while computer programs could identify the fakes 99% of the time. The reviewers also misidentified 14% of the real papers as fakes. The human reviewers would almost certainly be better at identifying entire papers than a single paragraph, the study researchers wrote, but the finding still highlights that human errors could enable some AI-generated content to go unnoticed. (This study has not yet been peer-reviewed.)

The researchers of the new study say they are pleased that their program is effective at weeding out fake papers but warn that it is only a proof of concept. Much larger studies are needed to create robust models that are even more reliable and can be trained for specific scientific disciplines to maintain the integrity of the scientific method, they wrote in their paper.

Harry is a U.K.-based senior staff writer at Live Science. He studied marine biology at the University of Exeter before training to become a journalist. He covers a wide range of topics including space exploration, planetary science, space weather, climate change, animal behavior and paleontology. His recent work on the solar maximum won "best space submission" at the 2024 Aerospace Media Awards and was shortlisted in the "top scoop" category at the NCTJ Awards for Excellence in 2023. He also writes Live Science's weekly Earth from space series.
