AI chatbot ChatGPT can't create convincing scientific papers… yet
A computer model created by researchers can detect ChatGPT-generated fake studies more than 99% of the time, a new study shows.
The artificial intelligence (AI) chatbot ChatGPT may be a decent mimic of human workers in several fields, but scientific research is not one of them, according to a new study that used a computer program to spot fake studies generated by the chatbot. But the AI is still capable of fooling some humans with its science writing, previous research shows.
Since bursting onto the scene in November 2022, ChatGPT has become a hugely popular tool for writing reports, drafting emails, filling in documents, translating languages and writing computer code. But the chatbot has also been criticized for plagiarism and its lack of accuracy, while also sparking fears that it could help spread "fake news" and replace some human workers.
In the new study, published June 7 in the journal Cell Reports Physical Science, researchers created a machine-learning program to tell the difference between real scientific papers and fake examples written by ChatGPT. The scientists trained the program to identify key differences between 64 real studies published in the journal Science and 128 papers created by ChatGPT using the same 64 papers as a prompt.
The team then tested how well their model could distinguish a separate set of real and ChatGPT-generated papers, which included 60 real papers from the journal Science and 120 AI-generated counterfeits. The program flagged the AI-written papers more than 99% of the time and could correctly tell the difference between human-written and chatbot-written paragraphs 92% of the time.
ChatGPT-generated papers differed from human text in four key ways: paragraph complexity, sentence-level diversity in length, punctuation marks and "popular words." For example, human authors write longer and more complex paragraphs, while the AI papers used punctuation that is not found in real papers, such as exclamation marks.
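To make those four categories concrete, here is a minimal Python sketch that computes rough stand-ins for each one from a block of text. It is an illustration under stated assumptions, not the researchers' actual feature set: the function name stylometric_features, the specific measures chosen and the POPULAR_WORDS list are all hypothetical, and the article does not say which words or punctuation marks the study counted.

```python
import re
import statistics

# Hypothetical word list: the study's actual "popular words" are not named in this article.
POPULAR_WORDS = ("however", "although", "because")


def stylometric_features(text: str) -> dict:
    """Compute illustrative stand-ins for the four feature categories (hypothetical)."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        raise ValueError("text is empty")
    # Split into sentences after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    sentence_lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Paragraph complexity: average number of sentences per paragraph.
        "sentences_per_paragraph": len(sentences) / len(paragraphs),
        # Sentence-level diversity: population standard deviation of sentence lengths (in words).
        "sentence_length_stdev": statistics.pstdev(sentence_lengths),
        # Punctuation marks: raw counts of a couple of example marks.
        "exclamation_marks": text.count("!"),
        "semicolons": text.count(";"),
        # Popular words: occurrences of each word in the hypothetical list.
        **{f"word_{w}": words.count(w) for w in POPULAR_WORDS},
    }


sample = (
    "Stars form in clouds. They collapse slowly!\n\n"
    "However, some clouds resist collapse; magnetic fields help."
)
print(stylometric_features(sample))
```

In a detector, numbers like these would be computed for many labeled human-written and ChatGPT-written texts and passed to a classifier trained to separate the two groups; which classifier to use is left open here.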
The researchers' program also spotted lots of glaring factual errors in the AI papers.
"One of the biggest problems is that it [ChatGPT] assembles text from many sources and there isn't any kind of accuracy check," study lead author Heather Desaire, an analytical chemist at the University of Kansas, said in a statement. As a result, reading through ChatGPT-generated writing can be like "playing a game of two truths and a lie," she added.
Creating computer programs to differentiate between real and AI-generated papers is important because previous studies have hinted that humans may be worse than software at spotting the differences.
In December 2022, another research group uploaded a study to the preprint server bioRxiv, which revealed that journal reviewers could identify AI-generated study abstracts (the summary paragraphs found at the start of a scientific paper) only around 68% of the time, while computer programs could identify the fakes 99% of the time. The reviewers also misidentified 14% of the real abstracts as fakes. The human reviewers would almost certainly be better at identifying entire papers than a single paragraph, the study researchers wrote, but the result still highlights that human errors could let some AI-generated content go unnoticed. (This study has not yet been peer-reviewed.)
The researchers behind the new study say they are pleased that their program is effective at weeding out fake papers, but they warn that it is only a proof of concept. Much larger studies are needed to create robust models that are even more reliable and that can be tailored to specific scientific disciplines to maintain the integrity of the scientific method, they wrote in their paper.
Harry is a U.K.-based senior staff writer at Live Science. He studied marine biology at the University of Exeter before training to become a journalist. He covers a wide range of topics including space exploration, planetary science, space weather, climate change, animal behavior, evolution and paleontology. His feature on the upcoming solar maximum was shortlisted in the "top scoop" category at the National Council for the Training of Journalists (NCTJ) Awards for Excellence in 2023.