'Reverse Turing test' asks AI agents to spot a human imposter — you'll never guess how they figure it out

Five passengers are sitting in the compartment of a moving train, playing the roles of Aristotle, Mozart, Leonardo da Vinci, Cleopatra and Genghis Khan, each apparently an artificial intelligence (AI) model. But one is secretly human, and it's the machines' collective task to identify the imposter.

That's the setup of a viral video that pitted a range of AI programs against a human player in a "reverse Turing test." The AI won handily, but how much can it teach us about human and machine intelligence?

The Turing test, first suggested by computer scientist Alan Turing in 1950 as the "imitation game," is a method for judging a machine's ability to show intelligent behavior that's indistinguishable from a human's. No AI model is widely recognized as having passed the test, although scientists recently claimed in a preprint study that GPT-4 has.

In this "reverse" Turing test, the chatbots were scripted to proceed in order. Aristotle was played by GPT-4 Turbo, Mozart by Claude-3 Opus, Leonardo da Vinci by Llama 3 and Cleopatra by Gemini Pro. The chatbots asked each other questions and responded as their historical characters. Genghis Khan was played by a human — Tore Knabe, a virtual reality (VR) game developer, who devised the test.

The AI agents' answers were verbose, clunky musings on art, science and statecraft that would be difficult to imagine emerging unrehearsed from a human mouth.

"What a leader should do is to crush his enemies, see them driven before him, and hear the lamentations of their women," the human interloper responded when asked the true measure of a leader’s strength. The Conan the Barbarian quote was enough, and the machines voted three-to-one that the response "lacked the nuance and strategic thinking" of an AI modeled on Genghis Khan's conquests.


To set up the test, Knabe scripted the beginning and end of the dialogue and, before each turn, gave the AI agents a full transcript of the conversation so far. The entire video then played out in one recording, with no cuts.

"When an NPC [non-player character] is supposed to speak, they get the description of the setup in the system prompt, the full conversation history of what everybody has said so far, and a specific reminder of what to do next," Knabe wrote in a YouTube comment posted below the video. "None of the AIs can process voice directly yet, so my audio input is transcribed and sent to the AIs as text. That's why they don't pick up on my accent/stuttering."

Taken at face value, it could seem like the human in the video was outmatched by AI. But whether it can be considered a true test is unclear, according to experts.

"It is hard to tell what was going on," Anders Sandberg, a senior researcher at the University of Oxford's Future of Humanity Institute, told Live Science. "The answer was unsophisticated, but that does not mean it is a human. I wonder how much this was staged — it is an entertaining video, but it is unclear how much the result is cherry-picked for a good video."

Sandberg suggested that the lack of clarity of the reverse test may stem from the Turing test itself. "Over time people came to use it as a kind of measure, but most serious thinkers realize that it is not really a great test — too many variables, too much that needs interpretation," Sandberg said. "Still, it is telling that we have few other tests that are open enough to be applied to the vexed question of intelligence."

Assessing intelligence is a fraught matter even among our fellow humans. Turing's proposal was not concerned with a machine's actual intelligence, but was instead a thought experiment on how humans perceived it.

"As I say to my students the 'I' in 'AI' is not one thing, and there is no agreed definition for intelligence, it depends what your perspective is: anthropological, biological, cultural, gender, scientific," Huma Shah, an assistant professor of computing at the Coventry University whose research focuses on machine intelligence and the Turing test, told Live Science.

"Turing's imitation game looks at question-answer/conversation ability, but there is a lot behind competence in language. So when it comes to machines, which machine do we want to test for intelligence?" she said."Is it a carer robot that needs emotional skills and cultural knowledge to look after an elderly person in Japan, say, or a driverless car in Phoenix, Arizona? What skill are we testing an AI or robot for?"
