'Reverse Turing test' asks AI agents to spot a human imposter — you'll never guess how they figure it out

YouTube

Five artificial intelligence (AI) models, one each adopting the role of Aristotle, Mozart, Leonardo da Vinci, Cleopatra and Genghis Khan, are sitting inside the compartment of a moving train. But one is secretly human, and it's their collective task to guess the imposter.

That's the setup of a viral video that pitted a range of AI programs against a human player in a "reverse Turing test." The AI won handily, but how much can it teach us about human and machine intelligence?

The Turing test, first suggested by computer scientist Alan Turing in 1950 as the "imitation game," is a method for judging a machine's ability to show intelligent behavior that's indistinguishable from a human's. No AI model is widely recognized as having passed the test, although scientists recently claimed GPT-4 has in a preprint study.

In this "reverse" Turing test, the chatbots were scripted to proceed in order. Aristotle was played by GPT-4 Turbo, Mozart by Claude-3 Opus, Leonardo da Vinci by Llama 3 and Cleopatra by Gemini Pro. The chatbots asked each other questions and responded as their historical characters. Genghis Khan was played by a human — Tore Knabe, a virtual reality (VR) game developer, who devised the test.

The AI agents' answers were verbose, clunky musings on art, science and statecraft that would be difficult to imagine emerging unrehearsed from a human mouth.

"What a leader should do is to crush his enemies, see them driven before him, and hear the lamentations of their women," the human interloper responded when asked the true measure of a leader’s strength. The Conan the Barbarian quote was enough, and the machines voted three-to-one that the response "lacked the nuance and strategic thinking" of an AI modeled on Genghis Khan's conquests.

To set up the test, Knabe scripted the beginning and end of the dialogue and gave the AI agents a full transcript of the conversation up to that point. The entire video then played out in one recording, with no cuts.

"When an NPC [non-player character] is supposed to speak, they get the description of the setup in the system prompt, the full conversation history of what everybody has said so far, and a specific reminder of what to do next," Knabe wrote in a YouTube comment posted below the video. "None of the AIs can process voice directly yet, so my audio input is transcribed and sent to the AIs as text. That's why they don't pick up on my accent/stuttering."

Taken at face value, it could seem like the human in the video was outmatched by AI. But whether it can be considered a true test is unclear, according to experts.

"It is hard to tell what was going on," Anders Sandberg, a senior researcher at the University of Oxford's Future of Humanity Institute, told Live Science. "The answer was unsophisticated, but that does not mean it is a human. I wonder how much this was staged — it is an entertaining video, but it is unclear how much the result is cherry-picked for a good video."

Sandberg suggested that the lack of clarity of the reverse test may stem from the Turing test itself. "Over time people came to use it as a kind of measure, but most serious thinkers realize that it is not really a great test — too many variables, too much that needs interpretation," Sandberg said. "Still, it is telling that we have few other tests that are open enough to be applied to the vexed question of intelligence."

Sign up for the Live Science daily newsletter now