Technology & Artificial Intelligence · IIMV Lecture Notes
Can Machines Think?
The Turing Test in the Age of GPT & Claude
From a 1950 thought experiment to a test that modern AI now routinely passes — and what that actually means
In 1950, a British mathematician sitting in a war-devastated world asked one of the most provocative questions in intellectual history: Can machines think? Alan Turing — the same man who cracked Nazi codes at Bletchley Park — knew the question was philosophically unanswerable. So he replaced it with something more practical: a game. Seventy-five years later, that game has been beaten. The question now is what on earth we do next.
The Imitation Game — Turing’s Original Idea
In his landmark 1950 paper, Computing Machinery and Intelligence, published in the journal Mind, Alan Turing proposed what he called the Imitation Game. The setup was simple, almost playful — inspired by a parlour game common at English country parties.
A human interrogator sits in one room. In two other rooms sit a human foil and a machine. The interrogator can type questions to both and receive typed answers. After the exchange, they must guess which is the machine. If the machine fools the interrogator enough times — Turing predicted this would be achievable by the year 2000 — it could be said to “think.”
“I believe that in about 50 years’ time it will be possible to programme computers to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.”
— Alan Turing, Computing Machinery and Intelligence, 1950Crucially, Turing was not asking whether machines could be conscious. He was asking whether they could behave as if they were. This subtle but important distinction has powered seven decades of philosophical debate — and, more recently, a wave of landmark empirical results.
How the Test Is Actually Run
The modern standard Turing Test follows a three-party structure. An interrogator converses simultaneously — in text — with one human and one AI. After a fixed window (usually five minutes), the interrogator identifies the human. If the AI fools a majority of judges, it is deemed to have “passed.”
Interrogator / Judge
Entity (Human or AI?)
Interrogator / Judge
Entity (Human or AI?)
Interrogator / Judge
Entity (Human or AI?)
Notice what kinds of questions judges actually reach for. Research from the UC San Diego 2025 study found that over 60% of questions focused on daily life, personal anecdotes, and emotional texture — not factual knowledge. Judges intuitively sense that feelings, embarrassment, and narrative imprecision are harder for machines to fake than trivia answers.
Classic Turing Test Question Categories
- Personal memory — “What’s a moment you’ve never told anyone about?”
- Linguistic ambiguity — garden-path sentences, puns, and paradoxes
- Emotional cues — “How do you feel when you think about your childhood?”
- Philosophical introspection — “What do you fear most about dying?”
- Spontaneity tests — “Say something completely random right now.”
- Meta-awareness — “Which of my questions was hardest to answer, and why?”
- Cultural fluency — internet slang, generational references, pop culture
A Brief History — From ELIZA to GPT-4.5
Alan Turing publishes the Imitation Game in Mind. He predicts machines will pass by 2000.
ELIZA, Joseph Weizenbaum’s MIT chatbot, shocks researchers when users form emotional attachments to it — despite it being merely a pattern-matcher.
PARRY, which simulated a paranoid schizophrenic, fooled psychiatrists who could distinguish it from real patients only 52% of the time — statistically, chance.
The Loebner Prize is established — the first annual Turing Test competition with real prizes. A bronze medal for “most human” AI has been awarded every year; gold and silver (for truly passing) were never won.
Eugene Goostman — a chatbot posing as a 13-year-old Ukrainian boy — fools 33% of judges, controversially called a “pass” by organizers. Critics noted the child persona let it off the hook for errors.
Google Duplex calls a hair salon and books an appointment. The receptionist has no idea they’re speaking to an AI. Many consider this a real-world Turing Test pass — in voice.
UC San Diego study: GPT-4 fools 54% of judges in a two-player formulation. Human foils fool judges 67% of the time. By strict definition, GPT-4 passed.
GPT-4.5, with a human-like persona prompt, is judged to be human 73% of the time — more often than actual humans. The Turing Test is, for most practical purposes, solved.
The 2025 UC San Diego Study — What Actually Happened
The most rigorous recent test comes from researchers at UC San Diego’s Language and Cognition Lab. Nearly 300 participants took part across eight rounds of online tests. Each round used the original three-party structure: a human interrogator chatted simultaneously with a real human and an AI, using a split-screen text interface. After five minutes, the interrogator guessed who was human.
Six AI systems were tested: GPT-4.5, GPT-4o, Claude (Sonnet 3.7), Gemini, Mistral, and the ancient ELIZA. Two conditions were run: a bare “no-persona” prompt and a “persona” prompt, where the AI was given a detailed human character — a shy, internet-savvy young person with specific cultural knowledge and casual speech patterns.
(with persona)
win rate
(no persona)
GPT-4.5 was judged to be human significantly more often than actual humans in the same study. The persona prompt made the AI messier, less precise, more culturally fluent — and therefore more convincing. Curiously, the original ELIZA chatbot — a 1960s pattern-matcher — scored 23%, marginally beating GPT-4o without a persona. The lesson: raw intelligence is not enough. Personality and strategic imperfection matter enormously.
How the Big Models Compare
| Model | Organisation | Turing Test Performance | Key Trait | Verdict |
|---|---|---|---|---|
| GPT-4.5 | OpenAI | 73% (with persona) | Emotional warmth, creative writing | Passed |
| LLaMA 3.1-405B | Meta | 56% (with persona) | Open-source power | Passed |
| GPT-4 / GPT-4o | OpenAI | 54% / 21% | Reasoning, multimodal | Marginal |
| Claude 3.7 Sonnet | Anthropic | Competitive with GPT-4.5 class | Long-context reasoning, nuance | Competitive |
| Gemini 2.5 Pro | Google DeepMind | Comparable benchmark tier | Memory, multimodal | Competitive |
| ELIZA (1966) | MIT | 22–23% | Pattern matching | Failed |
What drove GPT-4.5’s outperformance? Researchers attribute it to the model’s unusual “warmth.” Where GPT-4o tends towards precision and structure, GPT-4.5 writes with a kind of casual imprecision that reads as human. Judges were looking for hedging, emotional messiness — and they found it.
What the Test Actually Measures — And What It Doesn’t
Passing the Turing Test does not mean a machine is conscious. It does not mean it understands. It means it has become an extraordinarily sophisticated mimic of human conversational output.
The most important challenge to the Turing Test is John Searle’s Chinese Room argument (1980). Imagine a person locked in a room with a giant rulebook for manipulating Chinese symbols. Chinese speakers slide messages under the door; the person follows the rulebook and slides back plausible responses — without understanding a single word of Chinese. From outside, it looks like the room “speaks” Chinese. But inside, there is no understanding whatsoever.
Searle argues that modern AI is exactly this room — manipulating symbols at enormous scale and sophistication, but with no comprehension behind the output. GPT-4.5 passing the Turing Test proves it can produce human-seeming outputs. It does not prove there is anyone home.
“It was not meant as a literal test that you would actually run on the machine — it was more like a thought experiment. LLMs are master conversationalists, trained on unfathomably vast sums of human-composed text.”
— François Chollet, Google, speaking to Nature (2023)Many researchers go further: they argue the test was always the wrong benchmark. Modern AI has beaten chess grandmasters, written legal briefs, and solved protein-folding problems that stumped biology for decades. None of that required sounding like a nervous teenager on a Tuesday afternoon — which is, more or less, what the Turing Test rewards.
Variations That Have Emerged
Precisely because the original test has limits, researchers have developed alternatives and extensions:
Notable Turing Test Variants
- Total Turing Test — adds visual and motor skills; the machine must also perceive and manipulate objects, not just converse
- Reverse Turing Test (CAPTCHA) — a machine tries to determine if it’s talking to a human. You encounter this every time you click “I am not a robot”
- The Marcus Test — can an AI watch a TV episode and answer meaningful questions about it? Tests true comprehension, not just language
- The Lovelace Test 2.0 — can an AI create genuine art that its designers cannot explain? Tests creativity, not conversation
- ARC-AGI Benchmark — tests abstract reasoning on novel problems, specifically designed to resist AI pattern-matching
- Minimum Intelligent Signal Test — strips conversation down to yes/no answers, removing linguistic fluency as a confound
The broader trend is clear: as AI passes one benchmark, the community invents harder ones. The goalposts have moved from “can it converse?” to “can it reason abstractly?”, “can it sustain a working relationship over months?”, and eventually, “can it be trusted with consequential decisions?”
Beyond the Turing Test — What Comes Next
The 2025 moment where GPT-4.5 was identified as human more often than actual humans felt like a watershed — but also, to many, anticlimactic. Of course a model trained on trillions of human words sounds human. The real question is whether that linguistic mimicry translates into anything deeper.
The real tests now ask whether AI can sustain hours of complex reasoning, handle multimodal inputs (voice, images, video), and — most critically — whether it can produce safe, accurate, and useful outcomes rather than merely convincing ones. The Turing Test has been lapped. The new question is whether AI can be genuinely useful and trustworthy — which requires entirely different evaluations.
“Today, models like ChatGPT, Claude, Gemini, and Grok have already done it. But the real game starts now. Beyond clever banter, can AI sustain hours of reasoning? Beyond sounding human, can AI deliver safe, accurate, and useful outcomes?”
— Turing Institute, 2025So — What Should We Make of All This?
The Turing Test was never really about machines. It was about us — about what we mean by intelligence, thinking, and understanding. Turing asked his question in a world where the very idea of a calculating machine was exotic. He picked conversation as his benchmark because conversation is the most distinctly human thing he could imagine.
We now live in a world where machines converse better than many humans, in some settings. That is genuinely remarkable. But Turing’s deeper question — can machines think? — remains as unanswered as ever. Passing a conversational test tells us about linguistic fluency. It tells us nothing about whether there is experience on the other side of the screen.
For students of management and strategy, there is a crisper takeaway: the Turing Test has moved from benchmark to table stakes. The organisations winning with AI in 2025–2026 are not asking whether their systems sound human. They are asking whether their systems reason reliably, make good decisions under uncertainty, and can be held accountable when they are wrong. That is the next test. And no one has passed it yet.
This post was inspired by a classroom discussion at IIM Visakhapatnam. If it sparked a thought, share it with someone navigating the same questions in their organisation.
What do you think — has the Turing Test passing changed how you think about AI in your work? Drop a comment below.
Key Sources
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.
Jones, C. et al. (2025). People cannot distinguish GPT-4 from a human in a Turing test. UC San Diego preprint.
Chollet, F. (2023). Quoted in Nature. Science.org citation (adq9356).
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424.
Turing Institute (2025). Passing the Turing Test: What’s next? turing.com/blog