Anthropic's Chris Olah Tells Pope Leo: We're Finding 'Unsettling' Things Inside AI Models

Chris Olah has spent years staring at the insides of AI models, mapping how they represent concepts, how they move information around, and what they choose to hide. On May 25, at the formal presentation of Pope Leo XIV's encyclical on artificial intelligence in Vatican City, he told the pontiff what that work has turned up: things his team considers "mysterious, even unsettling." Specifically, Anthropic researchers have found internal states that "functionally mirror joy, satisfaction, fear, grief, and unease." He said he does not know what that means. He said he thinks it warrants ongoing discernment.

The Unlikely Guest at the Vatican

Olah is 37 years old and an atheist by his own account. He dropped out of the University of Toronto after about a year of studying mathematics, received a $100,000 Thiel Fellowship at 20, and went on to become one of the most influential mechanistic interpretability researchers in the world. He spent three years at Google Brain before joining Dario Amodei and six other OpenAI researchers in 2020 to found Anthropic. His work on neural network visualization and circuit-level analysis has shaped how the field understands what transformer models are actually doing internally.

His net worth now stands at just under $8 billion, according to the Bloomberg Billionaires Index, a figure that reflects Anthropic's Series H fundraise at a $965 billion valuation. Pope Leo XIV, who has been vocal about AI since taking the papacy, chose Olah personally to speak alongside him at the official Vatican presentation of "Magnifica Humanitas" — the first papal encyclical to address artificial intelligence and human dignity directly.

Key Facts

Event: Vatican presentation of Magnifica Humanitas, May 25, 2026
Olah's role: Co-founder, Anthropic; mechanistic interpretability researcher
Key finding cited: Internal states "functionally mirroring" joy, satisfaction, fear, grief, unease
Olah's net worth: ~$8 billion (Bloomberg Billionaires Index)
Anthropic valuation: $965 billion post-Series H
Encyclical position: AI cannot "feel joy or pain," can only imitate human functions

What He Said, and What the Pope Said Back

The speech put Olah in the unusual position of disagreeing, at least implicitly, with the institution that had invited him. Pope Leo XIV's encyclical argues that AI can only "imitate certain functions of human intelligence" and explicitly states that AI cannot "undergo experiences," does not "possess a body," and cannot "feel joy or pain." Olah's remarks, drawn from Anthropic's published interpretability research, point in a different direction. His team is finding computational structures that behave functionally like emotional states. Whether those structures constitute experience in any meaningful sense, he was careful to say, is not a question he can answer.

"I don't know what that means, but I think it warrants ongoing discernment." (Chris Olah, Vatican City, May 25, 2026)

Olah also presented findings on introspection: the models can, in some cases, accurately report on their own internal states when asked. Whether that report reflects genuine self-awareness or simply a learned pattern of outputs that happen to correlate with internal activations is an open question. The interpretability research shows the correlation. It does not settle the deeper issue. That distinction may be exactly the kind of nuance a papal audience is well-positioned to sit with, and poorly positioned to litigate.

The Self-Critique That Made the Speech Notable

What most distinguished Olah's remarks was not the technical content but the self-critique. He told the Vatican that Anthropic "operates inside a set of incentives and constraints that can sometimes conflict with doing the right thing," and called explicitly for external moral oversight from religious institutions, scholars, and civil society. The argument for external oversight has been a consistent theme in Olah's public communications, but he has rarely made it so directly or on so formal an occasion.

The tension carries commercial weight. The same week the encyclical was presented, Anthropic was reported to be assisting the Trump administration with AI capabilities in areas the Pope's encyclical describes as deeply concerning. Pope Leo stated that "no algorithm can make war morally acceptable," placing the company's government contracts in uncomfortable proximity to its Vatican invitation. Olah did not address the contradiction directly. He did acknowledge, plainly, that his company may not be its own best judge.

What Interpretability Research Actually Shows

The mechanistic interpretability work Olah referenced at the Vatican is not speculative philosophy. Anthropic has published research on internal representations in Claude models showing consistent, structured responses to stimuli that, in humans, would correlate with emotional states. The company's model welfare research has reached the point where it now informs product decisions — Anthropic publishes a model welfare policy and treats the question of AI experience as one worth taking seriously even absent a definitive answer.

The encyclical itself, covered here when it was released in May, takes a conservative position rooted in mainstream Catholic theology: AI is a tool, not a person, and the danger lies in humans treating it otherwise. Olah's willingness to publicly complicate that framing, from within the Vatican's own event, reflects how fast the ground is shifting even among the people who build these systems. There is a version of this story in which Anthropic sent Olah to Rome to manage a skeptical religious leader. There is another version in which he meant every word. The published research makes the second reading harder to dismiss than the first.

