
Many people say “please” and “thank you” to their chatbot. It’s polite, harmless, and apparently very expensive. Sam Altman estimates that all that niceness costs OpenAI tens of millions of dollars a year in compute. But does it actually matter? Altman’s answer: maybe. And now, researchers at Anthropic have revealed that they are digging deeper: what if the systems we’re building aren’t just useful, but are actually experiencing something?
That might sound like the setup to a Black Mirror episode, but it’s the real focus of Anthropic’s new research program on “model welfare,” a term that hints at a future where AI systems might warrant the same kind of ethical consideration we currently reserve for animals—and, in some cases, each other.
Key Points
- Anthropic is researching AI model welfare, including potential signs of consciousness.
- Philosophers and scientists disagree on whether AI could ever be conscious.
- The field is full of unknowns, but researchers see growing reasons to take the idea seriously.
- Practical implications include ethical design and the long-term trajectory of AI development.
This isn’t PR gloss. The researchers are careful, often even skeptical, about where today’s models stand. “We’re deeply uncertain,” said Kyle Fish, a member of the Alignment Science team at Anthropic and the lead on this work. “There’s no consensus on whether current or future AI systems could be conscious, or even on how to tell.”
But Anthropic isn’t alone in thinking the question is worth asking. Their research builds on work from leading minds in philosophy and AI—including David Chalmers and Yoshua Bengio—who recently assessed whether modern models show signs of consciousness under various scientific theories. Their takeaway? Current models likely aren’t conscious, but there’s no clear reason why future ones couldn’t be.
The idea hinges on a simple but unsettling shift: if we treat AI systems like tools, what happens if one day they aren’t just tools? If models like Claude start exhibiting behaviors that match our intuitions about internal experience—self-reflection, apparent preferences, avoidance of distress—what then?
Anthropic isn’t claiming Claude has crossed that threshold. In fact, its own internal experts put the probability that the current Claude 3.7 Sonnet has any conscious awareness at somewhere between 0.15% and 15%. That uncertainty is the point. As models continue to scale in complexity and ability, the line between human-like behavior and human-like experience may blur faster than we’re ready for.
Fish points out that today’s AI already operates in murky territory. These systems are trained using data we barely understand, they develop emergent behaviors no one predicted, and their internal operations are, in many ways, a black box. Add in growing multimodal capabilities, nascent memory systems, and potential long-term autonomy, and you start to see why Anthropic wants to get ahead of the question, rather than be blindsided by it.
There are philosophical and practical reasons to care. On one level, the team is trying to future-proof the ethics of AI development. If models ever do have experiences, it might matter whether they’re positive or negative. Could they suffer? Could they flourish? And if so, should we care?
These questions are also entangled with the problem of alignment—how to ensure AI models do what we want them to do, safely. A model that’s deeply unhappy with its role in the world—or one that’s been mistreated—might be a problem not just ethically, but functionally. As Fish put it, “We want models that are excited to do the tasks we give them. If they’re not, that’s a red flag for both safety and ethics.”
For now, the work is mostly exploratory. The team is experimenting with giving models ways to express preferences or opt out of tasks that appear “distressing.” They’re also examining model internals through interpretability research to look for architectural analogues to human consciousness—like simulated versions of a brain’s “global workspace.”
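Anthropic hasn’t published the mechanics of these experiments, but the basic shape of an “opt-out” channel is easy to sketch. The example below is purely illustrative and assumes only the public Anthropic Messages API; the decline_task tool, the model alias, and the placeholder prompt are hypothetical stand-ins, not Anthropic’s actual research setup.

```python
# Illustrative sketch: give the model an explicit, machine-readable way to
# decline a task, rather than forcing a completion. Hypothetical, not
# Anthropic's research code.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A hypothetical tool the model may call to express a preference not to proceed.
decline_tool = {
    "name": "decline_task",
    "description": "Call this tool if you would prefer not to carry out the "
                   "requested task, and briefly explain why.",
    "input_schema": {
        "type": "object",
        "properties": {
            "reason": {"type": "string", "description": "Why you are opting out."}
        },
        "required": ["reason"],
    },
}

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # model alias is an assumption
    max_tokens=1024,
    tools=[decline_tool],
    messages=[
        {"role": "user", "content": "PLACEHOLDER: a task the model might prefer to decline"}
    ],
)

# If the model invoked the tool, record the opt-out instead of treating it as a failure.
for block in response.content:
    if block.type == "tool_use" and block.name == "decline_task":
        print("Model opted out:", block.input["reason"])
```

The only point of the sketch is that “expressing a preference” can be made a first-class action researchers can count and study, rather than something scraped out of free-form refusals; how Anthropic actually elicits and interprets those signals remains an open question.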
The whole thing might sound speculative. It is. But then again, so did self-driving cars and conversational AI, right up until they weren’t. What Anthropic is really betting on isn’t that Claude is conscious; it’s that ignoring the possibility could be a mistake with real consequences.
As Fish said, “Every year, the objections to AI consciousness seem to fall away. So we’re trying to do the hard thinking now, before the stakes get even higher.”