Anthropic Explores Democratizing AI Alignment Using Public Input

Working with the Collective Intelligence Project, Anthropic drew on the perspectives of 1,000 Americans to define the principles that should govern AI behavior.

Image Credit: Anthropic

Artificial intelligence will increasingly transform our society, raising critical questions about which values and principles should direct this powerful technology. Who should decide the core priorities that shape AI systems? What should those priorities be? AI research lab Anthropic and the non-profit Collective Intelligence Project (CIP) recently collaborated to explore how direct public input could help democratize AI's alignment with human values.

Anthropic’s Constitutional AI (CAI) technique trains large language models to follow a set of high-level normative principles, called a constitution. Whereas Reinforcement Learning From Human Feedback (RLHF) relies on human preference labels, CAI has a model critique and rank responses against the constitution, an approach known as Reinforcement Learning From AI Feedback (RLAIF). The constitution lays out norms, ethics and intended behaviors that are encoded into the model through this training process.

CAI was used to align Anthropic's popular conversational AI assistant, Claude. To develop Claude's constitution, Anthropic says it drew on sources like the UN Universal Declaration of Human Rights as well as its own experience with language models. Despite the positive results, Anthropic and CIP want to explore how public participation could steer AI towards greater inclusivity.

In a new research experiment, the two organizations used an online platform to gather perspectives from 1,000 Americans on what principles should govern AI behavior. They then distilled these responses into a “public constitution,” used it to train a chatbot, and compared that model to a baseline chatbot trained solely on Anthropic’s internal constitution.

The results showed overlap, but also key differences, between the public and original constitutions. The model trained on the public constitution exhibited lower social bias across categories like race, gender and disability. Both models performed similarly on benchmarks of language understanding, mathematical proficiency, and overall helpfulness.

Hop on over to the Anthropic blog to check out the full details of the experiment. It is well worth diving into their methodology, challenges, and results. This research provides an early demonstration of how public participation could steer AI in a more inclusive direction. It is an imperfect but promising start to ensuring AI reflects broad priorities, not just those of the model developers.

It also raises important questions.

Firstly, should the public actually have a role in steering AI? What are the merits and risks of broad participation? Could crowdsourcing lapse into unproductive polarization? Is expertise required for ethical AI design? Do technologists have blind spots that diverse oversight could address? How do we balance expert guidance and collective input in AI governance?

And if public involvement is deemed beneficial, who constitutes “the public”? Should locally affected communities have priority? Does all of humanity deserve a say regarding technologies like AI with global impacts? How can we even enable meaningful worldwide participation? Can processes confined to countries or regions achieve legitimacy?

Even defining “the public” within countries poses challenges. Well-intentioned efforts may inadvertently skew towards certain demographics, such as younger, educated, left-leaning groups. How do we ensure inclusion of diverse ages, races, geographies, and ideologies? Are some groups more affected and thus deserving of greater representation?

Assuming we can determine who should participate, how do we empower their involvement? Relying on tech companies and researchers seems problematic. Government oversight raises concerns around politicization. Could independent, internationally representative organizations play a role? But how would such groups earn legitimacy?

With AI, our society stands at the intersection of unparalleled technological advancement and a renewed focus on the essence of human values. The promising results from the Anthropic and CIP experiment suggest that involving the public can bring tangible benefits in shaping AI. However, this involvement, while valuable, brings with it a set of challenges that can't be overlooked.

It is a call to action for the tech community, policymakers, and the public at large to come together, engage in meaningful dialogue, and jointly steer the trajectory of AI. The reward, AI that represents the best of our shared humanity, is certainly worth pursuing.
