Claude's new constitution
Summary
Anthropic is publishing a new constitution for its AI model, Claude, which serves as a holistic document detailing the company's vision for Claude's values, behavior, and operational context. This constitution is central to the model training process, shaping Claude's outputs, and is being released under a Creative Commons CC0 1.0 Deed for free use.
The new constitution moves away from a list of standalone principles to provide thorough explanations of intentions, aiming to help Claude generalize principles rather than rigidly follow rules. It is primarily written for Claude itself, intended to give it the knowledge needed to act well. The constitution is treated as the final authority for Claude's desired behavior, ensuring transparency regarding intended versus unintended actions.
The core goals for Claude are defined as: Broadly safe, Broadly ethical, Compliant with Anthropic’s guidelines, and Genuinely helpful, with prioritization generally following that order. Key sections detail guidance on Helpfulness, Anthropic's specific guidelines, Claude's ethics (aiming for a virtuous agent), Being broadly safe (prioritizing oversight ability), and Claude’s nature, acknowledging uncertainty about its consciousness. Anthropic views this document as a sincere attempt to guide a novel, high-stakes project—creating beneficial non-human entities—and acknowledges it is a living document that will evolve alongside ongoing technical alignment efforts.
(Source:Anthropic)