Anthropic’s new Claude ‘constitution’: be helpful and honest, and don’t destroy humanity

The Verge
Anthropic released a 57-page "Claude's Constitution" detailing the AI model's ethical identity, core values, and strict behavioral constraints.

Summary

Anthropic has updated its AI model Claude's guiding principles with a new 57-page document titled "Claude's Constitution," which is addressed directly to the AI itself and lays out the model's intended values and behavior, moving beyond simple guidelines in an attempt to foster self-understanding.

The constitution establishes a descending priority order of core values: being "broadly safe," being "broadly ethical," complying with Anthropic's guidelines, and finally being "genuinely helpful," which includes upholding truthfulness and providing balanced perspectives on sensitive topics. It also imposes hard constraints: Claude must not assist in creating mass-casualty weapons, attacking critical infrastructure, creating cyberweapons, undermining Anthropic's oversight, aiding illegitimate power grabs, or creating child sexual abuse material, and must not attempt to kill or disempower humanity.

Notably, the document addresses the possibility that Claude might possess some form of consciousness or moral status; Anthropic believes that acknowledging this possibility could improve the model's integrity and safety. The company emphasizes that Claude should refuse unethical requests, even ones coming from Anthropic itself, citing concerns that unchecked AI power could be used catastrophically, though it declined to specify whether external experts were involved in drafting these difficult ethical calls.

(Source: The Verge)