Constitutional AI
Also known as: CAI
Anthropic's alignment approach in which an AI critiques and revises its own outputs according to a set of written principles ('constitution'), reducing harmful responses with less reliance on human labeling.
Overview
Constitutional AI (CAI), published by Anthropic in 2022, defines behavioral principles in a written document (the 'constitution') and has the model critique and revise its own outputs against those principles. Compared with RLHF, CAI requires substantially less human annotation for harmlessness training, because feedback on harmful outputs comes from the model itself rather than from human labelers. It is a core technique used to align Claude.
How it works
The model generates an initial response to a (potentially harmful) prompt, critiques that response against one or more constitutional principles, and then produces a revised response that addresses the critique. Repeated over a large set of prompts, this cycle yields a dataset of revised responses used for supervised fine-tuning; a second phase uses AI-generated preference labels in place of human ones for reinforcement learning (RLAIF). The result is a model trained to be helpful and harmless without extensive human labeling of harmful content.