株式会社オブライト
AI2026-05-17

Constitutional AI

Also known as: Constitutional AI / CAI / 憲法的AI

Anthropic's alignment approach in which an AI critiques and revises its own outputs according to a set of written principles ('constitution'), reducing harmful responses with less reliance on human labeling.


Overview

Constitutional AI, published by Anthropic in 2022, defines behavioral principles in a document (the 'constitution') and has the LLM critique its own outputs against those principles. Compared with RLHF, CAI requires substantially less human annotation while achieving strong safety properties. It is the primary alignment technique behind Claude.

How it works

The model generates an initial response to a harmful prompt, then self-critiques it against constitutional principles, then produces a revised response. Repeated over a large dataset, this cycle teaches the model to produce helpful and harmless outputs without extensive human labeling.

Related Columns

Related Terms

Feel free to contact us

Contact Us