Phase 112: AI Safety, Alignment and Robustness
Phase 112 of the AI Encyclopedia covers AI Safety, Alignment and Robustness (topics 2221–2240). The 20 concepts below are grouped under this phase; each will become a full article in the Insightful AI World encyclopedia.
2221 AI Safety
2222 AI Alignment
2223 Outer Alignment
2224 Inner Alignment
2225 Reward Hacking
2226 Specification Gaming
2227 Goal Misgeneralization
2228 Scalable Oversight
2229 Debate
2230 Constitutional AI
2231 Robustness
2232 Distribution Shift
2233 Out-of-Distribution Detection
2234 Uncertainty Estimation
2235 Calibration
2236 Red Teaming
2237 Jailbreak Resistance
2238 Safety Evaluation
2239 Frontier Model Risk
2240 AI Risk Mitigation