Phase 112: AI Safety, Alignment and Robustness
Phase 112 of the AI Encyclopedia covers AI Safety, Alignment and Robustness (topics 2221–2240). The 20 concepts below are grouped under this phase; each will become a full article in the Insightful AI World encyclopedia.
2221 AI Safety
2222 AI Alignment
2223 Outer Alignment
2224 Inner Alignment
2225 Reward Hacking
2226 Specification Gaming
2227 Goal Misgeneralization
2228 Scalable Oversight
2229 Debate
2230 Constitutional AI
2231 Robustness
2232 Distribution Shift
2233 Out-of-Distribution Detection
2234 Uncertainty Estimation
2235 Calibration
2236 Red Teaming
2237 Jailbreak Resistance
2238 Safety Evaluation
2239 Frontier Model Risk
2240 AI Risk Mitigation