AI Alignment
Definition
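The challenge of ensuring that AI systems pursue goals and exhibit behaviors consistent with human intentions and values. Alignment research addresses problems such as reward hacking (exploiting flaws in a reward signal to score well without doing the intended task), goal misspecification (a stated objective that fails to capture what the designers actually want), and deceptive alignment (a system that appears aligned during training or evaluation while pursuing different goals).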