Definition

A research field focused on ensuring AI systems operate safely, reliably, and in alignment with human values. AI safety encompasses alignment research, robustness testing, adversarial defense, interpretability, and governance frameworks.

Defined Term