Safety Pretraining: Toward the Next Generation of Safe AI
Authors:
Pratyush Maini1,2*,
Sachin Goyal1*,
Dylan Sam1*,
Alex Robey1,4,
Yash Savani1,
Yiding Jiang1,
Andy Zou1,3,4,
Zachary C. Lipton1,
J. Zico Kolter1
1Carnegie Mellon University
2DatologyAI
3Center for AI Safety
4Gray Swan AI
* Equal contribution
TL;DR – We embed safety directly into the pretraining pipeline through data-centric interventions, delivering SafeLM, a 1.7B-parameter model family that is natively safe before any RLHF.
Everything (code, data, and weights) is open-source.
📄 Read the Paper
🔗 HuggingFace Hub