
OpenAI Outlines “Preparedness Framework” to Systematically Track and Mitigate AI Safety Risks

December 18, 2023

As AI systems grow more advanced, concerns about safety and misuse continue to mount. To address these issues proactively, OpenAI has released a “Preparedness Framework,” currently in beta, that outlines processes for continually evaluating risks from the company’s cutting-edge AI models. The goal is to allow ongoing innovation while safeguarding against potential harms.

OpenAI has several internal safety groups examining both current and future risks. For example, the Safety Systems team focuses on misuse of current models, while the Superalignment team explores safety measures for future superintelligent models.

The newly announced Preparedness team focuses specifically on evaluating dangers from AI progress at the frontier. By partnering across groups, the team aims to ground safety efforts in rigorous science while anticipating what lies ahead.

Scorecards with Thresholds for Action – The framework introduces a system of risk "scorecards" that evaluate frontier models at every significant computational milestone. The evaluations push models to their limits to identify and address specific safety concerns, and they determine which models are safe for further development and deployment. OpenAI pledges to keep probing its models’ capabilities and to update the scorecards as it learns more. Predefined thresholds dictate what added precautions are necessary based on scores in areas like cybersecurity, toxic content generation, and model autonomy.
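
To make the threshold idea concrete, here is a minimal sketch of how a scorecard might gate decisions. Everything in it is an illustrative assumption rather than OpenAI's actual implementation: the four-step `RiskLevel` scale, the `DEPLOY_MAX` and `DEVELOP_MAX` cutoffs, the rule that the worst category sets the overall score, and the example scores are all hypothetical. Only the category names come from the areas mentioned above.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Hypothetical ordered risk scale (an assumption for illustration)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical thresholds: the highest overall score at which each action is allowed.
DEPLOY_MAX = RiskLevel.MEDIUM
DEVELOP_MAX = RiskLevel.HIGH


def overall_risk(scorecard: dict[str, RiskLevel]) -> RiskLevel:
    """Assume the overall score is the worst score across all tracked categories."""
    return max(scorecard.values())


def allowed_actions(scorecard: dict[str, RiskLevel]) -> dict[str, bool]:
    """Compare the overall score against each predefined threshold."""
    worst = overall_risk(scorecard)
    return {
        "deploy": worst <= DEPLOY_MAX,
        "develop_further": worst <= DEVELOP_MAX,
    }


# Made-up scores for the areas named in the article.
example_scorecard = {
    "cybersecurity": RiskLevel.MEDIUM,
    "toxic_content_generation": RiskLevel.LOW,
    "model_autonomy": RiskLevel.HIGH,
}

decision = allowed_actions(example_scorecard)
print(decision)
```

In this sketch a single high score in one area is enough to block deployment while still permitting further development, which is one simple way a threshold could translate a score into added precautions.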

Proactive Safety Testing and Outside Audits – The company also plans to stress-test its own systems and culture through regular drills simulating safety incidents and rapid response scenarios. Third-party auditors will continue to be engaged to evaluate models and provide red-teaming support.

Extending Beyond Known Risks – The framework also emphasizes collaboration between internal teams and external parties to track real-world misuse and emergent misalignment risks. In addition, OpenAI will continue investing in research to understand how risks evolve as models scale, drawing on its experience with scaling laws.

Oversight for Responsible Innovation – The framework also establishes a dedicated oversight team charged with the technical evaluation of frontier models and the synthesis of comprehensive safety reports. These reports inform decisions at both the leadership and board levels, supporting a balanced approach to model development and deployment.

Notably, the Board of Directors holds the right to reverse decisions made by CEO Sam Altman and the leadership team. This veto power reflects recent upheaval among OpenAI’s leadership. In November 2023, the former board briefly removed Altman from his CEO role, citing issues around his transparency and aggressive innovation pace, before he was reinstated days later amid staff uproar. The board currently consists of Bret Taylor, Larry Summers, and Adam D'Angelo.