AI Safety - Maginative

OpenAI Releases Open-Weight Safety Models That Rewrite Policy Rules on the Fly

Chris McKay • October 29, 2025 • 2 min read

OpenAI released gpt-oss-safeguard, open-weight reasoning models that let platforms define and update safety policies at inference time instead of retraining classifiers—part of a broader push with nonprofit ROOST to democratize content moderation tools.

OpenAI AI Safety

OpenAI Updates Model Spec to Better Balance User Freedom with Safety Guardrails

Chris McKay • February 12, 2025 • 4 min read

The update explicitly embraces intellectual freedom within defined safety boundaries, allowing discussion of controversial topics while maintaining restrictions against concrete harm.

AI Safety Responsible AI OpenAI Google

Google, OpenAI, Roblox, and Discord Form Child Safety Group to Combat Online Child Exploitation in the AI Era

Chris McKay • February 10, 2025 • 2 min read

The initiative seeks to create a more accessible and transparent safety infrastructure in response to growing concerns over AI’s role in online harm.

AI Safety

The ARC Prize Will Become a Nonprofit That Benchmarks AGI

Chris McKay • January 8, 2025 • 2 min read

The foundation will transition to a 501(c)(3) nonprofit and begin fundraising this month.

OpenAI AI Safety

OpenAI Shares Research on Red Teaming Methods

Chris McKay • November 21, 2024 • 2 min read

OpenAI has been applying these techniques across its major model releases, from DALL-E 2 through its recent o1 model family.

AI Safety AI Ethics

The Character.AI Lawsuit Is a Wake-Up Call for Responsible AI Development

Chris McKay • October 25, 2024 • 4 min read

What happens when these AI-generated relationships become more than just entertaining—when they become emotional crutches for people struggling to connect in the real world?

AI Safety AI Policy

California Governor Vetoes Landmark AI Safety Bill

Chris McKay • September 29, 2024 • 2 min read

In his veto message, Newsom stated that while the bill was "well-intentioned," it failed to consider the deployment context of AI systems.

AI Policy AI Safety Media & Entertainment

California Enacts Five Laws to Combat AI Deepfakes in Elections and Entertainment

Chris McKay • September 17, 2024 • 2 min read

Three bills focus on curbing the spread of deceptive AI-generated content in political campaigns, while two others strengthen protections for actors and performers against unauthorized use of their digital likenesses.

Google AI Safety

Google to Identify AI-Generated Images in Search and Ads with C2PA 2.1 Standard

Chris McKay • September 17, 2024 • 1 min read

In Search, the "About this image" feature will display whether an image was created or edited using AI tools. For Ads, the company aims to use C2PA signals to enforce relevant policies.

OpenAI AI Safety

OpenAI Updates Safety and Security Measures with Independent Oversight

Chris McKay • September 16, 2024 • 3 min read

The newly empowered Safety and Security Committee (SSC) will have the authority to delay model releases if safety concerns are not adequately addressed.

Microsoft AI Safety

Microsoft Partners with StopNCII to Combat Intimate Image Abuse

Chris McKay • September 5, 2024 • 1 min read

The pilot program has already shown promising results. By the end of August, the company had taken action on 268,899 images flagged through the StopNCII database, preventing them from appearing in Bing image search results.

AI Policy AI Safety

California AI Safety Bill Clears Assembly, Awaits Final Approval

Chris McKay • August 28, 2024 • 2 min read

The Safe and Secure Innovation for Frontier Artificial Intelligence Models Act squeaked through the Assembly on a 41-9 vote; its 41 ayes were exactly the minimum required for passage.

OpenAI AI Safety

OpenAI Has Demonstrated "Strawberry" AI Capabilities to U.S. National Security Officials

Chris McKay • August 27, 2024 • 2 min read

Strawberry, previously known as Q*, is reportedly a significant technical leap forward in AI capabilities, particularly in complex problem-solving and reasoning.

OpenAI AI Ethics AI Safety

OpenAI Disrupts Iranian Influence Operation That Used ChatGPT

Chris McKay • August 16, 2024 • 2 min read

The operation was using the AI assistant to produce articles and social media posts on subjects ranging from U.S. politics to global events, which were then distributed across multiple platforms.

Anthropic AI Safety

Anthropic Will Pay You Up to $15,000 to Jailbreak Its Next-Gen AI Safety System

Chris McKay • August 9, 2024 • 1 min read

The program aims to uncover vulnerabilities that could consistently bypass AI safety guardrails across a wide range of high-risk domains, including chemical, biological, radiological, nuclear, and cybersecurity areas.

OpenAI AI Safety

OpenAI Publishes GPT-4o Model Card Detailing Extensive Safety and Risk Mitigation Measures

Chris McKay • August 9, 2024 • 3 min read

It offers an in-depth look at the extensive safety protocols and risk mitigations that were implemented during the model’s development and deployment.

OpenAI AI Safety

OpenAI Adds AI Safety Expert Zico Kolter to Board

Chris McKay • August 8, 2024 • 2 min read

Kolter currently leads the Machine Learning Department at Carnegie Mellon and brings over a decade of experience in AI research, particularly in developing methods to assess and enhance the safety of AI systems.

Google AI Safety

Google Research Highlights Generative AI Misuse Tactics

Chris McKay • August 2, 2024 • 2 min read

The report highlights two main types of misuse: exploitation of AI capabilities and compromise of AI systems.

OpenAI AI Safety

OpenAI to Provide Government with Early Access to Next Foundation Model

Chris McKay • August 1, 2024 • 3 min read

These government collaborations are becoming increasingly common in the AI industry as companies navigate the complex landscape of rapid advancement and public accountability.

Microsoft AI Safety Responsible AI

Microsoft Urges Congress to Enact Federal Deepfake Fraud Statute

Chris McKay • July 30, 2024 • 2 min read

Microsoft President Brad Smith is urging lawmakers to pass a comprehensive "deepfake fraud statute" that would give law enforcement the legal framework needed to prosecute deepfake-related crimes.

OpenAI AI Safety

OpenAI Introduces Rule-Based Rewards, an AI-Powered Alternative to RLHF

Chris McKay • July 24, 2024 • 2 min read

RBRs offer a promising way to use AI to enhance AI safety and efficiency, streamlining the training process while ensuring models remain aligned with desired behaviors.

Responsible AI Cybersecurity AI Safety

Tech Industry Launches Coalition for Secure AI

Chris McKay • July 18, 2024 • 2 min read

CoSAI will focus on the entire lifecycle of AI systems, from building and integration to deployment and operation. The coalition aims to mitigate risks like model theft, data poisoning, prompt injection, and inference attacks.

OpenAI AI Safety

OpenAI Partners with Los Alamos National Laboratory for Bioscience Safety Research

Chris McKay • July 10, 2024 • 2 min read

The project will assess GPT-4o's ability to assist both expert scientists and novices with complex biological tasks using its visual and voice capabilities.

Microsoft AI Safety Cybersecurity

Microsoft Reveals 'Skeleton Key': A Powerful New AI Jailbreak Technique

Chris McKay • June 28, 2024 • 3 min read

The jailbreak allowed top models to comply fully with requests across various risk categories, including explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence.

SSI Venture Capital Startups AI Safety

OpenAI Co-Founder Ilya Sutskever Launches Safe Superintelligence Inc.

Chris McKay • June 19, 2024 • 2 min read

SSI will focus solely on building a safe and powerful AI system and has no near-term plans to sell AI products or services.