AI Safety

OpenAI AI Safety

OpenAI Releases Report on Disrupting AI-Powered Influence Operations

Chris McKay• May 30, 2024 • 2 min read

The report provides valuable insights into the evolving landscape of deceptive influence operations and the measures needed to address them.

OpenAI AI Safety

OpenAI Forms Safety and Security Committee

Chris McKay• May 28, 2024 • 2 min read

This move comes amid scrutiny and high-profile concerns about the company's commitment to AI safety, especially following the recent departures of key personnel involved in AI safety and "superalignment" work.

Google AI Safety

Google Introduces Frontier Safety Framework to Identify and Mitigate Future AI Risks

Chris McKay• May 17, 2024 • 2 min read

The Frontier Safety Framework focuses on severe risks posed by advanced AI models, such as those with exceptional autonomy or sophisticated cyber capabilities.

OpenAI AI Safety

OpenAI Introduces Tools to Help Identify AI-Generated Content

Chris McKay• May 7, 2024 • 3 min read

Deepfakes, or AI-generated fake content, have already impacted political campaigning and voting in various countries, including Slovakia, Taiwan, and India.

AI Safety

NIST Launches GenAI Program to Evaluate and Measure Generative AI Technologies

Richard Banfield• April 29, 2024 • 1 min read

NIST GenAI will release benchmarks and evaluate generative AI technologies through a series of challenge problems. One of its key goals is to develop systems that can distinguish between human-created and AI-generated content, starting with text.

AI Safety

DHS Establishes AI Safety and Security Board with Industry Leaders

Chris McKay• April 26, 2024 • 2 min read

The Board will develop recommendations to help critical infrastructure stakeholders, such as transportation service providers, pipeline and power grid operators, and internet service providers, more responsibly leverage AI technologies.

AI Literacy AI Ethics AI Safety

AI as a "New Digital Species": Thoughts on Microsoft AI Chief's TED Talk

Chris McKay• April 23, 2024 • 4 min read

By proposing the "digital species" metaphor, he offers a new lens through which to view AI's potential and pitfalls. It invites us to consider the long-term implications of this technology and our responsibility in shaping its evolution.

Meta AI Safety

Meta to Require "Made with AI" Labels for AI Generated Content

Chris McKay• April 5, 2024 • 2 min read

The new labels will cover a broader range of content than the company's current manipulated media policy.

Anthropic AI Safety Research

Anthropic Shares Research on Technique to Exploit Long Context Windows to Jailbreak Large Language Models

Chris McKay• April 2, 2024 • 3 min read

Many-shot jailbreaking works by prompting the model with a large number of fictitious question-answer pairs that depict the AI assistant providing harmful or dangerous responses.

Geopolitics AI Safety

U.S. and U.K. Announce Partnership on AI Safety Testing

Chris McKay• April 1, 2024 • 2 min read

The partnership comes at a crucial time as AI continues to develop rapidly, and both governments recognize the need for a shared approach to AI safety that can keep pace with the technology's emerging risks.

AI Safety

Chinese Software Engineer Arrested for Alleged Theft of AI Trade Secrets from Google

Richard Banfield• March 6, 2024 • 2 min read

The indictment describes how, unbeknownst to Google, Ding had affiliated himself with two Chinese technology companies, Rongshu and Zhisuan, while employed at Google.

Google Cybersecurity AI Safety

Google Launches AI Cyber Defense Initiative to Help Defenders Gain the Upper Hand

Chris McKay• February 16, 2024 • 2 min read

Google believes AI can address systemic issues that contribute to the imbalance favoring attackers. AI provides reasoning capabilities to tackle complexity, learning to evolve defenses, speed unmatched by humans, and the ability to process security data at enormous scale.

Anthropic AI Safety

How Anthropic Is Safeguarding Users Against Election Misinformation

Chris McKay• February 16, 2024 • 2 min read

By combining policy, technical safeguards, and transparent communication, Anthropic aims to responsibly navigate AI through a high-stakes election year.

AI Safety AI Ethics

Leading Tech Companies Pledge to Protect 2024 Elections from Deceptive AI

Richard Banfield• February 16, 2024 • 2 min read

As part of the accord, companies agreed to eight commitments including assessing AI systems that could enable election deception campaigns, seeking to detect deepfakes on their platforms, providing transparency around policies, and supporting public awareness efforts.

OpenAI Microsoft AI Safety Cybersecurity

State-Affiliated Hackers from China, Russia, Iran and North Korea Used ChatGPT to Boost Cyberattacks

Chris McKay• February 15, 2024 • 2 min read

While concerning, Microsoft and OpenAI said the observed activities suggest AI is currently providing "only limited, incremental capabilities" for advanced hacking beyond existing methods.

Venture Capital Startups AI Safety

Bugcrowd Lands $102M to Expand Crowdsourced Security Platform

Richard Banfield• February 12, 2024 • 2 min read

The new capital will support continued enhancements of Bugcrowd's software-as-a-service platform, investments in sales, marketing and partnerships, as well as potential M&A opportunities to expand its capabilities.

Healthcare AI Ethics AI Safety

CMS Bans Medicare Advantage Insurers From Using AI to Deny Care

Chris McKay• February 8, 2024 • 2 min read

CMS worries that AI could perpetuate discrimination or fail to account for unique patient needs.

AI Tech AI Safety

F.C.C. Bans AI-Generated Robocalls

Richard Banfield• February 8, 2024 • 1 min read

The decision aims to curb an emerging tactic used by fraudsters and scammers who leverage machine learning to mimic voices and personas.

AI Policy AI Safety

Top US Companies Join AI Safety Institute Consortium

Richard Banfield• February 8, 2024 • 2 min read

The AISIC represents a crucial step in implementing President Biden's executive order on AI, which directed federal agencies to address emerging issues like AI security, red team testing, deepfake detection, and algorithmic discrimination.

Google AI Safety AI Tech

Google Joins C2PA Steering Committee to Increase Transparency of Online Content

Richard Banfield• February 8, 2024 • 2 min read

The push for transparency comes as AI-generated media grows more advanced and increasingly difficult to distinguish from authentic content.

Meta AI Safety Responsible AI

Meta Says It Will Label AI Content Across Platforms

Richard Banfield• February 6, 2024 • 2 min read

The company already creates labels for content created with Meta AI; however, the aim is to extend this to content created using other tools as well.

AI Safety AI Tech

Bumble Takes On Scammers With New A.I. Safety Feature

Richard Banfield• February 5, 2024 • 1 min read

Within two months of early testing, Bumble saw user-reported cases of spam, scams and fakes drop by 45% across its platforms.

AI Safety China

Beware: Multi-National Company Loses HK$200 Million in Elaborate Deepfake Scam

Chris McKay• February 4, 2024 • 2 min read

Posing as headquarters executives, they convinced the Hong Kong employees to make a series of large money transfers, claiming it was for a confidential corporate transaction.

OpenAI AI Safety AI Literacy

OpenAI and Common Sense Partner to Promote Safe, Responsible AI Use Among Kids

Richard Banfield• January 29, 2024 • 1 min read

This collaboration aims to harness the potential of AI for the benefit of teens and families while addressing the challenges it presents.

Anthropic Research AI Safety

Deceptive "Sleeper Agent" AIs Can Slip Past Sophisticated Safety Training

Chris McKay• January 12, 2024 • 3 min read

The ability of LLMs to retain deceptive behaviors despite safety measures isn't just a technical loophole; it’s a paradigm shift in how we perceive AI reliability and integrity.
