AI Safety - Maginative (Page 3)

Anthropic Research AI Safety

Deceptive "Sleeper Agent" AIs Can Slip Past Sophisticated Safety Training

Chris McKay • January 12, 2024 • 3 min read

The ability of LLMs to retain deceptive behaviors despite safety measures isn't just a technical loophole; it’s a paradigm shift in how we perceive AI reliability and integrity.

AI Safety

McAfee Unveils AI to Detect Sophisticated Deepfake Audio

Chris McKay • January 8, 2024 • 2 min read

The company has positioned the solution as the next evolution in using AI to protect online privacy and identity.

AI Safety Responsible AI

If by Claude AI

Chris McKay • December 30, 2023 • 2 min read

Modeled after Rudyard Kipling’s iconic work “If—”, the poem explores AI themes around prudent and inclusive governance, equitable access, and safety and ethical foresight.

AI Safety AI Tech

Microsoft Talks Safeguarding Privacy in the Age of Generative AI

Chris McKay • December 20, 2023 • 2 min read

Now more than ever, it's important to keep a close eye on how foundation model providers like Microsoft are fortifying data security, enhancing transparency, expanding user controls, and aligning with evolving regulations worldwide.

OpenAI AI Safety

OpenAI Outlines “Preparedness Framework” to Systematically Track and Mitigate AI Safety Risks

Chris McKay • December 18, 2023 • 2 min read

The Preparedness Framework, still in its beta phase, represents a concerted effort by OpenAI to not only track and evaluate but also forecast and protect against potential catastrophic risks.

Meta AI Safety

Meta Launches Purple Llama to Promote Responsible and Equitable Generative AI

Chris McKay • December 7, 2023 • 2 min read

The company seeks to promote open collaboration to address challenges around cybersecurity, content filtering, and mitigating potential harms that are top-of-mind across the industry.

Hugging Face AI Safety

Hugging Face API Token Exposure is a Wake-Up Call for AI Security

Chris McKay • December 4, 2023 • 3 min read

Researchers at cybersecurity firm Lasso Security have discovered over 1500 exposed API tokens on Hugging Face and GitHub repositories, granting access to 723 companies' accounts.

Meta AI Safety Responsible AI

Meta Disbands Responsible AI Team

Chris McKay • November 27, 2023 • 1 min read

The restructuring will see most team members shifted to Meta’s burgeoning generative AI team, while others will join the AI infrastructure unit.

Scale AI Safety

Scale AI Launches New Safety Lab to Advance AI Evaluations

Chris McKay • November 8, 2023 • 1 min read

Scale's goal with SEAL is to collaborate with government and industry stakeholders to develop robust benchmarking methodologies and products for evaluating potential risks with LLMs.

AI Policy AI Safety

White House Issues Sweeping Order to Ensure Responsible AI Development

Chris McKay • October 30, 2023 • 2 min read

This landmark directive aims not only to tap into the transformative potential of AI but also to ensure its responsible and secure development.

AI Safety

MLCommons Forms Working Group to Develop AI Safety Benchmarks

Chris McKay • October 26, 2023 • 3 min read

The goal is to support the definition of benchmarks that draw from this test pool to produce overall safety scores for systems, similar to rankings like automotive safety ratings.

Google AI Safety

Google Boosts AI Security with Expanded Bug Bounties and Open-Source Protections for AI Supply Chains

Chris McKay • October 26, 2023 • 2 min read

Google will reward qualifying bug submissions that help uncover unfair bias, model manipulation, hallucinations, and other risks endemic to AI systems.

OpenAI AI Safety

OpenAI Launches Preparedness Challenge to Accelerate Understanding of Frontier AI Risks

Chris McKay • October 26, 2023 • 1 min read

The Preparedness Challenge invites participants to consider how OpenAI’s state-of-the-art natural language, speech, and image generation models could hypothetically be misused by malicious actors.

AI Safety

Frontier Model Forum Announces New Executive Director and $10 Million AI Safety Fund

Chris McKay • October 25, 2023 • 1 min read

The Frontier Model Forum was established in July by Anthropic, Google, Microsoft and OpenAI to promote the responsible development of large language models and other advanced AI systems referred to as "frontier models."

AI Safety

Partnership on AI Releases Initial Guidance for Safe AI Model Deployment

Chris McKay • October 24, 2023 • 2 min read

PAI's guidance aims to translate shared principles into practical actions companies can take to foster responsible model development. It offers a framework for collective research and standards around model safety as capabilities evolve.

Google AI Ethics AI Safety

Google DeepMind Proposes Framework for Social and Ethical AI Risk Assessment

Chris McKay • October 20, 2023 • 3 min read

As generative AI systems gain ground, the urgency to evaluate their social and ethical risks escalates. DeepMind introduces a holistic framework, emphasizing the significance of context in AI safety, marking a step forward in responsible AI evolution.

Microsoft AI Safety

Microsoft Launches AI Bug Bounty Program for Bing Chat

Chris McKay • October 16, 2023 • 2 min read

Researchers who provide qualified submissions demonstrating concrete security issues will receive bounties ranging from $2,000 to $15,000.

Google AI Safety Startups

Google Announces AI for Cybersecurity Growth Academy

Chris McKay • October 16, 2023 • 2 min read

The program aims to help startups that apply AI to cybersecurity challenges grow their customer base, generate more revenue, and expand internationally.

China AI Safety AI Policy

China Seeks Stricter Oversight of Generative AI with Proposed Data and Model Regulations

Chris McKay • October 14, 2023 • 3 min read

By emphasizing corpus safety, model security, and rigorous assessment, the regulation intends to ensure that the rise of AI in China is both innovative and secure—all while upholding its socialist principles.

AI Safety AI Ethics

TikTok Introduces Labeling Tool for AI-Generated Content

Chris McKay • September 19, 2023 • 1 min read

The new initiatives empower TikTok creators to showcase AI as part of their creative process. At the same time, the disclosures give viewers crucial context about the content they consume.

OpenAI AI Safety

OpenAI Invites Domain Experts to Join New Red Teaming Network for AI Safety

Chris McKay • September 19, 2023 • 2 min read

The goal is to identify potential risks and improve the safety of systems like ChatGPT and DALL-E before release.

Venture Capital Startups AI Safety

HiddenLayer Raises $50M in Series A to Safeguard AI Models

Chris McKay • September 19, 2023 • 2 min read

The round, which marks the largest early-stage raise this year for an AI security company, was led by Microsoft’s venture fund M12 and Moore Strategic Ventures.

AI Safety AI Policy Geopolitics

Eight More AI Companies Commit to Voluntary National AI Safety Efforts

Chris McKay • September 12, 2023 • 1 min read

The latest companies joining the effort are Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI and Stability AI.

AI Tech AI Safety

Arthur Launches Bench: An Open-Source Tool for Evaluating Large Language Models

Chris McKay • August 17, 2023 • 2 min read

Companies can evaluate criteria such as accuracy, fairness, content quality, and more across different LLMs using standardized prompts designed for business applications.

OpenAI AI Safety

OpenAI Proposes Using GPT-4 for Content Moderation

Chris McKay • August 15, 2023 • 3 min read

The company says a content moderation system built on GPT-4 allows much faster iteration on policy changes, reducing the cycle from months to hours.