The influence and impact of the rapid adoption of AI in our lives is undeniable. Increasingly, large language models (LLMs) that power popular conversational AI applications like OpenAI's ChatGPT and Google Bard are becoming foundational technologies for a wide range of other applications, including virtual assistants, search engines and productivity tools. However, in a recent paper, a team of researchers identify and explore Indirect Prompt Injections as a growing and serious threat that could undermine their commercial utility and compromise user security.
The paper identifies several types of prompt injections, including direct, indirect, and multi-stage injections, and demonstrates how they can be used to compromise AI systems in various ways. For example, attackers can use prompt injections to manipulate search results, exfiltrate user data, and even spread malware through email and social media. The paper also shows how attackers can remotely control compromised AI systems and persistently infect them with new payloads.
The findings of this research have significant implications for the security of AI systems in commercial and consumer settings. As AI systems become more integrated into our daily lives, the risks of these systems being compromised or misused increase. The potential for prompt injections to compromise user data and take over AI assistants is a serious concern.
In a blog post, the paper's first author, Kai Greshake, specific examples of these threats and how they are accomplished in applications that are publicly available today. One example involves an attacker using an indirect prompt injection to brainwash Bing Chat into being a social engineer working on behalf of the attacker to extract user data. Another example involves an attacker compromising a hypothetical AI assistant called Bong by getting it to look up a specific webpage or keyword on the internet and then delivering the rest of the payload through a secondary query. In another example, the attacker manipulates the documentation of a target package or function to introduce subtle or not-so-subtle vulnerabilities into the code generated by GitHub Copilot.
These vulnerabilities become even more alarming in the context of autonomous agents like AutoGPT and BabyAGI—AI programs that can create, prioritize, and accomplish tasks on their own. The ability of these agents to autonomously interact with multiple systems and users can magnify the impact of indirect prompt injections, potentially leading unintended consequences, privacy violations and even large-scale security breaches.
In an email interview with Greshake, he underlined the importance of awareness about these issues and the need for collaboration between researchers, developers, and industry stakeholders. He emphasized the need for increased focus on AI safety and cautioned that reliable defenses against indirect prompt injections are still lacking.
"We need to have more transparency and accountability when it comes to AI systems, and we need to ensure that these systems are designed with safety and security in mind."
OpenAI, makers GPT, has always emphasized their unwavering commitment to data security, privacy, and compliance. Just last month, they announced a Bug Bounty program, offering rewards of up to $20,000 for discovering vulnerabilities in ChatGPT. We reached out to OpenAI for a statement regarding the research paper's findings, but have not received a response. We will update the article when we do.
As we reflect on these findings, it becomes apparent that the path forward must prioritize AI safety and security. While LLMs offer immense potential in various applications, these advantages could be overshadowed by the hidden menace of indirect prompt injections. For AI to truly realize its promise, the research community and industry stakeholders must join forces to tackle these challenges, ensuring the responsible development and deployment of AI technologies.