Google has introduced a new user agent token called Google-Extended that allows web publishers to opt out of having their website content used to train the company's AI systems, including the Bard conversational AI and Vertex AI APIs.
The move comes as AI generators like ChatGPT have raised concerns among publishers about copyright and compensation for using their content. Over the last month, Microsoft and OpenAI have also unveiled publisher opt-out schemes for Bing Chat and GPT.
With Google-Extended, publishers can add a directive to their robots.txt file that tells Google not to use their pages for AI training. Google-Extended is a control token rather than a separate crawler: pages are still fetched by Google's existing user agents, but content from sites that disallow the token is not used to improve the accuracy and capabilities of Google's generative AI tools. Here is a simple example of what a robots.txt file could look like:
# Opt out of Google AI training (Bard and Vertex AI)
User-agent: Google-Extended
Disallow: /
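Publishers who want to sanity-check such a rule before deploying it could parse it locally. The following is a minimal sketch using Python's standard-library robots.txt parser; the helper name `is_allowed` and the example.com URL are illustrative assumptions, and the result reflects only how this parser reads the file, not how Google applies the token internally.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above, inlined so nothing is fetched.
ROBOTS_TXT = """\
# Opt out of Google AI training
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url.

    Hypothetical helper for illustration; it wraps the standard parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a real publisher site.
extended_allowed = is_allowed(ROBOTS_TXT, "Google-Extended", "https://example.com/article")
googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/article")

print("Google-Extended allowed:", extended_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Under these rules the parser reports the Google-Extended token as disallowed while Googlebot remains unaffected, which matches the intent of opting out of AI training without changing how the site is crawled for search.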
In a blog post, Google stated that providing transparency and control is an "important step" that all AI model providers should take. The company acknowledged that as AI use cases expand, publishers need more granular ways to manage different types of usage.
Google said it is committed to working with the web community on additional machine-readable approaches for publishers to control their content usage. The company is likely feeling pressure as OpenAI's ChatGPT has popularized AI text generation, causing publishers to speak up about copyright concerns.
While Google-Extended offers an opt-out for training data collection, the company's blog post did not address other publisher concerns around AI like copyright infringement or compensation. As AI text generation becomes more ubiquitous, expect ongoing debate around publisher rights and the laws surrounding new generative technologies.