With the rise of AI-driven technologies, it's important to manage how these systems interact with your website. OpenAI has outlined how its GPTBot crawls websites and ingests content to refine its models. If you'd prefer to prevent these bots from accessing your site, you can easily configure your robots.txt file to do so.
Blocking GPTBot from Accessing Your Site
If you want to completely block GPTBot from crawling your website, simply add the following lines to your robots.txt file:
User-agent: GPTBot
Disallow: /
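Before deploying the rule, you can sanity-check it locally with Python's built-in robots.txt parser. This is a small sketch: the example.com URLs are placeholders, and the parser is fed the two lines above directly rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Feed the rule from the snippet above straight to the parser
# (no network request needed).
parser = RobotFileParser()
parser.parse([
    "User-agent: GPTBot",
    "Disallow: /",
])

# GPTBot is denied everywhere; other user agents are unaffected.
print(parser.can_fetch("GPTBot", "https://example.com/any-page"))      # False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/any-page"))  # True
```

Because no `User-agent: *` block is present, the rule only affects GPTBot; everything else keeps full access.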
Customizing GPTBot’s Access
For those who want more granular control, you can choose to allow GPTBot access to specific areas of your site while blocking others. Modify your robots.txt file as follows, replacing the directories with your desired paths:
User-agent: GPTBot
Allow: /your-allowed-directory
Disallow: /your-blocked-directory
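You can verify the mixed Allow/Disallow behavior the same way. The directory names below are the placeholders from the snippet above, not real paths; note that without a catch-all `Disallow: /`, any path matching neither rule remains crawlable.

```python
from urllib.robotparser import RobotFileParser

# Parse the granular rules from the snippet above.
parser = RobotFileParser()
parser.parse([
    "User-agent: GPTBot",
    "Allow: /your-allowed-directory",
    "Disallow: /your-blocked-directory",
])

# Allowed path is crawlable, blocked path is not.
print(parser.can_fetch("GPTBot", "https://example.com/your-allowed-directory/page"))  # True
print(parser.can_fetch("GPTBot", "https://example.com/your-blocked-directory/page"))  # False

# No catch-all Disallow, so unlisted paths stay open to GPTBot.
print(parser.can_fetch("GPTBot", "https://example.com/other"))  # True
```

If you want "allow this one directory, block everything else," add a final `Disallow: /` line after the `Allow` rule instead.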
Understanding GPTBot and Other AI Bots
OpenAI uses different user agents for web crawling and user browsing, but currently the opt-out process treats both the same. By restricting the GPTBot user-agent, you're effectively covering both scenarios.
Restricting Additional AI Bots
You may also wish to block other AI-powered bots from accessing your site. Below is an example of a robots.txt file configured to disallow a variety of known AI agents:
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: cohere-ai
Disallow: /
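Rather than hand-writing each of those eleven blocks, you can generate and test the file programmatically. This is an illustrative sketch: it builds the same rules as the listing above and confirms every named bot is blocked while ordinary browsers are not.

```python
from urllib.robotparser import RobotFileParser

# The AI user agents from the robots.txt listing above.
AI_BOTS = [
    "Bytespider", "CCBot", "Diffbot", "FacebookBot", "Google-Extended",
    "GPTBot", "omgili", "anthropic-ai", "Claude-Web", "ClaudeBot", "cohere-ai",
]

# Build the robots.txt body: one User-agent/Disallow pair per bot.
rules = []
for bot in AI_BOTS:
    rules += [f"User-agent: {bot}", "Disallow: /"]

parser = RobotFileParser()
parser.parse(rules)

# Every listed bot is blocked site-wide; a regular browser agent is not.
for bot in AI_BOTS:
    assert not parser.can_fetch(bot, "https://example.com/")
print(parser.can_fetch("Mozilla/5.0", "https://example.com/"))  # True
```

To write the generated rules to disk, join them with newlines and save the result as `robots.txt` at your site root.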
By configuring your robots.txt file in this way, you can take control over which AI bots can interact with your site's content, helping to protect your data and maintain the integrity of your website.