robots.txt

AI companies run their own crawlers, separate from the classic search bots. If your
robots.txt says nothing about them, they fall back to whatever your User-agent: * group
says (or to full access if you have none), so your policy is accidental. Stating it
explicitly, whether you allow them (to be cited in answers) or block
them (to keep content out of training and answers), is the AI-ready move. To be found
by ChatGPT, Claude and Perplexity, you generally want to allow them.

| User-agent | Who |
|---|---|
| GPTBot, OAI-SearchBot, ChatGPT-User | OpenAI / ChatGPT |
| ClaudeBot, anthropic-ai | Anthropic / Claude |
| Google-Extended | Google Gemini (training and grounding) |
| PerplexityBot | Perplexity |
| CCBot | Common Crawl (feeds many models) |

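To see whether an existing robots.txt leaves any of these crawlers unnamed, a small audit helps. The following is a minimal sketch, not a full robots.txt parser: the function name `unnamed_agents` and the `sample` file are hypothetical, and the check only looks for explicit User-agent lines, so a wildcard group does not count as naming a crawler.

```python
# User-agent tokens copied from the table above.
AI_USER_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "anthropic-ai",
    "Google-Extended", "PerplexityBot", "CCBot",
]


def unnamed_agents(robots_txt, agents=AI_USER_AGENTS):
    """Return the agents that no User-agent line names explicitly."""
    named = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")  # split at the first colon
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())     # tokens are case-insensitive
    return [a for a in agents if a.lower() not in named]


# Hypothetical robots.txt that names only one AI crawler.
sample = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

for agent in unnamed_agents(sample):
    print(f"not named explicitly: {agent}")
```

Any crawler this reports is being governed by the wildcard group or by default access rather than by a rule you wrote for it.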
Add this to your robots.txt at https://yourdomain.com/robots.txt:

    User-agent: GPTBot
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: CCBot
    Allow: /

    Sitemap: https://yourdomain.com/sitemap.xml

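Before publishing, it can be worth checking that the rules above parse the way you expect. Here is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `can_crawl` and the test URL path are assumptions for illustration, and the standard-library parser may differ in edge cases from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the block above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""


def can_crawl(robots_txt, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network call
    return parser.can_fetch(agent, url)


# Hypothetical page URL used only for the check.
page = "https://yourdomain.com/some-article"
for agent in ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]:
    print(agent, can_crawl(ROBOTS_TXT, agent, page))
```

To check the live file instead of a local string, `RobotFileParser` also offers `set_url()` followed by `read()`, which fetches the file over the network.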
To block a crawler instead, replace its Allow: / with
Disallow: /. Either way, naming them is what makes the policy intentional.
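If you would rather keep the allow/block decision in one place and generate the file, the idea can be sketched as below. This is an illustrative helper under stated assumptions, not a standard tool: `render_robots` and the `policy` mapping are hypothetical names, and the sketch only supports a site-wide Allow: / or Disallow: / per crawler.

```python
from typing import Dict, Optional


def render_robots(policy: Dict[str, bool], sitemap: Optional[str] = None) -> str:
    """Render one User-agent group per crawler: True -> Allow: /, False -> Disallow: /."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    body = "\n\n".join(groups)  # blank line between groups for readability
    if sitemap:
        body += f"\n\nSitemap: {sitemap}"
    return body + "\n"


# Hypothetical policy: allow the answer-engine crawlers, block Common Crawl.
policy = {
    "GPTBot": True,
    "ClaudeBot": True,
    "PerplexityBot": True,
    "CCBot": False,
}
print(render_robots(policy, sitemap="https://yourdomain.com/sitemap.xml"))
```

Because every crawler in the mapping gets its own group, the generated file names each one explicitly whichever way you decide.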