Most websites' robots.txt files were written for 10–15 crawlers. In 2026, there are more than 35. The gap between a robots.txt written in 2023 and one that is correct in 2026 is the difference between being visible to AI search engines and being structurally blocked from them, often without knowing it.
The critical insight for 2026 robots.txt configuration is that every major AI company separates its crawlers by function. You can't configure "OpenAI" or "Anthropic" — you have to configure each bot individually by its exact User-agent string.
The three types:

- **Training crawlers** — collect content for model training (e.g. GPTBot, ClaudeBot). Allowing or blocking them does not affect search visibility.
- **Search index crawlers** — build the index that AI search answers draw citations from (e.g. OAI-SearchBot, Claude-SearchBot). Blocking them removes you from those citations.
- **Real-time browsing agents** — fetch pages on demand when a user asks a question (e.g. ChatGPT-User, Claude-User, Perplexity-User).
Common mistake: many sites blocked GPTBot (training) years ago and never added explicit rules for OAI-SearchBot or ChatGPT-User (search). If those newer bots are then caught by a restrictive default or a copied block list, the result is accidentally blocking ChatGPT Search citations while correctly blocking training crawling.
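A corrected stanza for that case might look like the following. This is an illustrative fragment, not a complete policy; whether to disallow GPTBot is a business decision, and the search/browsing bots need their own explicit rules:

```
# Opt out of training, stay visible in ChatGPT Search
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
```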
| User-agent string | Company | Type | Recommended |
|---|---|---|---|
| GPTBot | OpenAI | Training | Optional |
| OAI-SearchBot | OpenAI | Search index | Allow |
| ChatGPT-User | OpenAI | Real-time browsing | Allow |
| ClaudeBot | Anthropic | Training | Optional |
| Claude-SearchBot | Anthropic | Search index | Allow |
| Claude-User | Anthropic | Real-time browsing | Allow |
| anthropic-ai | Anthropic | Legacy string | Allow |
| Claude-Web | Anthropic | Legacy string | Allow |
| Googlebot | Google | Search index | Allow |
| Gemini-Deep-Research | Google | Deep research agent | Allow |
| Google-NotebookLM | Google | NotebookLM agent | Allow |
| Bingbot | Microsoft | Search index (ChatGPT uses Bing) | Allow |
| PerplexityBot | Perplexity | Search index | Allow |
| Perplexity-User | Perplexity | Real-time browsing | Allow |
| xAI-Bot | xAI (Grok) | Index/training | Allow |
| GrokBot | xAI (Grok) | Real-time browsing | Allow |
| meta-externalagent | Meta | Training/index | Optional |
| Meta-ExternalAgent | Meta | Agent (variant string) | Optional |
| DuckAssistBot | DuckDuckGo | AI assistant index | Allow |
| BraveBot | Brave | Search index (Claude uses Brave) | Allow |
| MistralAI-User | Mistral | Real-time browsing | Allow |
| YouBot | You.com | Index crawler | Allow |
| TavilyBot | Tavily | AI search API | Allow |
| PhindBot | Phind | Developer AI search | Allow |
| Applebot | Apple | Apple Intelligence index | Allow |
| Applebot-Extended | Apple | AI training data | Optional |
| CCBot | Common Crawl | Training data only | Block |
| PanguBot | Huawei | Training only | Block |
| ChatGLM-Spider | Zhipu AI | Training only | Block |
| img2dataset | Various | Training data scraper | Block |
```
# GEO-optimized robots.txt — May 2026
# Allow all major AI search and browsing bots

User-agent: *
Allow: /

# OpenAI — three-bot framework
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic — three-bot framework, plus legacy strings
User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Claude-Web
Allow: /

# Perplexity — two-bot framework
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Google AI products
User-agent: Gemini-Deep-Research
Allow: /

User-agent: Google-NotebookLM
Allow: /

# xAI / Grok
User-agent: xAI-Bot
Allow: /

User-agent: GrokBot
Allow: /

# Meta AI
User-agent: meta-externalagent
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

# Brave Search (used by Claude)
User-agent: BraveBot
Allow: /

# Bing (used by ChatGPT)
User-agent: Bingbot
Allow: /

# Other AI search engines
User-agent: DuckAssistBot
Allow: /

User-agent: MistralAI-User
Allow: /

User-agent: YouBot
Allow: /

User-agent: TavilyBot
Allow: /

User-agent: PhindBot
Allow: /

User-agent: Applebot
Allow: /

# Training-only scrapers — block
User-agent: CCBot
Disallow: /

User-agent: PanguBot
Disallow: /

User-agent: ChatGLM-Spider
Disallow: /

User-agent: img2dataset
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
```
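Before deploying a file like the one above, you can verify which bots it actually blocks. A minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt content, bot list, and URL below are illustrative placeholders:

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: training blocked, search/browsing bots allowed
ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

# A few of the user-agent strings from the table above
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
           "Claude-SearchBot", "PerplexityBot"]


def blocked_bots(robots_txt: str, bots: list[str],
                 url: str = "https://example.com/") -> list[str]:
    """Return the user agents this robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


print(blocked_bots(ROBOTS_TXT, AI_BOTS))  # ['GPTBot']
```

Note that `urllib.robotparser` implements the classic robots.txt rules and individual crawlers can differ in details such as wildcard handling, so treat this as a first-pass check rather than a guarantee of how each bot will behave.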
Neurobird checks all 35+ AI crawlers against your robots.txt and tells you exactly which search bots you're accidentally blocking.
Check your robots.txt free →