The Complete robots.txt Guide for AI Bots in 2026

Neurobird Research Team · May 2026 · 5 min read
[Figure: table of 35+ AI crawler User-agent strings in 2026, categorized by company and purpose, across OpenAI, Anthropic, Google, Meta, xAI, Perplexity, and more]

Most robots.txt files were written with 10–15 crawlers in mind. In 2026, there are 35+ AI crawler User-agent strings to account for. The gap between a robots.txt written in 2023 and one that is correct in 2026 is the difference between being visible to AI search engines and being structurally blocked from them, often without knowing it.

35+ active AI crawler User-agent strings as of May 2026
~50% of AI search crawlers blocked by the average 2023-era robots.txt
3 bots per major AI company; most sites configure only 1

The three-bot framework — every major AI company uses it

The critical insight for 2026 robots.txt configuration is that every major AI company separates its crawlers by function. You can't configure "OpenAI" or "Anthropic" — you have to configure each bot individually by its exact User-agent string.

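The three types:

Training crawler: collects content for model training (GPTBot, ClaudeBot).
Search index crawler: builds the index behind AI search citations (OAI-SearchBot, Claude-SearchBot).
Real-time browsing agent: fetches pages on demand during a user's session (ChatGPT-User, Claude-User).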

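Common mistake: Many sites block GPTBot (training) and consider OpenAI handled, without ever adding explicit rules for OAI-SearchBot or ChatGPT-User (search). Those bots then inherit whatever the "User-agent: *" group says, and where that group is restrictive the site ends up blocking ChatGPT Search citations while correctly blocking training crawls.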
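
To see why the exact User-agent string matters, the sketch below runs two hypothetical robots.txt files through Python's standard-library urllib.robotparser. It assumes that parser is a fair stand-in for how a compliant crawler picks its group: a bot with no group of its own falls back to the "User-agent: *" group. The file contents, the is_allowed helper, and the example.com URL are illustrative and not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/article") -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file A: blocks only the training crawler.
TRAINING_ONLY_BLOCK = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical file B: a restrictive wildcard with one carve-out.
# The search bots were never added, so they inherit the wildcard block.
RESTRICTIVE_WILDCARD = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

for name, text in [("A", TRAINING_ONLY_BLOCK), ("B", RESTRICTIVE_WILDCARD)]:
    for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
        print(name, bot, is_allowed(text, bot))
```

Under file A only GPTBot is refused. Under file B both search bots are refused even though neither is named anywhere in the file.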

Complete AI bot reference table — May 2026

| User-agent string | Company | Type | Recommended |
|---|---|---|---|
| GPTBot | OpenAI | Training | Optional |
| OAI-SearchBot | OpenAI | Search index | Allow |
| ChatGPT-User | OpenAI | Real-time browsing | Allow |
| ClaudeBot | Anthropic | Training | Optional |
| Claude-SearchBot | Anthropic | Search index | Allow |
| Claude-User | Anthropic | Real-time browsing | Allow |
| anthropic-ai | Anthropic | Legacy string | Allow |
| Claude-Web | Anthropic | Legacy string | Allow |
| Googlebot | Google | Search index | Allow |
| Gemini-Deep-Research | Google | Deep research agent | Allow |
| Google-NotebookLM | Google | NotebookLM agent | Allow |
| Bingbot | Microsoft | Search index (ChatGPT uses Bing) | Allow |
| PerplexityBot | Perplexity | Search index | Allow |
| Perplexity-User | Perplexity | Real-time browsing | Allow |
| xAI-Bot | xAI (Grok) | Index/training | Allow |
| GrokBot | xAI (Grok) | Real-time browsing | Allow |
| meta-externalagent | Meta | Training/index | Optional |
| Meta-ExternalAgent | Meta | Agent (variant string) | Optional |
| DuckAssistBot | DuckDuckGo | AI assistant index | Allow |
| BraveBot | Brave | Search index (Claude uses Brave) | Allow |
| MistralAI-User | Mistral | Real-time browsing | Allow |
| YouBot | You.com | Index crawler | Allow |
| TavilyBot | Tavily | AI search API | Allow |
| PhindBot | Phind | Developer AI search | Allow |
| Applebot | Apple | Apple Intelligence index | Allow |
| Applebot-Extended | Apple | AI training data | Optional |
| CCBot | Common Crawl | Training data only | Block |
| PanguBot | Huawei | Training only | Block |
| ChatGLM-Spider | Zhipu AI | Training only | Block |
| img2dataset | Various | Training data scraper | Block |
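
One way to keep a table like this maintainable is to hold it as data and generate the robots.txt groups from it. The sketch below is a hypothetical helper, not part of any tool mentioned here: POLICY copies a few rows from the table, and render_robots and its optional_as parameter are names invented for illustration. It assumes every rule applies site-wide ("/").

```python
from typing import Dict

# A few rows from the reference table: User-agent string -> recommendation.
POLICY: Dict[str, str] = {
    "GPTBot": "optional",
    "OAI-SearchBot": "allow",
    "ChatGPT-User": "allow",
    "ClaudeBot": "optional",
    "Claude-SearchBot": "allow",
    "CCBot": "block",
    "img2dataset": "block",
}


def render_robots(policy: Dict[str, str], optional_as: str = "allow") -> str:
    """Render one robots.txt group per bot.

    `optional_as` decides how "optional" rows are treated: "allow" or "block".
    """
    if optional_as not in ("allow", "block"):
        raise ValueError("optional_as must be 'allow' or 'block'")
    # Start with a permissive default group for every unlisted crawler.
    lines = ["User-agent: *", "Allow: /", ""]
    for agent, recommendation in policy.items():
        decision = optional_as if recommendation == "optional" else recommendation
        rule = "Disallow: /" if decision == "block" else "Allow: /"
        lines += [f"User-agent: {agent}", rule, ""]
    return "\n".join(lines)


print(render_robots(POLICY, optional_as="block"))
```

Passing optional_as="block" opts out of training crawls while leaving the search and browsing bots allowed. The default treats "Optional" rows as allowed.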

The correct robots.txt template for 2026

# GEO-optimized robots.txt — May 2026
# Allow all major AI search and browsing bots

User-agent: *
Allow: /

# OpenAI — three-bot framework
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /

# Anthropic — three-bot framework
User-agent: ClaudeBot
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: Claude-Web
Allow: /

# Perplexity — two-bot framework
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /

# Google AI products
User-agent: Gemini-Deep-Research
Allow: /
User-agent: Google-NotebookLM
Allow: /

# xAI / Grok
User-agent: xAI-Bot
Allow: /
User-agent: GrokBot
Allow: /

# Meta AI
User-agent: meta-externalagent
Allow: /
User-agent: Meta-ExternalAgent
Allow: /

# Brave Search (used by Claude)
User-agent: BraveBot
Allow: /

# Bing (used by ChatGPT)
User-agent: Bingbot
Allow: /

# Other AI search engines
User-agent: DuckAssistBot
Allow: /
User-agent: MistralAI-User
Allow: /
User-agent: YouBot
Allow: /
User-agent: TavilyBot
Allow: /
User-agent: PhindBot
Allow: /
User-agent: Applebot
Allow: /

# Training-only scrapers — block
User-agent: CCBot
Disallow: /
User-agent: PanguBot
Disallow: /
User-agent: ChatGLM-Spider
Disallow: /
User-agent: img2dataset
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
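
Once a file like this is deployed, it is worth confirming that each bot resolves to the rule you intended. The sketch below is a minimal check using Python's standard-library urllib.robotparser. The find_mismatches helper, the EXCERPT string (a shortened copy of the template), and the expectations passed to it are illustrative assumptions. Python's parser matches User-agent names by case-insensitive substring, so treat its answer as an approximation of how any particular crawler reads the file.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Shortened copy of the template above, kept in the same layout.
EXCERPT = """\
User-agent: *
Allow: /

# OpenAI
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /

# Training-only scrapers
User-agent: CCBot
Disallow: /
User-agent: PanguBot
Disallow: /
"""


def find_mismatches(robots_txt: str, expected: Dict[str, bool],
                    url: str = "https://yourdomain.com/") -> List[str]:
    """Return the bots whose access differs from `expected` (True = allowed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot, should_allow in expected.items()
            if parser.can_fetch(bot, url) != should_allow]


expected = {"OAI-SearchBot": True, "Claude-SearchBot": True,
            "CCBot": False, "PanguBot": False}
print(find_mismatches(EXCERPT, expected))  # an empty list means no surprises
```

Claude-SearchBot has no group in the excerpt, so it is judged by the "User-agent: *" group, which here allows it.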


Frequently Asked Questions

How many AI crawlers are actively indexing the web in 2026?
As of May 2026, there are 35+ distinct AI crawler User-agent strings in active use. They include training crawlers, search index crawlers, and real-time browsing agents from OpenAI, Anthropic, Google, Meta, xAI, Perplexity, Brave, Apple, and others.

Should I block AI training crawlers in robots.txt?
That depends on your goals. Blocking training crawlers like CCBot and GPTBot tells them not to collect your content for training AI models. Blocking search crawlers like OAI-SearchBot or Claude-SearchBot, however, makes your site invisible to ChatGPT and Claude search citations. The key is to separate training bots from search bots and treat them differently.

Does robots.txt actually stop AI crawlers?
Only for crawlers that choose to comply: robots.txt is a request, not an access control. Major AI companies (OpenAI, Anthropic, Google, Meta, xAI) honor it. Some training-only scrapers ignore it entirely, and for those, legal controls under copyright law may be more effective than technical controls via robots.txt.
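
Whether a given scraper complies can be checked against your own server logs. The sketch below counts page requests from the blocked User-agent strings in the table, assuming access logs in the common "combined" format. The sample log lines are invented for illustration, and count_blocked_hits is a hypothetical helper. User-agent headers can be spoofed, so a hit is a prompt for a closer look rather than proof.

```python
import re
from collections import Counter
from typing import Iterable, Sequence

BLOCKED_AGENTS = ("CCBot", "PanguBot", "ChatGLM-Spider", "img2dataset")

# Combined log format: ... "METHOD PATH PROTO" STATUS BYTES "REFERER" "USER-AGENT"
LOG_PATTERN = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def count_blocked_hits(log_lines: Iterable[str],
                       agents: Sequence[str] = BLOCKED_AGENTS) -> Counter:
    """Count page requests whose User-agent contains a blocked bot name."""
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None or match.group("path") == "/robots.txt":
            continue  # unparseable line, or the bot just reading the rules
        user_agent = match.group("agent").lower()
        for agent in agents:
            if agent.lower() in user_agent:
                hits[agent] += 1
    return hits


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/May/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 310 "-" "CCBot/2.0"',
    '203.0.113.5 - - [12/May/2026:10:00:02 +0000] "GET /pricing HTTP/1.1" 200 8120 "-" "CCBot/2.0"',
    '198.51.100.7 - - [12/May/2026:10:01:10 +0000] "GET /blog HTTP/1.1" 200 5512 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
]
print(count_blocked_hits(SAMPLE_LOG))
```

Requests for /robots.txt itself are skipped, since a compliant bot has to read the file before it can obey it.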