Tracking OpenAI – ChatGPT Bots – A Fresh Guide for Webmasters, Site Owners, and SEOs


Last updated 3/1/2026
  • robots.txt controls
  • OpenAI user agents
  • Search vs retrieval vs training
What this post covers
OpenAI runs multiple crawlers and user agents, and they do different jobs. Treating them as a single “AI bot” leads to bad decisions. Below is the practical breakdown for webmasters, site owners, and SEO teams, plus copy/paste robots.txt patterns.

OpenAI uses web crawlers (“robots”) and user agents to perform actions for its products, either automatically or triggered by a user request.
OpenAI exposes separate robots.txt controls so you can manage each behavior independently.

That means you can allow AI search inclusion while blocking model training, or allow user-initiated page fetching while blocking automated crawling.
Each control is independent, so you can mix and match based on your goals.

Key point
Each setting is independent. You can allow OAI-SearchBot and still disallow GPTBot. You can allow ChatGPT-User for user-triggered retrieval without opting into training.
For search results, it can take about 24 hours after a robots.txt update for systems to adjust.

OAI-SearchBot

Purpose: AI search inclusion (SearchGPT prototype), not training

OAI-SearchBot is OpenAI’s search crawler. It is used to link to and surface websites inside AI-driven search results.
It is not used to crawl content to train OpenAI’s generative AI foundation models.

  • If you want your site to appear in OpenAI search results, allow OAI-SearchBot in robots.txt.
  • For tighter control, you can also allowlist requests from the published IP ranges.
  • Robots.txt changes can take about 24 hours to reflect in search behavior.

Full user-agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

Published IP addresses
https://openai.com/searchbot.json
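
If you validate requests by IP, the published JSON can be checked programmatically. A minimal sketch in Python, assuming the file uses a Google-style shape with a "prefixes" array of "ipv4Prefix"/"ipv6Prefix" entries — verify against the live file before relying on it, and note the sample ranges below are invented:

```python
import ipaddress

def ip_in_published_ranges(ip: str, published: dict) -> bool:
    """Return True if `ip` falls inside any published CIDR prefix."""
    addr = ipaddress.ip_address(ip)
    for entry in published.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix and addr in ipaddress.ip_network(prefix):
            return True
    return False

# Invented sample ranges, not OpenAI's real ones:
sample = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}]}
print(ip_in_published_ranges("192.0.2.10", sample))    # True
print(ip_in_published_ranges("198.51.100.1", sample))  # False
```

The same check works for the chatgpt-user.json and gptbot.json lists below, since each is a per-agent file of the same kind.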

ChatGPT-User

Purpose: user-triggered fetching in ChatGPT and Custom GPTs, not training

ChatGPT-User is used for user actions in ChatGPT and Custom GPTs. When someone asks a question, ChatGPT may fetch a live web page to help answer and to provide a source link.
This agent can also be used when users interact with external applications through GPT Actions.

  • This is not automated crawling; it is user-initiated retrieval.
  • Blocking it prevents ChatGPT from fetching your pages during user requests.
  • It is not used for training generative AI foundation models.

Full user-agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

Published IP addresses
https://openai.com/chatgpt-user.json

GPTBot

Purpose: training crawler for foundation models

GPTBot is used to make OpenAI’s generative AI foundation models more useful and safe. It crawls content that may be used in training.
Disallowing GPTBot indicates your site’s content should not be used in training generative AI foundation models.

  • Blocking GPTBot is a training control decision.
  • It does not automatically block search inclusion (OAI-SearchBot) or user-triggered retrieval (ChatGPT-User).

Full user-agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

Published IP addresses
https://openai.com/gptbot.json

Practical robots.txt scenarios

Allow search, block training
Eligible for OpenAI search results, not used for foundation model training.

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

Allow user retrieval, block training and search crawling
ChatGPT can fetch pages when a user asks; no automated search crawling, no training.

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

Full opt-out
No training, no OpenAI search inclusion, no user-triggered fetching.

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
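
You can sanity-check a robots.txt draft against these agents with Python's stdlib parser before deploying it. A sketch using the "allow search, block training" scenario above (example.com is a placeholder):

```python
import urllib.robotparser

# The "allow search, block training" scenario from above.
robots_txt = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

for agent in ("OAI-SearchBot", "ChatGPT-User", "GPTBot"):
    print(agent, rp.can_fetch(agent, "https://example.com/page"))
# OAI-SearchBot is allowed and GPTBot is blocked; ChatGPT-User matches
# no group and there is no "User-agent: *" fallback, so the parser
# defaults to allow.
```

If you want unmatched agents blocked by default, add a `User-agent: *` / `Disallow: /` group and re-run the check.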

Ops note
If you validate bots by IP, match request IPs against the published JSON lists above, and keep your allowlist updated.
Logging by user-agent plus IP is still the cleanest way to audit this in production.
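
As one way to run that audit, here is a minimal sketch that tallies hits per OpenAI agent by substring-matching the user-agent tokens in access-log lines; the log lines are invented for illustration, and a real pipeline should also verify the request IPs against the published lists:

```python
from collections import Counter

AGENTS = ("OAI-SearchBot", "ChatGPT-User", "GPTBot")

def count_openai_hits(log_lines):
    """Tally requests per OpenAI agent by user-agent substring."""
    hits = Counter()
    for line in log_lines:
        for agent in AGENTS:
            if agent in line:
                hits[agent] += 1
    return hits

# Invented combined-format log lines for illustration:
sample_log = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
    'GPTBot/1.1; +https://openai.com/gptbot"',
    '203.0.113.9 - - [01/Mar/2026:10:01:00 +0000] "GET /post HTTP/1.1" 200 900 '
    '"-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
    'OAI-SearchBot/1.0; +https://openai.com/searchbot"',
]
print(count_openai_hits(sample_log))
```

Extending the counter to bucket by status code or path gives you the frequency and error-rate views mentioned in the monitoring list below.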

What to monitor

  • Server logs for these user agents (frequency, status codes, paths hit).
  • IP validation against OpenAI’s published JSON ranges.
  • Robots.txt change timing vs observed crawl behavior (expect up to about a day for search adjustments).
  • Citation and link traffic; note that referrer headers are not always reliable for AI surfaces.

Change Log

  • 3/1/2026: Updated list to include new IP addresses