After Cloudflare posted yesterday about Perplexity crawling sites behind generic, stock browser user-agent strings and without checking robots.txt, Perplexity has responded publicly. Their post, Agents or Bots? Making Sense of AI on the Open Web, offers the kind of nuanced response you'd expect from a company walking the line between growth and website friction. But it doesn't dispute the core claim. Instead, it reframes it.
The headline defense?
Perplexity says the agent at the center of Cloudflare's accusation was simply pre-fetching results for its users in real time - not running a bot that crawls the web.
They also hint that the term "crawler" doesn't really apply anymore. According to Perplexity, this is "AI browsing" and shouldn't be compared to traditional bots like old-school Google spiders. The implication: if an AI assistant is fetching a page on behalf of a human user, it's no different from the user opening it in Chrome.
This isn't a new argument. We've heard similar positioning from browser extensions, proxy scrapers, and "user-triggered" bot authors for years. The only real shift is that LLMs and self-described search engines are now the ones sending the requests.
What Perplexity Said (And Didn't Say)
- Yes, they hit your site. Perplexity acknowledges that their system accessed URLs while users were interacting with its assistant. So, whether they call it browsing, caching, or fetching, they did reach out to servers without identifying themselves as bots.
- No apology, no rollback. There's no indication they plan to stop, and no new opt-out mechanism. There's just an assurance that they've added more IPs to their public documentation and that they've removed "Stock-AI" from their User-Agent string.
- They still don't respect robots.txt. The post does not say they will honor robots.txt for agent fetching, nor do they treat it as binding. This confirms what we said yesterday: if your server publishes content on the open web, expect it to be read, cached, parsed, and regurgitated.
The Robots.txt Illusion (Again)
As we wrote in yesterday's breakdown of Cloudflare's original post, robots.txt has no legal standing. Its core syntax wasn't formalized by a standards body until RFC 9309 in 2022, compliance remains entirely voluntary, and Google unilaterally expanded the format for years before that. So when AI companies or stealth bots ignore it, they're not breaking the law. They're just ignoring an honor system that fewer and fewer companies consider relevant.
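That honor system can be seen directly in code: a client only obeys robots.txt if it chooses to consult the file at all. A minimal sketch using Python's standard-library parser (the bot name, domain, and rules below are hypothetical):

```python
# Sketch: robots.txt is advisory -- enforcement happens only if the
# client voluntarily checks it. Bot name and paths are hypothetical.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Feed rules directly instead of fetching them, to keep this self-contained.
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A polite client asks first...
print(rp.can_fetch("ExampleBot", "https://example.com/private/page"))  # False
# ...but nothing stops a client from skipping this check and fetching anyway.
print(rp.can_fetch("ExampleBot", "https://example.com/public/page"))   # True
```

The key point: `can_fetch` is a lookup, not a gate. Any crawler, agent, or "AI browser" that never calls it faces no technical barrier at all.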
What Does This Mean for SEOs and Site Owners?
If you run a content-rich website and care about how it’s accessed, here's where we stand:
- AI models are accessing your pages, even if you didn’t opt in.
- You can’t block them all without impacting real users.
- Perplexity isn’t pretending to follow traditional crawling rules.
What you can do is limited:
- Watch your server logs.
- Block or rate-limit by IP, ASN, or suspicious patterns.
- Challenge requests that bypass normal browser behavior.
- Track how your content appears in LLM summaries.
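The first two steps above can be sketched as a tiny log scan. This is a hypothetical illustration, not a hardened tool: the combined-log format, field positions, and the 100-requests threshold are all assumptions you'd tune for your own traffic.

```python
# Hypothetical sketch: flag client IPs with an unusually high request count
# in an access log. Log format, threshold, and IPs below are assumptions.
from collections import Counter

def flag_heavy_ips(log_lines, threshold=100):
    """Count requests per client IP (first field of a combined-format
    log line) and return the IPs exceeding the threshold."""
    hits = Counter(line.split()[0] for line in log_lines if line.strip())
    return {ip: n for ip, n in hits.items() if n > threshold}

# Example: one IP hammering the server, one normal visitor.
sample = ['203.0.113.9 - - [01/Jan/2025] "GET /a HTTP/1.1" 200'] * 150 \
       + ['198.51.100.7 - - [01/Jan/2025] "GET /b HTTP/1.1" 200'] * 3
print(flag_heavy_ips(sample))  # {'203.0.113.9': 150}
```

From a list like this you could feed the flagged IPs into a firewall rule or a rate limiter - though, as noted above, aggressive blocking risks catching real users behind shared addresses.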
Perplexity’s response is refreshingly honest in one way: they’re not pretending the old rules apply. They’re betting that AI-assisted browsing will blur the line between user and bot, and that line is going to get harder to enforce.
The Takeaway
Perplexity isn’t crawling your site, they say. They’re just “assisting” users with intelligent browsing… by fetching and caching your content at scale. Welcome to the Agentic era.
Whether you call it a crawler, bot, or agent, it’s still chewing on your bandwidth and quoting your content.
If you're not okay with that, you’ve got some hard decisions to make about how much of your content stays public.



