PerplexityBot is the indexing crawler behind Perplexity's answer engine, which served roughly 780 million queries in May 2025 according to CEO Aravind Srinivas. The bot has a reputation problem that traces back to a June 19, 2024 Wired investigation by Dhruv Mehrotra and Tim Marchman, which alleged Perplexity was scraping sites that explicitly disallowed it in robots.txt. The story matters for one reason: if you read the headline and stop, you will block the wrong thing.

Two crawlers, one brand

Perplexity ships two declared user-agents, and the difference is the entire policy question.

PerplexityBot/1.0 is the indexing crawler. It populates Perplexity's search index, fetches pages for citation in answers, and, per Perplexity's crawler documentation, respects robots.txt directives addressed to PerplexityBot. Perplexity states this bot is not used to train foundation models. Blocking it means your URLs stop appearing as cited sources in Perplexity answers. Perplexity may still surface your domain name and headline from third-party signals when blocked, but the linked-citation path closes.

Perplexity-User/1.0 is the user-triggered fetcher. It runs when a user submits a query that requires fetching a specific URL: pasting a link, asking "summarize this page", or following a citation. Perplexity's help center documentation is explicit: Perplexity-User generally ignores robots.txt, on the argument that the request originates from a human, not an automated crawl. OpenAI's ChatGPT-User and Anthropic's Claude-User are the equivalent user-triggered agents at other vendors; their robots.txt handling is documented separately and should be checked against each vendor's current crawler page rather than assumed to match.

Perplexity publishes IP ranges for both crawlers as JSON: https://www.perplexity.com/perplexitybot.json and https://www.perplexity.com/perplexity-user.json. Use these for verification at the WAF layer if you care about identity, not just user-agent strings.
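The identity check described above can be sketched in a few lines of Python. This is a minimal sketch, not Perplexity's or any WAF vendor's implementation. It assumes the range files use a {"prefixes": [{"ipv4Prefix": ...}]} layout, which should be confirmed against the live JSON before relying on it. The sample ranges are RFC 5737 and RFC 3849 documentation addresses, not real Perplexity ranges, and the function names are hypothetical.

```python
import ipaddress
import json

# ASSUMPTION: the published files look like {"prefixes": [{"ipv4Prefix": "..."}]}.
# The CIDRs below are documentation ranges, NOT real Perplexity ranges.
SAMPLE_RANGES_JSON = (
    '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
)


def load_networks(raw_json):
    """Parse a range file into a list of ip_network objects."""
    networks = []
    for entry in json.loads(raw_json).get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def is_declared_ip(client_ip, networks):
    """Return True if client_ip falls inside any published range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


def classify(user_agent, client_ip, networks):
    """Label a request by what it claims to be versus where it came from."""
    ua = user_agent.lower()
    claims_perplexity = "perplexitybot" in ua or "perplexity-user" in ua
    if not claims_perplexity:
        return "not-perplexity"
    # A Perplexity user-agent from outside the published ranges is unverified.
    return "verified" if is_declared_ip(client_ip, networks) else "unverified"
```

In practice the two files would be fetched and cached separately, one network list per user-agent token. Note that a check like this only catches requests that claim a Perplexity user-agent; traffic that presents a generic browser string passes through as "not-perplexity".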

What the 2024 controversy actually proved

Wired's June 2024 reporting, building on developer Robb Knight's June 14, 2024 investigation, identified an AWS-hosted virtual machine at IP 44.221.181.252 fetching content from sites that disallowed PerplexityBot in robots.txt. The IP did not appear in Perplexity's published ranges. Srinivas told Wired the crawler was operated by an unnamed third-party provider under NDA. Amazon Web Services opened an inquiry into whether the activity violated AWS terms of service.

The pattern repeated. On August 4, 2025, Cloudflare published a detailed technical writeup showing Perplexity rotating through undeclared IPs and ASNs, sending requests with a generic Chrome 124 on macOS user-agent, after its declared crawlers were blocked by both robots.txt and WAF rules. Cloudflare measured the activity across tens of thousands of domains and millions of requests per day. Cloudflare delisted Perplexity from its verified bots program and added managed-rule heuristics that block the stealth traffic by behavior, not just user-agent. Perplexity responded by calling the writeup a "publicity stunt" and arguing the traffic was a third-party headless browser called Browserbase acting on user requests.

You can read the dispute either way. The verifiable fact: declared user-agent compliance is not the same as actual compliance, and Perplexity has been credibly accused twice in 14 months.

A robots.txt that makes your stance legible

Block both crawlers if your policy is "no Perplexity, period". Allow PerplexityBot and block Perplexity-User if you want citation visibility but reject the user-triggered loophole. Be honest with yourself about the second option: Perplexity-User ignores the rule anyway, but the file still documents your intent for any reasonable reader.

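User-agent: *
Disallow: /api/
Disallow: /admin/
Allow: /

# Allow Perplexity citation indexing.
# A crawler obeys only its most specific group, so the private paths are repeated here.
User-agent: PerplexityBot
Disallow: /api/
Disallow: /admin/
Allow: /

# Reject user-triggered fetches (advisory; not enforced by Perplexity)
User-agent: Perplexity-User
Disallow: /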

If you want to reject everything, mirror the block:

User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /
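Either policy can be sanity-checked locally before deployment. The sketch below uses Python's standard urllib.robotparser against a hypothetical policy string. Treat it as an approximation: this parser applies rules in file order (first match wins), whereas RFC 9309 specifies longest-match precedence, so files with overlapping Allow and Disallow lines can evaluate differently here than in a given crawler. The example.com URLs and the ExampleBot name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: allow citation indexing, refuse user-triggered fetches.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: PerplexityBot
Disallow: /admin/
Allow: /

User-agent: Perplexity-User
Disallow: /
"""


def allowed(robots_txt, agent_token, url):
    """Return True if agent_token may fetch url under robots_txt.

    Pass the product token (e.g. "PerplexityBot"), not the full
    User-Agent header: the parser only inspects the text before the
    first "/" in the string it is given.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    for token in ("PerplexityBot", "Perplexity-User", "ExampleBot"):
        for path in ("/post", "/admin/users"):
            url = "https://example.com" + path
            print(token, path, allowed(ROBOTS_TXT, token, url))
```

A passing check only tells you what a compliant crawler would do with the file; it says nothing about traffic that never reads it.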

robots.txt is advisory. Enforcement happens at the network edge. Cloudflare customers on the Bot Management plan get the stealth-traffic block automatically as of August 2025. Free-plan sites can use the one-click AI bot block Cloudflare shipped on July 3, 2024, which covers PerplexityBot among 24 other AI crawlers. Sites on Fastly, AWS WAF, or self-managed nginx need to write the rules themselves and update them as Perplexity's tactics shift.

The honest tradeoff

Block PerplexityBot and you exit a fast-growing answer surface. Allow it and you accept that Perplexity's compliance record is the worst of any major AI vendor — OAI-SearchBot, Claude-SearchBot, and Googlebot all have cleaner audit trails. Most public-facing marketing sites should allow PerplexityBot and use Cloudflare's managed rules (or equivalent) to handle the undeclared traffic. Most paywalled publishers should block both, file a written robots.txt policy as evidence, and assume the actual enforcement happens at the WAF.

The wrong default is "block both because of the Wired story" and nothing else. On its own, that removes your citations and does nothing about the stealth crawling, because the stealth crawling never identified itself as PerplexityBot to begin with.

Verify before shipping

isitready.dev parses your robots.txt against current Perplexity user-agent tokens and flags two common bugs: rules written for Perplexitybot (lowercase b) instead of PerplexityBot, which RFC 9309 requires crawlers to match case-insensitively but which stricter parsers and audit tools may treat as a mismatch, and Disallow: / rules under User-agent: * that block the AI crawlers you forgot to declare separately. Run it before you trust a hand-edited file.
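For readers who want to see what those two checks amount to, here is a minimal hand-rolled sketch. It is not isitready.dev's implementation; the parse_groups and lint names, the warning strings, and the simplified grouping logic are all assumptions made for illustration.

```python
KNOWN_TOKENS = ("PerplexityBot", "Perplexity-User")


def parse_groups(robots_txt):
    """Split a robots.txt into (user_agents, rules) groups.

    Simplified: only User-agent, Allow, and Disallow lines are read.
    """
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def lint(robots_txt):
    """Return warnings for the two bugs described above."""
    warnings = []
    groups = parse_groups(robots_txt)
    declared = [agent for agents, _ in groups for agent in agents]

    # Check 1: token matches a known crawler except for letter case.
    for agent in declared:
        for token in KNOWN_TOKENS:
            if agent.lower() == token.lower() and agent != token:
                warnings.append(f"case mismatch: '{agent}' should be '{token}'")

    # Check 2: blanket Disallow under * with no dedicated group for a crawler.
    declared_lower = {agent.lower() for agent in declared}
    for agents, rules in groups:
        if "*" in agents and ("disallow", "/") in rules:
            for token in KNOWN_TOKENS:
                if token.lower() not in declared_lower:
                    warnings.append(f"'{token}' falls under Disallow: / for User-agent: *")
    return warnings
```

Calling lint() on a file's contents returns an empty list when neither problem is present.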
