Bytespider is ByteDance's web crawler. It feeds training data into Doubao, ByteDance's flagship LLM, and into the recommendation and search systems behind TikTok. In Cloudflare's July 3, 2024 analysis, Bytespider was the single most-requested AI crawler on the network, reaching 40.40% of all Cloudflare-protected sites. That put it ahead of Amazonbot, ClaudeBot, and GPTBot. Independent bot-management vendor Kasada reported in 2024 that Bytespider crawled at roughly 25x the rate of GPTBot and roughly 3,000x the rate of Anthropic's ClaudeBot.

That volume is the headline. The harder question is what you get back for serving it.

The asymmetry most site owners miss

For Googlebot, the trade is legible. Google crawls, Google indexes, Google sends users back through Search. For GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot, the trade is partial but real: train-time crawls feed models that surface citations and link-outs at answer time.

Bytespider sits outside that loop for almost every Western publisher. ByteDance's two main consumer surfaces are TikTok in-app search and Doubao. Neither one meaningfully refers traffic to source domains in English-language markets. Doubao serves Chinese-speaking users almost exclusively and competes with Baidu's ERNIE and Alibaba's Qwen rather than ChatGPT. TikTok's search panel rarely surfaces external citations the way Google AI Overviews or Perplexity do.

So the data flow is one-directional. You serve the bandwidth. ByteDance gets the corpus. Your URLs do not show up in the answer.

Cloudflare's data and the 71% drop

Cloudflare's July 3, 2024 announcement of one-click AI-bot blocking named Bytespider as the largest target. By the time Cloudflare published its 2024 Year in Review on December 12, 2024, Bytespider had fallen to 23.35% of AI-bot traffic, behind Meta's facebookexternalhit at 27.16% and ahead of Amazonbot at 13.34%, ClaudeBot at 8.06%, and GPTBot at 5.60%.

The drop was not natural decay. Cloudflare reported a 71.45% decline in Bytespider volume in the weeks after the block-AI-bots toggle shipped. Over 1 million Cloudflare customers had enabled it within months. By July 2025, Bytespider's share of AI crawler traffic across the Cloudflare network had collapsed from 14.1% (July 2024) to 2.4%.

That collapse was the network voting with its robots.txt files and WAF rules. It was not ByteDance choosing restraint.

Does Bytespider actually respect robots.txt?

ByteDance's position is yes. The user-agent string includes a feedback contact:

Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)

On that account, a standard User-agent: Bytespider group in robots.txt should be honored.

In practice the record is mixed. Multiple independent researchers, including DataDome and operators of bad-bot blocklists, have reported Bytespider continuing to fetch disallowed URLs and rotating user-agent strings to evade detection. ByteDance does not publish documentation comparable to Google's Googlebot reference or OpenAI's GPTBot page. The spider-feedback@bytedance.com contact is the entire public surface.

If your robots.txt rule is the only defense, expect leakage. Pair it with WAF or Bot Management at the edge.
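One way to measure that leakage is to look for self-identified Bytespider requests in your access logs after the Disallow rule has been live for a while. The following is a minimal sketch, assuming logs in the common "combined" format; the function name, the sample log lines, and the prefix list are illustrative, not taken from any vendor tooling. It only catches requests that still announce themselves as Bytespider, so rotated user-agent strings will not show up here.

```python
import re

# Assumed input: Apache/Nginx "combined" log format, i.e.
# ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def bytespider_hits(lines, disallowed_prefixes=("/",)):
    """Return (path, status) pairs for requests whose user-agent mentions
    Bytespider and whose path starts with a disallowed prefix."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a combined-format line; skip it
        if "bytespider" not in match.group("ua").lower():
            continue
        path = match.group("path")
        if path.startswith(tuple(disallowed_prefixes)):
            hits.append((path, int(match.group("status"))))
    return hits

# Hypothetical sample lines using documentation-reserved IP addresses.
sample = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /articles/1 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '198.51.100.4 - - [10/Oct/2024:13:55:40 +0000] "GET /articles/1 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(bytespider_hits(sample))  # [('/articles/1', 200)]
```

Any 200 responses in that list after your rule shipped mean the crawler fetched content it was told not to, which is the signal to move enforcement to the edge.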

The block-it directive

For most Western publishers, this is the rule:

# Block ByteDance/TikTok training crawler
User-agent: Bytespider
Disallow: /

That goes alongside whatever broader policy you run. A defensible default, the same shape we recommended in "GPTBot, ClaudeBot, and Google-Extended in robots.txt", has three tiers: allow retrieval crawlers like OAI-SearchBot, Claude-SearchBot, and PerplexityBot; take an explicit position on training crawlers like GPTBot and ClaudeBot; and hard-block crawlers with no transparency commitments and no discoverability return. Bytespider is the canonical example of that last category.
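The tiered shape is easy to get wrong by hand, so it can help to generate the file from a short list of tiers. This is a sketch under stated assumptions: the tier names, the render_policy function, and the allow_training switch are hypothetical, and the crawler tokens are only the ones named above. Whether to allow or block training crawlers is your call; the switch just forces you to make it explicitly.

```python
# Crawler tokens grouped by the tiers described above (illustrative lists).
RETRIEVAL = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]
TRAINING = ["GPTBot", "ClaudeBot"]
NO_RETURN = ["Bytespider"]

def render_policy(allow_training: bool) -> str:
    """Render a robots.txt body with one group per tier."""
    groups = []

    def add_group(agents, rule, comment):
        # Several User-agent lines may share one group of rules.
        lines = [f"# {comment}"]
        lines += [f"User-agent: {agent}" for agent in agents]
        lines.append(rule)
        groups.append("\n".join(lines))

    add_group(RETRIEVAL, "Allow: /", "Retrieval crawlers: allowed")
    add_group(
        TRAINING,
        "Allow: /" if allow_training else "Disallow: /",
        "Training crawlers: explicit position",
    )
    add_group(
        NO_RETURN,
        "Disallow: /",
        "No transparency commitments, no discoverability return",
    )
    return "\n\n".join(groups) + "\n"

print(render_policy(allow_training=False))
```

Keeping the tiers as data means a new crawler token is a one-line change, and the Bytespider block cannot silently drop out when someone edits the file.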

If you sit behind Cloudflare, the Block AI Bots managed rule already covers Bytespider. Cloudflare lists it in the rule and added it to the unverified-bot fingerprint set in 2024. As of July 1, 2025, every new Cloudflare zone enables that rule by default. If your zone was created before that date, the toggle lives at Security > Bots > AI Scrapers and Crawlers, and works on the free plan.

When you might allow Bytespider

The case to allow is narrow. If you operate a Chinese-language brand, sell into mainland China, or run a TikTok-native content business where in-app search matters for your audience, the calculus shifts. Doubao's user base is Chinese-speaking. TikTok search behavior differs by region. Brands with real exposure to those surfaces have a reason to keep the door open.

For everyone else (English-first publishers, B2B SaaS, regional news outlets, agency sites, portfolios), the answer is to block. The bandwidth cost is real, the citation return is approximately zero, and the historical compliance record is not strong enough to make a half-measure like Crawl-delay worth the operational complexity.

Verify before shipping

isitready.dev parses your live robots.txt against the current set of AI crawler tokens and flags both accidental blocks (blocking a retrieval crawler you wanted indexed) and accidental allows (Bytespider not listed at all, defaulting to full access under User-agent: *). Run it against your production origin before you treat the policy as shipped.
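For a quick local sanity check of the same idea, Python's standard-library urllib.robotparser can tell you what a given robots.txt body permits for each crawler token. This is a rough sketch, not how isitready.dev works internally; the crawler_access helper, the sample policy text, and the example.com URL are assumptions for illustration. Python's parser may also interpret edge cases differently from a given crawler, so treat it as a first pass.

```python
from urllib.robotparser import RobotFileParser

def crawler_access(robots_txt: str, agents, url="https://example.com/"):
    """Map each crawler token to whether the policy lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

# Hypothetical policy: Bytespider blocked, everything else allowed.
ROBOTS = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

# The accidental allow: Bytespider is not listed, so it falls under "*".
ROBOTS_MISSING_RULE = """\
User-agent: *
Allow: /
"""

print(crawler_access(ROBOTS, ["Bytespider", "PerplexityBot"]))
# {'Bytespider': False, 'PerplexityBot': True}
print(crawler_access(ROBOTS_MISSING_RULE, ["Bytespider"]))
# {'Bytespider': True}
```

To run it against production, fetch your live /robots.txt and pass the response text in as robots_txt.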