Feb 4, 2025

AI

4 min

The Great AI Web Crawler Slowdown

The internet has always been a battleground between humans and bots. On one side, websites offer up information, hoping for human engagement (and maybe some ad revenue). On the other, automated crawlers roam the web, indexing, analyzing, and – more recently – vacuuming up vast amounts of text to train AI models. But as AI-driven scrapers get more aggressive, website owners are pushing back. Rather than just blocking bots outright, some are turning to a more creative approach: tarpitting. Instead of telling a crawler “go away,” a tarpit invites it in and then slows it down to a… crawl, feeding it endless nonsense or painfully slow responses. If a scraper comes looking for data to extract, it gets stuck in a digital bog, wasting resources and ultimately achieving nothing.
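To make that concrete, here is a minimal sketch of what a tarpit endpoint might look like – a toy illustration, not any particular site’s implementation. The server drips out a paragraph of filler every few seconds, with links that lead only deeper into the trap, so a greedy crawler either gives up or sits there burning its own resources. The port, delay, and filler vocabulary are all arbitrary choices for this sketch.

```python
# Toy tarpit: an endless, slow page of filler text with links back into itself.
# The port, delay, and vocabulary are arbitrary choices for this sketch.
import http.server
import random
import time

FILLER = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "elit"]

class TarpitHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        try:
            while True:
                # A fresh paragraph of nonsense plus a link deeper into the trap.
                paragraph = " ".join(random.choices(FILLER, k=40))
                link = f'<a href="/{random.randrange(10**9)}">read more</a>'
                self.wfile.write(f"<p>{paragraph} {link}</p>\n".encode())
                self.wfile.flush()
                time.sleep(5)  # the "painfully slow" part
        except (BrokenPipeError, ConnectionResetError):
            pass  # the crawler finally gave up

if __name__ == "__main__":
    http.server.HTTPServer(("", 8080), TarpitHandler).serve_forever()
</current_section_numbered>
```

The key property is the asymmetry: keeping a connection open and trickling out text costs the server almost nothing, while the crawler’s fetch never completes and its queue of “pages” only grows.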

The Scraper Wars

This is just the latest skirmish in the long-running conflict between webmasters and bots. Search engines like Google have long played by a set of agreed-upon rules – respecting robots.txt files, limiting request rates, and generally behaving like polite guests. But newer AI scrapers? Not so much.

As AI models grow ever more powerful, they need more and more data, and some web crawlers don’t particularly care where that data comes from or whether the site owner consents. In theory, robots.txt should control access, but in practice, not all AI scrapers follow those guidelines. Some disguise themselves, some ignore restrictions, and others make so many rapid-fire requests that they resemble a denial-of-service attack more than a polite data collection effort.
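For contrast, the polite behavior that robots.txt is meant to enforce is simple enough that Python ships a parser for it in the standard library. The sketch below (the site URL, paths, and user-agent string are placeholders) shows the check a well-behaved crawler runs before every fetch – the check that misbehaving scrapers simply skip.

```python
# What a polite crawler is supposed to do before fetching anything:
# consult robots.txt and honor any declared crawl delay.
# The site URL, paths, and user-agent string are placeholders.
import time
import urllib.robotparser

USER_AGENT = "ExampleBot"          # hypothetical crawler name
SITE = "https://example.com"

robots = urllib.robotparser.RobotFileParser()
robots.set_url(f"{SITE}/robots.txt")
robots.read()

def fetch_allowed(url: str) -> bool:
    """Return True only if robots.txt permits this user agent to fetch url."""
    return robots.can_fetch(USER_AGENT, url)

# Honor Crawl-delay if the site declares one; otherwise use a modest default.
delay = robots.crawl_delay(USER_AGENT) or 1.0

for path in ["/", "/articles/", "/private/"]:
    url = f"{SITE}{path}"
    if fetch_allowed(url):
        print(f"fetching {url}")   # a real crawler would issue the request here
        time.sleep(delay)          # rate-limit between requests
    else:
        print(f"skipping {url} (disallowed by robots.txt)")
```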

For website owners, this creates a dilemma: do nothing and let AI models siphon off their content, block crawlers outright and risk unintended consequences (like losing search engine visibility), or start playing defense with tools like tarpits?

The AI Reflective DDoS Problem

The fight against web crawlers isn’t just about data protection – it’s also about security. Some AI scrapers have a surprising vulnerability: they can be tricked into launching DDoS-style attacks.

Here’s how it works: a single request to an AI’s API – say, to summarize or analyze a webpage – can trigger thousands of automated follow-up requests from the AI’s own crawler. If someone were so inclined, they could exploit this to turn an AI model’s scraper into an unwitting attack tool, directing massive amounts of traffic at a target site.
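On the receiving end there is no perfect defense, but even a crude rate limiter blunts that kind of amplified burst. The sketch below is purely my own illustration – the window size, request budget, and user-agent string are made-up assumptions, not anything the researchers or the AI companies have published. It simply caps how many requests a single crawler user agent can make within a sliding window.

```python
# Rough mitigation sketch (my own illustration): throttle any crawler user
# agent that exceeds a request budget within a sliding time window.
# The window size, budget, and user-agent string are arbitrary assumptions.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60    # sliding window length (assumed)
MAX_REQUESTS = 100     # per-agent budget inside the window (assumed)

_recent = defaultdict(deque)  # user agent -> timestamps of recent requests

def allow_request(user_agent, now=None):
    """Return False once a user agent exhausts its budget for the window."""
    now = time.monotonic() if now is None else now
    window = _recent[user_agent]
    # Evict timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # the web server would answer 429 Too Many Requests
    window.append(now)
    return True

# Example: 150 hits from one (hypothetical) crawler agent in the same minute.
blocked = sum(
    not allow_request("ExampleAIBot/1.0", now=i * 0.1) for i in range(150)
)
print(f"blocked {blocked} of 150 requests")  # blocked 50 of 150 requests
```

Keying on the user agent is admittedly naive – a scraper that disguises itself will just lie about it – but it is enough to keep an honest-but-overeager crawler from flattening a small site.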

The really interesting part? Attempts to report this issue to the relevant AI companies have reportedly gone nowhere. Researchers who discovered the problem describe a frustrating cycle of automated support emails, ignored reports, and general disinterest. For a company running a major AI model, fixing a security flaw like this should be a top priority – but so far, the response has been underwhelming.

The Ethics of Slowing Down AI

There’s a philosophical question at the heart of this fight: should the web be freely accessible for AI training, or do website owners have the right to actively fight back?

Some argue that AI scraping is an inevitability – just an extension of the way search engines have indexed the web for decades. But others (myself included) see a crucial difference: while search engines drive traffic to sites, AI models extract knowledge from them, often without giving anything back. If a chatbot can answer a question using information scraped from a website, why would a user ever need to visit the original source?

This creates an incentive for website owners to resist. Blocking AI scrapers is one option, but tarpitting is a more aggressive form of resistance – one that doesn’t just say “no” but actively drains the resources of the scraper itself.

The Future of the Bot War

For now, tarpits and similar techniques are a minor inconvenience for AI companies. But if more websites deploy them, AI scrapers may need to evolve. They’ll have to become more sophisticated – better at recognizing when they’re being lured into a trap, more selective about where they pull data from, and possibly more transparent about their operations.

At some point, AI companies may have to acknowledge that their current approach to data collection isn’t sustainable. Right now, they operate on a “scrape first, ask questions later” model – but if enough websites fight back, they may be forced to rethink their methods.

Or, more likely, they’ll just build better bots as the game of cat and mouse continues.