Wikimedia Struggles with AI Crawlers Eating Up Bandwidth

[Featured image: Wikipedia in Ghibli style]

Wikipedia’s parent organization, the Wikimedia Foundation, is facing a growing problem.

Since January 2024, the bandwidth Wikimedia uses to serve multimedia files has jumped by 50% – not because more people are reading articles, but because automated crawlers are constantly scraping content to train AI models.

The Battle for Bandwidth: Humans vs. AI

In an open letter, Wikimedia staff explained that their systems were built to handle traffic spikes from human users during big events. But the constant, heavy traffic from AI crawlers is something they weren’t prepared for.

These crawlers are scraping Wikimedia Commons, the projects’ freely licensed media library, to gather training data for AI models. According to Wikimedia’s own data:

  • AI crawlers account for 65% of Wikimedia’s most expensive traffic – the requests that cannot be served from cache and must hit the core data centers.

Why is this a problem?

When humans browse Wikipedia, they mostly request popular content that Wikimedia has already cached in regional data centers around the world, close to the readers asking for it.

AI crawlers, by contrast, don’t care about popularity: they fetch everything in bulk, including rarely viewed pages and files that must be served from the core data centers, which consumes far more computing resources per request.
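
To make that cost difference concrete, here is a minimal, purely illustrative sketch in Python. The catalog size, cache size, and Pareto popularity parameter are made-up assumptions, not Wikimedia’s actual figures: human requests are modeled as popularity-skewed, while a bulk crawler samples the whole catalog uniformly.

```python
import random

# Toy model (illustrative numbers only): a catalog of 1,000,000 files,
# of which the 10,000 most popular are cached at edge data centers
# ("cheap" to serve). Anything else must come from the core data
# centers ("expensive" to serve).
CATALOG_SIZE = 1_000_000
CACHED_TOP = 10_000

def human_request() -> int:
    # Human traffic skews heavily toward popular items, approximated
    # here with a Pareto (heavy-tailed) distribution over item ranks.
    rank = int(random.paretovariate(1.2))
    return min(rank, CATALOG_SIZE)

def crawler_request() -> int:
    # A bulk crawler sweeps the whole catalog, popular or not,
    # so its requests are spread uniformly over every item.
    return random.randint(1, CATALOG_SIZE)

def cache_hit_rate(request, n: int = 100_000) -> float:
    hits = sum(1 for _ in range(n) if request() <= CACHED_TOP)
    return hits / n

if __name__ == "__main__":
    print(f"human cache hit rate:   {cache_hit_rate(human_request):.1%}")
    print(f"crawler cache hit rate: {cache_hit_rate(crawler_request):.1%}")
```

In this toy setup the human traffic hits the cache almost every time, while the crawler’s uniform sweep misses almost every time and falls back to the expensive origin – the asymmetry Wikimedia is describing.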

A Growing Industry Problem

Wikimedia isn’t alone. Other organizations, including SourceHut (a Git hosting service), iFixit (the repair guide site), and Read the Docs (a documentation host), have raised the same concerns about aggressive AI crawlers.

The issue has gotten so serious that Wikimedia’s plan for 2025/2026 specifically aims to:

  • Reduce crawler request rates by 20%.
  • Cut crawler bandwidth usage by 30%.

Their goal is simple: prioritize human users and support their volunteer contributors.

Fighting Back: Can Websites Survive the AI Hunger?

While many websites accept that providing data for crawlers is part of being online, the rise of AI tools like ChatGPT has made crawler activity much more aggressive. This heavy scraping could eventually threaten the very websites providing the content.

Some new tools have emerged to fight back against excessive crawling:

  • Data poisoning projects like Glaze, Nightshade, and ArtShield
  • Web protection tools such as Kudurru and Nepenthes

Unfortunately, traditional methods of controlling web crawlers, such as robots.txt files, aren’t always effective, especially since some AI crawlers ignore the rules or disguise their user agents to avoid being blocked.
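
The weakness is easy to see in code. The sketch below uses Python’s standard urllib.robotparser with hypothetical rules – the “ExampleAIBot” name, the rules, and the example.org URLs are invented for illustration, not Wikimedia’s actual policy. It shows that robots.txt only constrains crawlers that identify themselves honestly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; this is NOT Wikimedia's actual
# robots.txt. "ExampleAIBot" is a made-up crawler name.
rules = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A crawler that identifies itself honestly is told to stay out entirely...
print(parser.can_fetch("ExampleAIBot", "https://example.org/wiki/Some_Article"))
# -> False

# ...but the same crawler spoofing a browser user agent only matches the
# permissive catch-all rules meant for human visitors. robots.txt has no
# way to enforce anything against a client that misrepresents itself.
print(parser.can_fetch("Mozilla/5.0 (Windows NT 10.0)", "https://example.org/wiki/Some_Article"))
# -> True
```

Because the standard is an honor system with no enforcement mechanism of its own, sites are increasingly turning to the heavier-handed defenses listed above.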

As AI development continues to accelerate, finding a balance that allows both human users and AI systems to access open knowledge resources will remain a critical challenge.
