A major outage at Cloudflare on November 18 disrupted access to nearly 20 percent of the web, all due to a flawed update in its bot detection system.
Quick Summary – TLDR:
- Cloudflare outage affected up to 20 percent of internet traffic, disrupting major websites including crypto platforms and tech services.
- The root cause was a database permissions change that caused a query to generate an oversized bot detection file, which broke core proxy services.
- Cloudflare initially feared a cyberattack but later confirmed no malicious activity was involved.
- Full service was restored by 17:06 UTC, after manual intervention and a rollback to a known good bot configuration file.
What Happened?
On Tuesday, November 18, Cloudflare’s network suffered a critical outage that affected a large portion of the internet. Users visiting impacted websites were met with error messages caused by internal failures in Cloudflare’s system. Initial suspicion pointed to a DDoS attack, but the root cause turned out to be far more mundane and entirely self-inflicted.
The issue was tied to a database permissions change that resulted in the generation of faulty configuration data for Cloudflare’s Bot Management system, which then cascaded into a widespread service failure.
September, 26: Cloudflare rewritten in “memory safe” Rust. The change is touted as “faster and more secure” because of Rust. https://t.co/Mpy43z5S8A
November, 18 (53 days later): Cloudflare has a massive outage, which took down large portions of the Internet, because of a… pic.twitter.com/vLrAlH4wVy
— The Lunduke Journal (@LundukeJournal) November 19, 2025
The Bot Breakdown That Broke the Web
Cloudflare, which manages around 20 percent of all internet traffic and supports a third of the top 10,000 websites, experienced a cascading failure due to a flawed update in its bot detection system. Here’s how it unfolded:
- A change to ClickHouse database permissions mistakenly caused the system to produce a bot detection “feature file” that was twice its expected size.
- This feature file, which helps distinguish real users from bots, was pushed across Cloudflare’s entire network.
- The core proxy system responsible for routing internet traffic couldn’t handle the bloated file and began crashing (a simplified sketch of this failure mode follows the list).
- To make matters worse, the file was updated every five minutes. Some updates were valid, while others were faulty, causing the network to repeatedly fail and recover for nearly two hours.
- Once all ClickHouse nodes began producing faulty files, the system entered a persistent failure state, severely affecting service availability.
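To make the failure mode above concrete, here is a minimal, hypothetical Rust sketch of a proxy component that caps the number of bot detection features it will load and returns an error when a bloated file blows past that cap. The limit, file format, and every name in it (MAX_FEATURES, load_feature_file) are assumptions for illustration only, not Cloudflare’s actual code; the point is simply that a caller which unwraps the error instead of handling it takes the whole worker down.

```rust
// Hypothetical illustration only: not Cloudflare's actual code.
// Models a proxy that allows a fixed number of bot detection "features"
// and errors out when a configuration file exceeds that limit.

const MAX_FEATURES: usize = 200; // assumed hard limit, for illustration

#[derive(Debug)]
struct FeatureFile {
    features: Vec<String>,
}

#[derive(Debug)]
enum LoadError {
    TooManyFeatures { got: usize, limit: usize },
}

fn load_feature_file(contents: &str) -> Result<FeatureFile, LoadError> {
    // One feature name per line; duplicate rows from a bad query inflate the count.
    let features: Vec<String> = contents
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.trim().to_string())
        .collect();

    if features.len() > MAX_FEATURES {
        // A caller that unwraps this error instead of handling it
        // panics and takes the worker down with it.
        return Err(LoadError::TooManyFeatures {
            got: features.len(),
            limit: MAX_FEATURES,
        });
    }
    Ok(FeatureFile { features })
}

fn main() {
    // Simulate a feature file that doubled in size because a query
    // started returning duplicate rows.
    let normal: String = (0..150).map(|i| format!("feature_{i}\n")).collect();
    let bloated = format!("{normal}{normal}"); // twice the expected size

    let parsed = load_feature_file(&normal).expect("normal file should load");
    println!("loaded {} features", parsed.features.len());

    match load_feature_file(&bloated) {
        Ok(_) => println!("loaded"),
        Err(e) => eprintln!("refusing oversized config: {e:?}"),
    }

    // The fragile version: unwrapping the error crashes the process,
    // which is roughly the failure mode described above.
    // load_feature_file(&bloated).unwrap(); // would panic
}
```

In a real deployment the safer behavior is what Cloudflare eventually did by hand: keep serving the last known good configuration instead of dying on the bad one.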
Widespread Impact Across the Internet
This was Cloudflare’s worst outage since 2019, affecting services globally. Notable disruptions included:
- Coinbase, Blockchain.com, Ledger, BitMEX, Toncoin, Arbiscan, DefiLlama, and even platforms like ChatGPT and X.
- Authentication systems like Cloudflare Access failed, preventing users from logging in.
- Turnstile, Cloudflare’s CAPTCHA service, went offline, blocking dashboard access.
- Workers KV, Cloudflare’s key-value storage service, returned 5xx errors and required emergency bypasses.
Cloudflare reported elevated CPU usage and a flood of debugging errors during the event, which further slowed recovery. The outage also sparked criticism from the crypto community about the centralization risks of relying on major infrastructure providers.
A spokesperson for EthStorage, which offers decentralized web solutions, said the incident shows that “centralized infrastructure will always create single points of failure.” The company called for a more robust decentralized web stack.
Recovery and Postmortem
At 13:05 UTC, Cloudflare engineers began implementing bypasses for critical services such as Workers KV and Access. Shortly after, they identified the faulty configuration file as the source of the problem.
At 14:30 UTC, a known good version of the feature file was manually reinserted into the system and the core proxy service was restarted. By 17:06 UTC, all systems were confirmed to be back to normal.
Cloudflare has since outlined several measures to prevent a repeat of this scenario:
- Stricter validation of internal configuration files (illustrated in the sketch after this list).
- Kill switches for faulty feature propagation.
- Better handling of system failures to avoid resource overuse.
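As a rough sketch of what the first two measures could look like, the hypothetical validator below rejects a configuration file that exceeds basic size and feature-count budgets, and honors a global kill switch that halts propagation entirely while operators investigate. The thresholds, names (validate_and_gate, PROPAGATION_KILL_SWITCH), and mechanism are illustrative assumptions, not Cloudflare’s published design.

```rust
// Hypothetical sketch of pre-propagation validation plus a kill switch.
// None of this reflects Cloudflare's actual implementation.

use std::sync::atomic::{AtomicBool, Ordering};

const MAX_FILE_BYTES: usize = 1_000_000; // assumed size budget
const MAX_FEATURES: usize = 200;         // assumed feature-count budget

// Global kill switch: when set, no new configs are pushed to the fleet
// and proxies keep serving the last known-good version.
static PROPAGATION_KILL_SWITCH: AtomicBool = AtomicBool::new(false);

#[derive(Debug)]
enum ConfigDecision {
    Propagate,
    Reject(String),
    Halted,
}

fn validate_and_gate(raw: &str) -> ConfigDecision {
    if PROPAGATION_KILL_SWITCH.load(Ordering::SeqCst) {
        return ConfigDecision::Halted;
    }
    if raw.len() > MAX_FILE_BYTES {
        return ConfigDecision::Reject(format!(
            "file is {} bytes, budget is {}",
            raw.len(),
            MAX_FILE_BYTES
        ));
    }
    let feature_count = raw.lines().filter(|l| !l.trim().is_empty()).count();
    if feature_count > MAX_FEATURES {
        return ConfigDecision::Reject(format!(
            "{feature_count} features exceeds budget of {MAX_FEATURES}"
        ));
    }
    ConfigDecision::Propagate
}

fn main() {
    let good: String = (0..150).map(|i| format!("feature_{i}\n")).collect();
    let bloated = format!("{good}{good}");

    println!("{:?}", validate_and_gate(&good));    // Propagate
    println!("{:?}", validate_and_gate(&bloated)); // Reject(..)

    // Operators flip the kill switch while investigating an incident.
    PROPAGATION_KILL_SWITCH.store(true, Ordering::SeqCst);
    println!("{:?}", validate_and_gate(&good));    // Halted
}
```

The design point worth noting is that the checks run before the file leaves the pipeline, so a bad artifact is stopped at the source rather than discovered by thousands of crashing proxies.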
SQ Magazine Takeaway
Honestly, this was a rough one. When a company like Cloudflare, which underpins a huge slice of the internet, suffers a slip-up like this, it reminds us just how fragile the web can be behind the scenes. It wasn’t hackers or bad actors; it was just a bad query and a bloated file that brought a chunk of the internet to its knees. I think this incident is a wake-up call: infrastructure providers have to treat internal changes with the same scrutiny as any customer-facing update. There’s no such thing as “small” in systems this big.

