The Great Internet Outage of June 12, 2025 - A Lesson in Digital Fragility

Joseph HE · Software Engineer · 4 min read

Thursday, June 12, 2025, will be remembered as the day the internet revealed its flaws. A widespread outage took down a broad range of popular services and websites, exposing the intrinsic vulnerability of a digital ecosystem that relies increasingly on a small number of hosting giants.

The Fragility of Our Digital Ecosystem

This outage starkly highlighted how much our daily internet access relies on a handful of major players. As Tim Marcin of Mashable pointed out, the incident "paints a picture of the fragility of our internet ecosystem when essential cogs malfunction." Many commonly used services depend on a small number of large providers, and a malfunction at one of them can cascade widely.

The names that repeatedly surface are well-known: AWS (Amazon Web Services), Google Cloud, Azure (Microsoft), and Cloudflare. The June 12 outage primarily involved Google Cloud and Cloudflare, demonstrating an interdependence that surprised even industry experts.

Google Cloud at the Heart of the Storm

At the center of this interruption was a problem with Google Cloud Platform (GCP). Google quickly acknowledged "problems with its API management system," and Thomas Kurian, CEO of Google Cloud, issued an apology and later confirmed a full restoration of services.

What emerged from this situation was an unsuspected reliance of Cloudflare on Google Cloud. Long perceived as running an entirely independent infrastructure, Cloudflare revealed that some of its key services relied on GCP, particularly a "long-term cold storage solution" behind its Workers KV service. Initially, Cloudflare attributed the fault to Google Cloud, describing it as a "Google Cloud outage" affecting a limited number of its services.

The Cascading Impact of Cloudflare Workers KV

The Cloudflare Workers KV (key-value) service proved to be Cloudflare's Achilles' heel. Described as a "key-value store" and a "heart for tons of other things," its failure triggered a cascade of incidents.

The outage lasted 2 hours and 28 minutes and globally impacted all Cloudflare customers using the affected services, including Workers KV, WARP, Access, Gateway, Images, Stream, Workers AI, and even the Cloudflare dashboard itself. The incident made clear that Workers KV is a "critical dependency for many Cloudflare products and is used for configuration, authentication, and asset delivery."
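To make that dependency concrete, here is a minimal sketch of a Cloudflare Worker that reads its configuration from Workers KV on every request. The binding name CONFIG_KV and the key "feature-flags" are hypothetical illustrations, not details from Cloudflare's report; the point is simply that when the KV read fails, the request fails with it, which is the pattern that lets a single KV outage cascade across many products.

```typescript
// Minimal sketch of a Worker whose hot path depends on Workers KV.
// Types such as KVNamespace come from @cloudflare/workers-types.
export interface Env {
  CONFIG_KV: KVNamespace; // hypothetical KV binding holding configuration
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Configuration is read from Workers KV on every request.
    // With no fallback, a KV outage makes this Worker fail outright.
    const flags = await env.CONFIG_KV.get("feature-flags", "json");
    if (flags === null) {
      return new Response("Configuration unavailable", { status: 503 });
    }
    return Response.json({ flags });
  },
};
```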

Transparency and Accountability: The Cloudflare Example

A remarkable aspect of this incident was Cloudflare's transparency and its willingness to take responsibility. Although the root cause was attributed to Google Cloud, Cloudflare released an incident report of rare candor. Dane Knecht, Cloudflare's CTO, stated: "We let our customers down at Cloudflare today. [...] This was a failure on our part, and while the immediate cause or trigger of this outage was a third-party vendor failure, we are ultimately responsible for our chosen dependencies and how we choose to architect around them."

This attitude was widely praised as a model of corporate accountability: the report showed a "willingness to share absurdly high error rates" and an absence of "blame towards Google," reflecting a strong commitment to transparency.

Lessons Learned and Future Mitigation

Cloudflare quickly identified the problem and began working on solutions. The incident report details a rapid timeline of detection and classification at the highest severity level (P0). The company plans to strengthen the resilience of its services by reducing single points of dependency, notably by migrating Workers KV's cold storage to R2, its own S3-compatible object storage, to avoid relying on third-party storage infrastructure.

They are also working to "implement tools that allow them to gradually reactivate namespaces during storage infrastructure incidents," ensuring that critical services can keep operating even when the KV service is not yet fully restored.
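As a rough illustration of what architecting around a single storage dependency can look like, the sketch below prefers Workers KV but falls back to a copy of the same object kept in R2. The binding names CONFIG_KV and COLD_STORAGE and the key are hypothetical, not taken from Cloudflare's report; the idea is only that a storage incident should degrade the read path rather than take it down.

```typescript
// Hypothetical resilience sketch: read from Workers KV, fall back to an R2 copy.
// Binding names and keys are illustrative; types come from @cloudflare/workers-types.
export interface Env {
  CONFIG_KV: KVNamespace;
  COLD_STORAGE: R2Bucket;
}

async function readConfig(env: Env, key: string): Promise<string | null> {
  try {
    // Primary read path: Workers KV.
    const fromKv = await env.CONFIG_KV.get(key);
    if (fromKv !== null) return fromKv;
  } catch {
    // KV unreachable: fall through to the cold copy instead of failing outright.
  }
  // Fallback read path: the same object mirrored in R2.
  const object = await env.COLD_STORAGE.get(key);
  return object ? await object.text() : null;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const config = await readConfig(env, "feature-flags");
    return config !== null
      ? new Response(config, { headers: { "content-type": "application/json" } })
      : new Response("Configuration unavailable", { status: 503 });
  },
};
```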

The June 12, 2025 outage served as a brutal reminder of the web's increasing interdependence and the crucial importance of redundancy and diversification of dependencies, even for hosting giants. It compels us to re-evaluate the resilience of our digital architectures and strengthen collaboration among stakeholders for a more robust internet.

Source: https://mashable.com/article/cause-internet-outage-google-cloud-what-happened-june-12