The Great Internet Outage of June 12, 2025 - A Lesson in Digital Fragility

· 4 min read
Joseph HE
Software Engineer

Thursday, June 12, 2025, will be remembered as the day the internet showed its cracks. A widespread outage affected a wide range of popular services and websites, revealing the intrinsic vulnerability of a digital ecosystem increasingly reliant on a limited number of hosting giants.

The Fragility of Our Digital Ecosystem

This outage brutally highlighted how much our daily internet access relies on a handful of major players. As Tim Marcin of Mashable pointed out, this incident "paints a picture of the fragility of our internet ecosystem when essential cogs malfunction." It's clear that many commonly used services depend on a small number of large providers, and a malfunction at one of them can have significant cascading repercussions.

The names that repeatedly surface are well-known: AWS (Amazon Web Services), Google Cloud, Azure (Microsoft), and Cloudflare. The June 12 outage primarily involved Google Cloud and Cloudflare, demonstrating an interdependence that surprised even industry experts.

Google Cloud at the Heart of the Storm

At the center of this interruption was a problem with Google Cloud Platform (GCP). Google quickly acknowledged "problems with its API management system." Thomas Kurian, CEO of Google Cloud, issued an apology, confirming a full restoration of services.

What emerged from this situation was a previously unsuspected reliance of Cloudflare on Google Cloud. Long perceived as running an entirely independent infrastructure, Cloudflare revealed that some of its key services relied on GCP, particularly a "long-term cold storage solution" backing its Workers KV service. Initially, Cloudflare attributed the fault to Google Cloud, stating it was a "Google Cloud outage" affecting a limited number of its services.

The Cascading Impact of Cloudflare Workers KV

The Cloudflare Workers KV (key-value) service proved to be Cloudflare's Achilles' heel. Described as a "key-value store" and a "heart for tons of other things," its failure led to a cascade of incidents.

The outage lasted 2 hours and 28 minutes, globally impacting all Cloudflare customers using the affected services, including Workers KV, WARP, Access, Gateway, Images, Stream, Workers AI, and even the Cloudflare dashboard itself. This situation clearly demonstrated that Workers KV is a "critical dependency for many Cloudflare products and is used for configuration, authentication, and asset delivery."

Transparency and Accountability: The Cloudflare Example

A remarkable aspect of this incident was Cloudflare's reaction in terms of transparency and accountability. Although the root cause was attributed to Google Cloud, Cloudflare released an incident report of rare candor. Dane Knecht, Cloudflare's CTO, stated: "We let our customers down at Cloudflare today. [...] This was a failure on our part, and while the immediate cause or trigger of this outage was a third-party vendor failure, we are ultimately responsible for our chosen dependencies and how we choose to architect around them."

This attitude was widely praised as a corporate model, showing a "willingness to share absurdly high error rates" and the absence of "blame towards Google" in their report, proving a strong commitment to transparency.

Lessons Learned and Future Mitigation

Cloudflare quickly identified and began working on solutions. The incident report details a rapid timeline of detection and classification of the incident at the highest severity level (P0). The company plans to strengthen the resilience of its services by reducing single points of dependency, notably by migrating Workers KV's cold storage to R2, its S3 alternative, to avoid relying on third-party storage infrastructure.

They are also working to "implement tools that allow them to gradually reactivate namespaces during storage infrastructure incidents," ensuring that critical services can operate even if the entire KV service is not yet fully restored.

The June 12, 2025 outage served as a brutal reminder of the web's increasing interdependence and the crucial importance of redundancy and diversification of dependencies, even for hosting giants. It compels us to re-evaluate the resilience of our digital architectures and strengthen collaboration among stakeholders for a more robust internet.

Source: https://mashable.com/article/cause-internet-outage-google-cloud-what-happened-june-12

Stack Overflow's Demise

· 5 min read
Joseph HE
Software Engineer

The Quiet Demise of Stack Overflow: More Than Just an AI Story

Remember Stack Overflow? For over a decade, it was the undisputed digital cathedral for developers, the first tab you opened when a coding problem stumped you. It was the collective brain of the programming world, a place where answers were forged through community wisdom and rigorous peer review.

But new data and a compelling analysis suggest this titan of tech support is quietly, perhaps even rapidly, fading into irrelevance. And while large language models (LLMs) like ChatGPT have undeniably played a role in its recent struggles, a deeper dive reveals a more complex truth: Stack Overflow was already on a downward spiral, a trajectory set by its own internal decisions and culture, long before AI became a mainstream threat.

The Numbers Don't Lie: A Dwindling Community

The most glaring evidence of Stack Overflow's decline is the dramatic drop in question volume. A chilling graph highlights a significant decrease, starting as early as 2014, and then accelerating sharply after the launch of ChatGPT.

The data is stark: "the volume of questions posed has almost dried up." In fact, the monthly question count is now "as low as at Stack Overflow's launch in 2009." As the author remarks, it is "so crazy" to see fewer questions today than when they first started programming. This isn't just a dip; it's a plunge.

ChatGPT: The Accelerator, Not the Sole Cause

There's no denying the immediate impact of LLMs. As soon as ChatGPT burst onto the scene, Stack Overflow's question volume plummeted. Why? Because tools like ChatGPT offer swift, polite, and eerily accurate answers. They're trained on vast datasets, "including potentially the content of Stack Overflow," providing similar quality but with a far more agreeable user experience. Unlike Stack Overflow's moderators, "ChatGPT is polite and answers all questions." It's the ultimate low-friction, high-reward information source for many developers.

The Self-Inflicted Wounds: Culture and Missed Opportunities

But let's be clear: ChatGPT wasn't the primary cause of the initial rot. The analysis strongly argues that Stack Overflow committed fundamental strategic and cultural errors well before AI entered the picture.

1. A Culture of "Toxic Gatekeeping": The site's moderation culture is described as overtly "toxic" and a breeding ground for "gatekeeping." Moderators were often perceived as aggressive, quick to close legitimate questions, even those offering valuable insights or aiding understanding. One user lamented, "Stack Overflow was a product people generally didn't like, it was more that they just had to be there." Another insightfully noted, "I stopped asking questions at that time because the site felt unwelcoming." This unwelcoming atmosphere, ironically, appears to have coincided with the start of the decline. In 2014, when "Stack Overflow significantly improved moderator efficiency," questions began to drop. More efficient moderation, it seems, meant more questions closed, alienating a large segment of its user base.

2. A Glaring Lack of Innovation (Integration is King): Perhaps the most staggering oversight was Stack Overflow's failure to innovate where it mattered most: direct integration. The document highlights a crucial missed opportunity: why did Stack Overflow never develop an official plugin for popular Integrated Development Environments (IDEs) like VS Code?

As the author points out, "They should have had this Stack Overflow plugin from, like, 2017, 2018. Why wouldn't they do that?" Developers live in their IDEs, and instant access to Stack Overflow's vast knowledge base directly within their workflow would have been invaluable. "Integration is king," and Stack Overflow simply failed to build the bridges necessary to stay relevant in the evolving developer ecosystem.

The Unseen Cost: Data and the Perfect Exit

There's also a sense of injustice expressed regarding the data. The author argues that LLMs like OpenAI's and Anthropic's models "likely stole everything" from Stack Overflow, which possessed "the richest training data ever existing for coding." This raises questions about compensation and fair use in the age of AI.

Amidst this unfolding drama, a nod must be given to Stack Overflow's founders, Jeff Atwood and Joel Spolsky. The company was sold for a whopping $1.8 billion in 2021. In retrospect, this timing was "nearly perfect," occurring just before the terminal decline became acutely apparent.

Where Do Developers Go Now? The Future of Community

So, if not Stack Overflow, then where? The analysis suggests that developers are already migrating to other platforms for help and community. "Discord servers are probably one of the biggest things right now," notes the author. Other spaces like WhatsApp and Telegram groups are also filling the void, indicating a shift towards more immediate, less formal, and often more welcoming interactions.

The Verdict: Self-Inflicted Irrelevance

Ultimately, the analysis points to a sobering truth: Stack Overflow largely authored its own decline. Its internal culture, rigid moderation policies, and critical lack of strategic innovation made it ripe for disruption. The advent of LLMs simply accelerated an inevitable process. As the author concludes, "I wouldn't say 'unfortunately,' because, ultimately, Stack Overflow was making itself irrelevant."

The quiet demise of Stack Overflow serves as a cautionary tale: even established giants in the tech world are not immune to decline if they fail to adapt, innovate, and cultivate a truly welcoming community. In the rapidly evolving landscape of software development, relevance is earned, not given, and it can be lost as quickly as it was gained.

Builder AI - The "Biggest AI Scam"? Behind the Algorithm, 700 Human Engineers

· 5 min read
Joseph HE
Software Engineer

The world of tech startups is often filled with grandiose promises, but sometimes, reality is far more down-to-earth, even shocking. The Builder AI case is a striking example. This "no-code" development startup, which had managed to raise hundreds of millions of dollars and attract the support of giants like Microsoft, recently made headlines for very bad reasons. The revelation? Its flagship platform, supposedly revolutionary and powered by an AI named Natasha, was in fact... manual work carried out by 700 human engineers based in India.

This is a story that raises serious questions about the overstatement of AI capabilities in the startup ecosystem, dubious financial practices, and the increasingly blurred line between human-assisted automation and true artificial intelligence.

The Scam at the Heart of Builder AI: Natasha, the AI that wasn't

The central idea of the case is simple: Builder AI marketed a product by presenting it as an artificial intelligence marvel, when behind the scenes, client requests were handled by an army of humans. The source even goes so far as to call it the "biggest scam in the history of AI."

The promise? A platform capable of assembling software applications "like Lego bricks" thanks to an AI assistant called Natasha. The reality? "Natasha neural network turned out to be 700 Indian programmers." Each client request was sent to an office in India, where these 700 engineers wrote the code by hand. This is "absolutely incredible," as the author points out.

When Human Work Masquerades as AI: A Recurring Pattern?

Unfortunately, this is not an isolated case. The source emphasizes that this practice of masking cheap human labor behind an AI veneer is not new. Companies have been caught claiming AI capabilities when they actually relied on "a group of Indians that they hire on the back end and they call it AI."

This even opens up a reflection on complexity: did these Indian engineers themselves use AI tools to prompt their way through requests and keep up the pace? The line between "AI-powered" and "human-assisted by AI" becomes dangerously porous.

Quality Sacrificed on the Altar of Deception

Despite the use of 700 engineers, the results were far from satisfactory. The delivered products were "buggy, dysfunctional and difficult to maintain." The code was described as "unreadable" and the functions "did not work." A biting irony for a company claiming to deliver innovation through AI. "Nice, okay, everything was real artificial intelligence... except that none of it was," the source comments sarcastically.

The Financial Fall: 445 Million Dollars Gone

Thanks to this deception, Builder AI managed to attract 445 million dollars in investment over eight years, with prestigious backers like Microsoft among them. But the house of cards eventually collapsed. The fall was brutal: a payment default to its creditor Viola Credit, which seized 37 million dollars from the company's accounts and paralyzed its operations. Additional funds in India remained blocked by regulatory restrictions.

After the exposure of the deception, the startup officially went bankrupt. It's an "absolutely ridiculous" end for a company that purported to be at the forefront of technology.

The "Endgame" of AI Scams: "Fake It Till You Make It" Taken to the Extreme?

Why such an undertaking? What motivates founders to embark on such a path? Is it simply to "ride the hype" of AI and "embezzle money"? The source questions the founders' intent.

One hypothesis is that it started as a different product that mutated. The founders might have believed they could use developers as a stopgap while waiting to develop a true AI, but never achieved that goal. This is "fake it till you make it" pushed to its extreme, with disastrous consequences.

AI Must "Multiply Roles," Not "Replace" Them

The author of the source expresses deep skepticism towards AI companies that boast of being able to "replace all engineers." He suggests that a healthier and more realistic approach for AI is to build tools that "multiply the roles" of engineers, making them more efficient or simplifying their work, rather than seeking to eliminate them.

"Fully working independent AI sucks," he concludes, arguing that we should have understood after "3 years" that total autonomous AI is less effective than AI that assists humans.

A Connection with VerSe Innovation

Amidst this debacle, the name VerSe Innovation surfaced due to its commercial association with Builder AI dating from 2021. The co-founder of VerSe, also a former managing director of Facebook in India, denied any financial wrongdoing or irregularities in transactions with Builder AI, calling the allegations "absolutely baseless and false."

The Builder AI case is a brutal reminder of the dangers of "vaporware" and excessive "hype" around AI, especially when colossal sums are at stake. It underscores that the complete replacement of human labor by AI is still a fantasy, and that the most promising AI tools are those that augment human capabilities rather than those that claim to replace humans while quietly depending on them. It's a costly lesson for investors and a warning for the entire tech sector.

The Hidden Dangers of C - Unpacking Memory Management Risks

· 5 min read
Joseph HE
Software Engineer

The C programming language. It's often hailed as the "mother of almost all modern languages," forming the bedrock of everything from operating systems and compilers to game engines and encryption tools. Its power and low-level control are unparalleled, making it indispensable for critical infrastructure. Yet, this very power comes with a demanding responsibility: manual memory management.

Unlike languages with automatic garbage collection, C forces developers to "grow up and manage memory by yourself." This means allocating memory with malloc and diligently freeing it with free once it's no longer needed. This seemingly simple contract between malloc and free hides a minefield of potential pitfalls. Mishandling this responsibility can lead to catastrophic security vulnerabilities and system instability, often manifesting as "undefined behavior" – a programmer's nightmare where anything, from a minor glitch to complete system compromise, can happen.
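
To make that contract concrete, here is a minimal baseline sketch in C of the happy path: one allocation, one check for failure, one release. The variable names and sizes are purely illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* The contract: every successful malloc is paired with exactly one free. */
    double *samples = malloc(100 * sizeof *samples);
    if (samples == NULL) {               /* malloc can fail: always check */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    for (int i = 0; i < 100; i++)        /* use the memory... */
        samples[i] = i * 0.5;
    printf("last sample: %.1f\n", samples[99]);

    free(samples);                       /* ...then release it exactly once */
    samples = NULL;                      /* and avoid dangling reuse */
    return 0;
}
```

Every error described below is, at bottom, a violation of this simple pairing.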

Let's delve into some of the most common and dangerous memory management errors in C, illuminated by infamous historical incidents.

The Perils of C: Common Memory Management Risks

1. Buffer Overflows: When Data Spills Over

A buffer overflow occurs when a program attempts to write more data into a fixed-size buffer than it was allocated to hold. C, by design, doesn't perform automatic bounds checking. This lack of a safety net means if you write past the end of an array or buffer, you can overwrite adjacent data in memory, including critical program instructions or return addresses on the stack.

The consequences are severe: undefined behavior, program crashes, or, most dangerously, arbitrary code execution. A classic example is the Morris Worm of 1988. This early internet scourge exploited buffer overflows in common UNIX utilities like fingerd and sendmail to inject malicious code, infecting an estimated 10% of the internet at the time. A simple conditional check on input size could have prevented this widespread chaos.
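
As a rough sketch of the pattern (not the actual fingerd or sendmail code, and using an arbitrary 16-byte buffer), the difference between the vulnerable path and the safe path is a single length check:

```c
#include <stdio.h>
#include <string.h>

/* Copies attacker-controlled input into a fixed-size buffer with no bounds check. */
void copy_unchecked(char *dst, const char *src) {
    strcpy(dst, src);                    /* spills past dst if src is too long */
}

/* The one conditional check on input size that prevents the overflow. */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    if (strlen(src) >= dst_size)         /* leave room for the terminating '\0' */
        return -1;
    strcpy(dst, src);
    return 0;
}

int main(void) {
    char buf[16];
    const char *input = "this input is far longer than sixteen bytes";

    /* copy_unchecked(buf, input);       would overwrite adjacent stack memory */
    if (copy_checked(buf, sizeof buf, input) != 0)
        fprintf(stderr, "input rejected: too long for buffer\n");
    return 0;
}
```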

2. Heartbleed: A Lesson in Missing Length Checks

While a specific type of buffer overflow, the Heartbleed vulnerability (2014) in OpenSSL's heartbeat extension perfectly illustrates the danger of missing length validations. The server was designed to echo back a client's "heartbeat" message. The client would declare a certain message length and then send the data. The flaw? The server code didn't verify that the actual length of the received message matched the declared length.

Attackers could send a tiny message (e.g., "hello") but declare it as 64,000 bytes long. The server, trusting the declared length, would then read and return 64,000 bytes from its own memory, including the "hello" message plus an additional 63,995 bytes of whatever was immediately following the message in memory. This allowed attackers to passively leak sensitive data like private encryption keys, usernames, and passwords, impacting vast swathes of the internet.
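
The flaw can be sketched in a few lines of C. This is a deliberately simplified, hypothetical handler, not OpenSSL's actual code; the struct fields and function names are invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heartbeat request: the client declares a payload length. */
struct heartbeat {
    size_t declared_len;   /* length the client claims the payload has */
    size_t actual_len;     /* bytes the client actually sent */
    char   payload[64];
};

/* Vulnerable: trusts declared_len, leaking whatever follows the real payload. */
char *echo_vulnerable(const struct heartbeat *hb) {
    char *reply = malloc(hb->declared_len);
    if (!reply) return NULL;
    memcpy(reply, hb->payload, hb->declared_len);   /* may read far past the real data */
    return reply;
}

/* Fixed: refuse heartbeats whose declared length exceeds what was received. */
char *echo_fixed(const struct heartbeat *hb) {
    if (hb->declared_len > hb->actual_len || hb->declared_len > sizeof hb->payload)
        return NULL;                                /* silently drop the malformed request */
    char *reply = malloc(hb->declared_len);
    if (!reply) return NULL;
    memcpy(reply, hb->payload, hb->declared_len);
    return reply;
}

int main(void) {
    struct heartbeat hb = { .declared_len = 64000, .actual_len = 5 };
    memcpy(hb.payload, "hello", 5);

    /* echo_vulnerable(&hb) would hand back 64,000 bytes of server memory. */
    char *reply = echo_fixed(&hb);
    printf("fixed handler %s the request\n", reply ? "accepted" : "rejected");
    free(reply);
    return 0;
}
```

The entire fix, as in the real patch, amounts to refusing to trust a length the peer merely claims.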

3. Use-After-Free: Accessing Ghost Memory

This vulnerability arises when a program attempts to access a block of memory after it has been freed using free(). Once memory is freed, the operating system can reallocate it for other purposes. If a pointer still points to this now-freed (and potentially reallocated) memory, accessing it can lead to:

  • Crashes: If the memory has been reallocated and its contents changed, accessing it can cause the program to crash.
  • Data Corruption: Writing to reallocated memory can corrupt other parts of the program or even other programs.
  • Arbitrary Code Execution: An attacker might intentionally trigger a use-after-free, cause the memory to be reallocated with malicious data, and then exploit the old pointer to execute their own code.

The Internet Explorer 8 vulnerability (2013) demonstrated this. It involved JavaScript deleting HTML elements, but a pointer to the freed object persisted. An attacker could then craft a malicious webpage that would trigger the use-after-free, leading to system compromise by simply visiting the site.
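
A compressed C illustration of the same lifecycle, using a hypothetical session struct in place of the freed DOM element; the dangerous read is left commented out:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object standing in for the freed HTML element. */
struct session {
    char user[32];
};

int main(void) {
    struct session *s = malloc(sizeof *s);
    if (!s) return 1;
    snprintf(s->user, sizeof s->user, "alice");

    free(s);                     /* the object is released... */
    /* printf("%s\n", s->user);     ...but the pointer survives: use-after-free */

    s = NULL;                    /* defensive habit: a NULLed pointer turns a silent
                                    read of reallocated memory into an obvious crash */
    return 0;
}
```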

4. Off-By-One Errors: The Tiny Miscalculation with Big Impact

Off-by-one errors are subtle mistakes in calculation, often involving loop boundaries or array indexing. In C, a common manifestation is forgetting to account for the null-terminating character (\0) when allocating space for strings. For instance, if you need to store a 10-character string, you actually need 11 bytes (10 for characters + 1 for \0).

These seemingly minor errors can lead to buffer overflows (writing one byte past the allocated end) or other out-of-bounds accesses, causing unpredictable behavior or opening doors for exploitation.
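
A minimal sketch of the string case, using an illustrative 10-character name:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *name = "0123456789";              /* 10 visible characters */

    /* Off by one: malloc(strlen(name)) leaves no room for the '\0',
       so strcpy would write one byte past the end of the allocation. */

    char *copy = malloc(strlen(name) + 1);        /* + 1 for the null terminator */
    if (!copy) return 1;
    strcpy(copy, name);

    printf("%s (%zu chars, %zu bytes)\n", copy, strlen(copy), strlen(copy) + 1);
    free(copy);
    return 0;
}
```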

5. Double Free: Freeing What's Already Gone

Calling free() twice on the same block of memory is a "double free." This leads to immediate undefined behavior and can seriously corrupt the internal bookkeeping structures that malloc and free rely on.
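
A minimal sketch of the pattern, with the second free left commented out and the common NULL-after-free mitigation shown:

```c
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *buf = malloc(128);
    if (!buf) return 1;
    memset(buf, 0, 128);

    free(buf);
    /* free(buf);                a second call on the same pointer is undefined
                                 behavior and can corrupt the allocator's state */

    buf = NULL;                  /* mitigation: free(NULL) is defined as a no-op, so
                                    NULLing after free makes an accidental repeat harmless */
    return 0;
}
```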

The implications are dire:

  • Program Crash: The program might crash immediately due to memory corruption.
  • Heap Corruption: The memory manager's internal state can become inconsistent, leading to unpredictable behavior later.
  • Arbitrary Code Execution: A sophisticated attacker can often manipulate the heap structures through a double free to achieve arbitrary read/write primitives, ultimately leading to remote code execution. When your code enters undefined behavior territory, "all bets are off."

Conclusion: The Unpredictable Nature of Undefined Behavior

The common thread running through these memory management errors is "undefined behavior." When your C code exhibits undefined behavior, the compiler and runtime environment are free to do anything. Your program might appear to work, it might crash, or, most terrifyingly, it might create a subtle vulnerability that an attacker can meticulously exploit to gain control of your system.

C's power is undeniable, but it comes with a non-negotiable demand for meticulousness in memory management. The historical incidents highlighted here serve as stark reminders that even a single oversight in handling malloc and free can have devastating, real-world consequences. Secure C programming isn't just about writing correct code; it's about anticipating and preventing every possible way memory can be mismanaged.

Why Did Facebook (Meta) Say "No" to Git? A Story of Scaling, Community, and Giant Monorepos

· 6 min read
Joseph HE
Software Engineer

In the world of software development, Git is ubiquitous. It's the default tool for millions of developers and projects, almost a given, "as common as water," as our source's author points out. It's perceived as the only viable solution for managing code. So imagine the surprise of discovering that Facebook (now Meta), one of the world's largest tech companies, does not use Git as its primary version control system for its immense monorepos.

This is a fascinating story that highlights engineering challenges on a colossal scale, the limits of popular tools, and the crucial importance of human factors in technological decisions. Let's delve into the reasons why Meta chose a different path.

The Astonishing Absence of Git at Meta

For many, the idea that Facebook doesn't run on Git is counter-intuitive. The author, whose personal experience with version control began with SVN before the explosion of Git, confesses his own surprise: "Throughout my life, git was common as water; it was so common, in fact, that I assumed it was the only viable tool for creating and managing code changes." He recounts how the Facebook engineers he met were "deeply trained on Mercurial patterns and Facebook's stack diffs workflow" rather than on Git.

Historically, even Google, whose engineering "predates git by over 5 years," uses its own internal system. But for Facebook, it was a more active and recent decision.

The Myth of Git's Complexity (and why that wasn't the main reason)

Before addressing the real reasons, it's worth noting that the perceived "difficulty" of Git was not the driving force behind this decision. The author himself wonders: "I've never understood this kind of comment: 'git is so confusing.' How is git confusing? What about git is confusing?" He often attributes this confusion to a lack of fundamental learning, suggesting that "most of you have just never taken the two hours of time it takes to learn git well enough to not be confused by any of it."

No, the reason for Facebook's shift was far deeper and more technical.

The Scaling Nightmare in 2012: When Git Reached Its Limits

The real breaking point occurred around 2012. By then, Facebook's codebase was already "many times larger than even the Linux kernel" (which had 17 million lines and 44,000 files). With exponential growth, Git began to show significant signs of weakness for operations on such a gigantic monorepo.

The key bottleneck? The process of "statting" (checking the status of) every file. "Git examines every file and naturally becomes slower and slower as the number of files increases." Basic Git operations, while not yet "cripplingly slow," were slow enough to warrant a thorough investigation. Simulations were "horrifying," showing that simple Git commands could take "over 45 minutes to complete" as the codebase continued to grow. This was untenable for thousands of engineers.
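
To see why the cost grows with repository size, here is a rough, illustrative C sketch (not Git's implementation) of a status-style scan: one stat call per entry means the walk is inherently linear in the number of files.

```c
#define _XOPEN_SOURCE 500      /* for nftw() on POSIX systems */
#include <ftw.h>
#include <stdio.h>

static long files_statted = 0;

/* nftw() issues a stat()/lstat() for every entry and hands us the result. */
static int visit(const char *path, const struct stat *sb, int typeflag, struct FTW *ftwbuf) {
    (void)path; (void)sb; (void)typeflag; (void)ftwbuf;
    files_statted++;           /* one filesystem stat per file, no way around it */
    return 0;                  /* keep walking */
}

int main(int argc, char **argv) {
    const char *root = (argc > 1) ? argv[1] : ".";
    if (nftw(root, visit, 64, FTW_PHYS) != 0) {
        perror("nftw");
        return 1;
    }
    printf("statted %ld entries under %s\n", files_statted, root);
    return 0;
}
```

At a few thousand files this is instantaneous; at millions of files, as in Facebook's monorepo, the same linear walk is what turns a routine command into a multi-minute wait.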

The Call for Help and the Surprising Response from Git Maintainers

Faced with these challenges, the Facebook team did what many tech companies would have done: they contacted the Git maintainers. Their goal was to collaborate to extend Git and better support large monorepos.

However, the response was unexpected and, according to the author, "wasn't cooperative." The Git maintainers "pushed back on improving performance and instead recommended that Facebook shard their monorepo" (divide it into multiple smaller repositories).

This suggestion, though technically possible, was a "non-starter" for Facebook. They had invested heavily in a monorepo workflow, and the complexity of such fragmentation would have been enormous. Even more surprising, Facebook had expected that an offer of "free open source labor by a major tech company" would be well received as an opportunity to improve a widely used open-source project. The lack of cooperation was a decisive factor.

Mercurial: The Unexpected Alternative and Its Clean Architecture

Faced with Git's limitations and the lack of support for massive monorepos, Facebook explored alternatives. In 2012, options were "scarce." Perforce was dismissed due to perceived architectural flaws. This is where Mercurial entered the scene.

Mercurial had performance "similar to git," but possessed a much cleaner architecture. While Git was a "complex web of bash and C code," Mercurial was "engineered in Python using object-oriented code patterns and was designed to be extensible." This extensibility was crucial.

The team decided to attend a Mercurial hackathon in Amsterdam. What they discovered was not just a flexible system, but also "a community of maintainers who were impressively welcoming to aggressive changes by the Facebook team." This was the perfect contrast to their previous experience.

The Internal Migration: A Masterclass in Change Management

Convincing the entire engineering organization to migrate from Git to Mercurial was an "intimidating" task. Engineers can be "extremely sensitive about tooling changes." Yet, what followed "sounds like a masterclass in internal Dev tools migrations."

The team methodically:

  1. Socialized the idea: Communicated the necessity and benefits.
  2. Documented workflows: Ensured everyone knew how to use the new tool.
  3. Listened to concerns: Allowed developers to express their doubts.
  4. Made the definitive switch: Cut the cord with Git once the groundwork was laid.

The success of this massive migration is also attributed, with a hint of irony, to the fact that few Facebook engineers knew Git in depth. As the author notes, "it's not even a big deal" to change tools if engineers aren't attached to specific Git subtleties.

The Legacy of Facebook's Decision: Stack Diffs and an Improved Mercurial

Facebook's decision was not without consequences for the open-source ecosystem:

  • Improved Mercurial: Facebook "contributed performance improvements to Mercurial, making it the best option for large monorepos."
  • "Stack Diffs": Building on Mercurial's concepts, Facebook created an innovative code review workflow called "stack diffs" (or stacked diffs). This "unlocking novel code review parall parallelization" and revolutionized their development process. Former Facebook engineers exported this workflow to other companies, creating a "small but vocal Cult of Stack diff Enthusiast," even inspiring the author to create tools like Graphite.

The Human Factor and the Constant Evolution of Technology

Ultimately, the story of Facebook and Git is a poignant reminder that "so many of history's key technical decisions are human-driven, not technology-driven." The receptiveness of a community, the adaptability of a team, and the ability to collaborate can outweigh perceived technical advantages.

It's also crucial to note that the landscape has evolved. "A decade later, Git has made significant improvements to support monorepos... today, Git, with some knowledge of how to do it, operates well with really, really large repos." Git has progressed, and it's possible that it could now handle Facebook's needs.

Facebook's story is one of a company that had to adapt to explosive growth. Faced with the performance limitations of a dominant tool, and a community that was not ready to support its specific needs at the time, they made a pragmatic choice. It was not a rejection of Git in itself, but a response to a unique scaling problem, resolved with an innovative solution, and a testament to the power of human decisions in large-scale engineering.