Whereas last week's outage was traced back to Verizon, this one appears to be a Cloudflare issue: the Cloudflare status page shows that the company is investigating network performance issues affecting a large number of sites and services. The team is working to identify the cause.
At the time of posting, most websites that use Cloudflare are now up and running. Even so, the incident should be viewed as a lesson regarding how much the internet depends on a handful of major companies such as this one.
All Cloudflare services have now been restored. Graham-Cumming would only say he would be "discussing with the sales leadership and with the rest of the team, how we best address this" before repeating that Cloudflare "felt this pain very, very severely". After some internal dog-fooding, updates are pushed out to a small group of customers "who tend to be a little bit cheeky with us" and "do naughty things" before they are progressively rolled out to the wider world.
Today's outage is the latest in a number of issues in recent weeks, the most recent on June 24th, when a similar outage took many sites offline.
"Starting at 1342 UTC today we experienced a global outage across our network that resulted in visitors to Cloudflare-proxied domains being shown 502 errors ('Bad Gateway')."
Cloudflare has confirmed that the outage was caused by the deployment of faulty software, which has since been rolled back.
The networking problem effectively took down websites across the world, particularly in the United Kingdom, but also in much of Europe and on both the east and west coasts of America, according to DownDetector, which was itself affected.
Cloudflare chief technology officer John Graham-Cumming posted an initial brief post-mortem about the outage, saying the company is "incredibly sorry that this incident occurred".
"We make software deployments constantly across the network and have automated systems to run test suites, and a procedure for deploying progressively to prevent incidents."
Twenty-three minutes after Cloudflare confirmed that it was experiencing issues, it announced that it had "implemented a fix".