Don't Be the Next Instapaper
February 21, 2017

Michelle McLean
ScaleArc

Instapaper, a "read later" tool for saving web pages to read on other devices or offline, suffered an extensive outage two weeks ago. The site was unavailable for a day and a half, and even after restoring service, the company had to explain that its archives would remain affected for another full week. Ultimately, it was able to restore the archives sooner, but the outage garnered extensive press and social media coverage.

The outage occurred because an indexing file that Instapaper relies on to reach all stored links exceeded the maximum file size supported on the older instance of Amazon Web Services the site was first built on. You can read if you want more details.

While Instapaper hit a unique problem — a file size limitation — its experience speaks to a much larger problem: scaling a database is difficult, and never quick. That basic fact explains why outages like the one Instapaper suffered are surprisingly common.

Engineering a scaled database, and then making the application changes needed to take advantage of that scaled-out database, is tough coding work indeed. We encounter companies with full control of their source code who are petrified to make the changes needed to scale database capacity. Perhaps it's an ecommerce app, and it's too close to Black Friday. Or maybe it's just a case of attrition: the folks who really understand that code base are long gone, and the current engineers don't dare mess with the inner workings of the app.

These kinds of meltdowns are common during surge events, like the one ESPN suffered with the launch of Fantasy Football or the one Macy's suffered last Black Friday. Sometimes customers can see these events coming (e.g., they're expecting a major traffic surge on Black Friday) and sometimes they simply don't (e.g., their product gets a nod from a celebrity and all of a sudden they're swamped).

When a traffic surge takes down your site, it usually means the data tier was already fragile. Scaling the web infrastructure is pretty easy, as is scaling internet capacity. But scaling the data tier itself is where the challenges lie.

The Instapaper crisis also illustrates how the cloud alone doesn't solve the challenge of scaling the data tier. While elasticity is a hallmark of cloud services, the physics around having an application talk to multiple instances of a database remains a challenge. We've seen some customers suffer from an inflated sense of confidence that running in the cloud takes away these difficulties.

Don't wait for disaster to strike. Whether you're running on premises or in the cloud, keep a close eye on all metrics that reveal how "hot" your systems are running. Ensure your disaster recovery plan is robust and recently tested. Better yet, don't rely on disaster recovery. Instead, run in active/active mode, where you've got multiple instances of all critical systems running in different locales, with the systems able to take on the full load if one portion fails.
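
To make the "how hot" advice concrete, here is a minimal sketch of a headroom check for any resource with a hard ceiling, such as a file that cannot grow past a fixed size. Everything in it is an assumption for illustration: the CapacityReading type, the 70% and 90% thresholds, and the sample numbers are hypothetical and do not describe Instapaper's actual setup or any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapacityReading:
    """One hypothetical measurement of a resource that has a hard limit."""
    name: str
    used: float          # current usage, in any unit (e.g. GB)
    limit: float         # hard ceiling, in the same unit
    daily_growth: float  # observed average growth per day, same unit


def utilization(reading: CapacityReading) -> float:
    """Fraction of the hard limit already consumed."""
    return reading.used / reading.limit


def days_until_limit(reading: CapacityReading) -> Optional[float]:
    """Days left at the current growth rate, or None if usage is not growing."""
    if reading.daily_growth <= 0:
        return None
    remaining = max(reading.limit - reading.used, 0.0)
    return remaining / reading.daily_growth


def status(reading: CapacityReading,
           warn_at: float = 0.70,
           critical_at: float = 0.90) -> str:
    """Classify a reading; the thresholds are illustrative defaults only."""
    u = utilization(reading)
    if u >= critical_at:
        return "critical"
    if u >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up numbers: a 1,900 GB file under a 2,000 GB ceiling, growing 5 GB/day.
    index_file = CapacityReading("index_file_gb", used=1900, limit=2000, daily_growth=5)
    print(index_file.name, status(index_file), days_until_limit(index_file))
```

The point of tracking days remaining alongside the raw percentage is that a limit which is 95% consumed but growing slowly may be less urgent than one at 75% that is growing fast. Either way, the alert needs to fire long before the ceiling is reached, because the fix is rarely quick.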

Take steps now to scale your data tier and avoid these kinds of catastrophic outages. Those "Here's why we failed" engineering blog entries are no fun to write.

Michelle McLean is VP of Marketing at ScaleArc.
