Don't Be the Next Instapaper
February 21, 2017

Michelle McLean
ScaleArc


Instapaper, a "read later" tool for saving web pages to read on other devices or offline, suffered an extensive outage 2 weeks ago. The site was unavailable for a day and a half, and even after restoring service, the company had to explain that its archives would be impacted for another full week. Ultimately, it was able to restore the archives sooner, but the outage garnered extensive press and social media coverage.

The outage happened because an index file that Instapaper relies on to reach all stored links grew past the maximum file size supported by the older Amazon Web Services (AWS) instance the site was originally built on. Instapaper has published more details about the incident.
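A hard size limit like this is far less dangerous when you can see it coming. The sketch below fits a straight line to periodic size measurements of a file and estimates how many days remain before it reaches a limit. It is only an illustration under stated assumptions: the names `SizeSample` and `days_until_limit` are hypothetical, growth is assumed to be roughly linear, and the limit you pass in must come from your own database and filesystem documentation. It is not a description of Instapaper's tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SizeSample:
    """One observation of a file's size (hypothetical monitoring data)."""
    day: float        # days since monitoring began
    size_bytes: int   # observed file size on that day


def days_until_limit(samples: list[SizeSample], limit_bytes: int) -> float | None:
    """Estimate how many days remain before a file reaches a hard size limit.

    Assumes roughly linear growth: fits a least-squares line to (day, size)
    and extrapolates from the most recent sample. Returns None when the
    fitted trend is flat or shrinking, and 0.0 if the limit is already passed.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate growth")
    n = len(samples)
    mean_x = sum(s.day for s in samples) / n
    mean_y = sum(s.size_bytes for s in samples) / n
    den = sum((s.day - mean_x) ** 2 for s in samples)
    if den == 0:
        raise ValueError("samples must come from at least two different days")
    num = sum((s.day - mean_x) * (s.size_bytes - mean_y) for s in samples)
    slope = num / den  # estimated growth in bytes per day
    if slope <= 0:
        return None
    latest = max(samples, key=lambda s: s.day)
    return max(0.0, (limit_bytes - latest.size_bytes) / slope)


if __name__ == "__main__":
    # Hypothetical numbers: a file growing by 10 bytes/day toward a 200-byte limit.
    history = [SizeSample(0, 100), SizeSample(1, 110), SizeSample(2, 120)]
    print(days_until_limit(history, 200))  # 8.0
```

Alerting on the projected date rather than on the raw size gives the team time to plan a migration instead of performing one during an outage.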

While Instapaper hit a unique problem — a file size limitation — its experience speaks to a much larger problem: scaling a database is difficult, and never quick. That basic fact explains why outages like the one Instapaper suffered are surprisingly common.

Engineering a scaled-out database, and then making the application changes needed to take advantage of it, is tough coding work indeed. We encounter companies with full control of their source code who are petrified to make the changes needed to scale database capacity. Perhaps it's an ecommerce app, and it's too close to Black Friday. Or maybe it's just a case of attrition: the people who really understood that code base are long gone, and the current engineers don't dare mess with the inner workings of the app.

These kinds of meltdowns are common during surge events, like the one ESPN suffered with the launch of Fantasy Football or the one Macy's suffered last Black Friday. Sometimes customers can see these events coming (e.g., they're expecting a major traffic surge on Black Friday) and sometimes they simply don't (e.g., their product gets a nod from a celebrity and all of a sudden they're swamped).

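When a traffic surge takes down your site, it usually means the data tier was already fragile. Scaling the web tier is comparatively easy, since stateless web servers can simply be added behind a load balancer, and internet capacity can be bought. The data tier is where the challenges lie: it holds state, so adding database instances means keeping their copies of the data consistent and making sure every query reaches an instance that can answer it correctly.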

The Instapaper crisis also illustrates that the cloud alone doesn't solve the challenge of scaling the data tier. Elasticity is a hallmark of cloud services, but the mechanics of having an application talk to multiple instances of a database (deciding which queries can safely go to read replicas, coping with replication lag, redirecting writes after a failover) remain a challenge. We've seen some customers develop an inflated sense of confidence that running in the cloud takes these difficulties away.
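To make "talking to multiple instances" concrete, one common pattern is read/write splitting: writes go to a single primary, while plain reads are spread across replicas. The sketch below is a deliberately simplified, hypothetical router (`ReadWriteRouter` and the host names are invented for illustration; this is not ScaleArc's product or any library's API). It classifies SQL with simple regular expressions and ignores replication lag, transactions, and failover, which is exactly where the real difficulty lies.

```python
from __future__ import annotations

import itertools
import re


class ReadWriteRouter:
    """Hypothetical router splitting SQL traffic between a primary and read replicas.

    Plain SELECTs are spread round-robin across replicas. Everything else,
    including locking reads and statements it cannot classify (e.g. WITH ...),
    goes to the primary. Replication lag and transactions are NOT handled.
    """

    _PLAIN_SELECT = re.compile(r"^\s*select\b", re.IGNORECASE)
    _LOCKING_READ = re.compile(
        r"\bfor\s+(update|share)\b|\block\s+in\s+share\s+mode\b", re.IGNORECASE
    )

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # With no replicas configured, every statement falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        """Return the host that should execute this statement."""
        is_plain_read = bool(self._PLAIN_SELECT.match(sql)) and not self._LOCKING_READ.search(sql)
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
    print(router.route("SELECT url FROM bookmarks"))              # db-replica-1
    print(router.route("UPDATE bookmarks SET read = 1 WHERE id = 7"))  # db-primary
```

Sending anything ambiguous to the primary is the conservative choice: a misrouted write fails or corrupts data, while a misrouted read merely costs some primary capacity.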

Don't wait for disaster to strike. Whether you're running on-premises or in the cloud, keep a close eye on the metrics that reveal how "hot" your systems are running, such as CPU utilization, connection counts, query latency, and storage growth against any hard limits. Make sure your disaster recovery plan is robust and recently tested. Better yet, don't rely on disaster recovery. Instead, run in active/active mode, with multiple instances of all critical systems running in different locations and the surviving systems able to take on the full load if one location fails.
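The active/active requirement above ("able to take on the full load if one portion fails") can be checked with simple arithmetic: for each site, subtract its capacity from the total and compare what remains with expected peak load. The sketch below does that and also flags sites running "hot". The function names, site names, capacity figures, and the 70% threshold are all hypothetical examples, not recommendations; capacities are assumed to be positive and measured in the same units as load (for instance, queries per second).

```python
from __future__ import annotations


def survives_single_site_loss(capacity: dict[str, float], peak_load: float) -> dict[str, bool]:
    """For each site, report whether the remaining sites could absorb
    the expected peak load if that one site failed."""
    total = sum(capacity.values())
    return {site: total - cap >= peak_load for site, cap in capacity.items()}


def hot_sites(load: dict[str, float], capacity: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Return sites whose current load exceeds `threshold` of their capacity.

    The 0.7 default is an arbitrary example value. Capacities must be positive.
    """
    return sorted(
        site for site, cap in capacity.items() if load.get(site, 0.0) / cap > threshold
    )


if __name__ == "__main__":
    # Hypothetical three-site deployment, capacities in queries/second.
    capacity = {"us-east": 6000.0, "us-west": 6000.0, "eu-west": 4000.0}
    print(survives_single_site_loss(capacity, peak_load=11000.0))
    print(hot_sites({"us-east": 5000.0, "us-west": 3000.0, "eu-west": 1000.0}, capacity))
```

With a peak of 11,000 queries per second in this made-up example, losing either US site leaves only 10,000 of capacity, so the deployment is not truly able to carry the full load after a single failure even though it looks comfortable on a normal day.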

Take steps now to scale your data tier and avoid these kinds of catastrophic outages. Those "Here's why we failed" engineering blog entries are no fun to write.

Michelle McLean is VP of Marketing at ScaleArc.
