The App-Hugger's Brief History of Application Recovery - Part I: Pre-APM
May 16, 2014

Kevin McCartney


Here is a brief summary of the most common approaches to application recovery since the mid-1990s, along with an overview of the limitations we’ve run across most frequently.

METHOD: Scripting

DATES: 1995 – Present

ALSO KNOWN AS: “Manual Labor”


• Users identify problems and alert IT

• IT focused on infrastructure, not apps. At the time, every app ran on dedicated hardware (prior to virtualization and cloud), so there was a direct correlation between server and app; that correlation no longer exists

• Difficult to pinpoint problems

• Heavy reliance on scripts, which requires maintaining a script library

METHOD: Runbooks

DATES: 2001 – Present

ALSO KNOWN AS: “The Manual Process of Manuals”


• Shelves of binders: if this, then that

• IT still focused on infrastructure, not apps

• Still difficult to identify source of problems

• Recovery very labor intensive

METHOD: Runbook Automation

DATES: 2007 – Present

ALSO KNOWN AS: “Rise of the Machines”


• Emergence of software platforms that can execute scripts

• Works for routine operations such as provisioning

• Still requires a manual decision on what to do (which runbook to execute), because the platform lacks awareness of the overall health or current state of an application


IT organizations manage run-time applications largely through an infrastructure-centric approach (network and server monitoring), which is then used to infer application health. The challenge with this approach is that it is not application-aware and cannot tell you anything about the critical applications running on top of that infrastructure. In some cases, application-level monitoring is implemented, which provides analytics about an application’s performance. However, without the ability to respond intelligently, or to empower staff to do so, these analytics are of limited benefit in ensuring the uptime of applications in their run-time environment. These tools tend to provide a historical or root-cause-analysis view rather than a responsive solution for addressing real-time issues.

In conjunction with this approach, IT organizations may couple monitoring with script-based tools (also known as run books) to help improve the efficiency of routine, pre-defined tasks. Scripts and run books can be effective for automating basic tasks with a known “start” and “stop”; however, they are neither well suited to nor scalable for complex run-time environments. To address run-time application management this way, scripts must be written for every possible scenario, and every possible combination of scenarios, that may occur for each application, and they must be continually updated and adapted as the environment grows.
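To give a rough sense of why writing a script for every combination does not scale, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that an application has a number of independent failure conditions, each either present or absent; the function name is hypothetical.

```python
def scenario_combinations(num_conditions: int) -> int:
    """Count the non-empty combinations of independent failure conditions.

    Assumption for illustration: each condition is simply present or absent,
    so n conditions give 2**n combinations, minus the "nothing is wrong" case.
    """
    return 2 ** num_conditions - 1


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} conditions -> {scenario_combinations(n)} combinations to script")
```

Under these assumptions, five conditions already imply 31 combinations, and twenty imply more than a million, for a single application.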

Furthermore, the script-based approach typically still requires manual decision-making. And if scripts are not run properly, based on the state and in the context of each application’s hierarchy and dependencies, they provide limited utility, and in some cases may actually compound the application downtime and data corruption problems they were meant to prevent.
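As a minimal sketch of what "in the context of each application’s hierarchy and dependencies" can mean in practice, the example below derives a start-up order from a dependency map, so that no component is started before the things it relies on. The component names and the `restart_order` helper are hypothetical; the ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical application: each component maps to the components it depends on.
DEPENDENCIES = {
    "database": set(),
    "cache": set(),
    "app_server": {"database", "cache"},
    "web_frontend": {"app_server"},
}


def restart_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(restart_order(DEPENDENCIES), start=1):
        print(f"{step}. start {component}")
```

A script that restarts `web_frontend` before `app_server` would be running out of context in exactly the sense described above; a real recovery tool would also need to check each component's current state before acting, which this sketch does not attempt.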

The App Hugger's Brief History of Application Recovery - Part II: The APM Era

Kevin McCartney is CEO of Jumpsoft