Take the War Out of the War Room
June 09, 2015

Nik Koutsoukos
Catchpoint


New and increasingly complex business technologies now develop so quickly that they are starting to outpace the rate at which IT organizations can effectively monitor the entire IT infrastructure and react to problems. This is particularly true as more enterprises adopt a hybrid model, with some resources managed in the data center and some in cloud or SaaS-based environments. Simultaneously, IT organizations have become increasingly siloed as different personnel develop skillsets specific to different pieces of the IT infrastructure, such as database management, the network, and information security.

As a result, the “war room” – where IT personnel gather to diagnose and fix a problem – more often than not devolves into a session of finger pointing and delays. Remedying this situation demands a new approach to managing performance that enables IT to become more proactive instead of reactive, and more collaborative instead of siloed.

Riverbed recently held a webinar on this topic, and one of our presenters was Forrester Vice President and Principal Analyst Jean-Pierre Garbani. He opened his remarks with a statement that nicely summarizes how predictive analytics technologies have radically reshaped how any company does (or should do) business: “Every company board, IT organization and leadership team should assume that there are – or will be – new ways to more efficiently service customers.”

In other words, counting on the luxury of being able to time the development and release of new products, applications or services to slow-moving market trends is a thing of the past. Just ask the taxicab industry. After more than a century of enjoying a monopoly, it suddenly finds itself in a battle for its life against data-driven services like Uber and Lyft. Or consider the examples of Kodak, Blockbuster, Tower Records or Borders for evidence of how quickly a long-established business model can become obsolete.

Today companies can collect massive amounts of data and use predictive analytics technologies to surface invaluable information such as customer buying trends, supply chain capacity and commodity price futures, or to provide customers with data-driven offers. Enterprises are pouring money and energy into creating innovative applications and getting them to market faster, better and cheaper. Agile and DevOps capabilities can reduce release cycles from months to mere days, and the funding for these investments typically comes from spending reductions in infrastructure.

These complexities can quickly overwhelm human abilities and make the job of resolving problems and maintaining systems increasingly difficult and time-consuming. That impacts service quality. Forrester has conducted a number of surveys and found that 56 percent of IT organizations resolve less than 75 percent of application performance problems within 24 hours, and in some cases, those performance issues can linger for months before resolution. Consider as examples outages that affect services like Gmail or Dropbox.

The root of the problem lies with the fact that IT grew up around domains such as the network, systems, applications, databases, etc., and they needed domain data to do their jobs. That has driven a proliferation of domain-centric point tools, which helps each domain group, but also means that for even very simple transactions, domain teams only see part of the transaction, such as packet data or metrics from an app server. This incomplete visibility means domain teams see different things due to inconsistent data sets and differing analytic approaches. That leads to a lack of collaboration, warring tribes, and ultimately conflicting conclusions that inhibit fast time to resolution.

For example, last year Adobe’s move to cloud-based software backfired when database maintenance resulted in application availability issues. The company’s Creative Cloud service was unavailable for about a day, leaving users unable to access the web versions of apps such as Photoshop and Premiere. In total, the outage was said to have impacted at least a million subscribers. Other Adobe products were affected during the downtime as well, including Adobe's Business Catalyst analytics tool. The company has since implemented procedures to prevent a similar outage from happening again.

This instance highlights the area where companies typically struggle to solve performance issues. Once a problem occurs, it usually doesn’t take long for a frustrated employee or customer to raise it with IT, and once the specific cause is identified, fixing and validating that fix should not take long. Where the delays occur is in the middle of that timeline: the diagnosis, or what Forrester refers to as the “Mean Time to Know” (MTTK).
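That timeline can be sketched as a simple calculation. The timestamps and field names below are hypothetical, invented purely to illustrate how the MTTK gap between detection and repair typically dwarfs the other phases:

```python
from datetime import datetime

# Hypothetical incident timeline; the phases and times are illustrative,
# not drawn from any real monitoring tool or incident record.
incident = {
    "reported":  datetime(2015, 6, 9, 9, 10),   # user raises the issue with IT
    "diagnosed": datetime(2015, 6, 9, 13, 40),  # specific cause finally identified
    "resolved":  datetime(2015, 6, 9, 14, 0),   # fix applied and validated
}

def minutes_between(start, end):
    """Elapsed minutes between two named phases of the incident."""
    return (incident[end] - incident[start]).total_seconds() / 60

mean_time_to_know = minutes_between("reported", "diagnosed")  # the MTTK gap
time_to_fix = minutes_between("diagnosed", "resolved")

print(f"MTTK: {mean_time_to_know:.0f} min, fix and validate: {time_to_fix:.0f} min")
```

In this invented example the diagnosis consumes 270 minutes while the actual fix takes 20, which is exactly the imbalance the MTTK concept calls out.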

Because an IT organization is typically divided into independent silos that have little interaction with each other, the diagnosis process is rarely a collaborative effort. The war room where personnel gather to battle the problem becomes a war against each other. Instead of working together, each silo uses its own specialized tools to evaluate the issue and can typically determine only that the fault lies with another group, without knowing which one. So the problem gets passed from group to group, a tedious and time-wasting exercise.

We will always have different, specialized groups within one IT organization to oversee services and applications such as end-user experience, application monitoring, database monitoring, transaction mapping and infrastructure monitoring. What must go are the separate dashboards each group uses to monitor only its own domain. The key is to roll all of that reporting information in real time into one global dashboard that provides broad domain monitoring capabilities and can be abstracted and analyzed in a way that focuses on services and transactions. Providing this single source of truth will reconcile technology silos and support better incident and problem management processes.
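As a rough sketch of that idea, the snippet below merges per-domain metric feeds into one service-centric view. The feed names, service names and metrics are all invented for illustration and do not come from any real monitoring product:

```python
from collections import defaultdict

# Hypothetical per-domain feeds: each silo reports only its own slice
# of the same "checkout" transaction.
network_feed  = [{"service": "checkout", "metric": "packet_loss_pct", "value": 0.4}]
app_feed      = [{"service": "checkout", "metric": "response_ms",     "value": 820}]
database_feed = [{"service": "checkout", "metric": "query_ms",        "value": 650}]

def unify(*feeds):
    """Group every domain's metrics under the service they belong to,
    producing one service-level record instead of disconnected views."""
    dashboard = defaultdict(dict)
    for feed in feeds:
        for sample in feed:
            dashboard[sample["service"]][sample["metric"]] = sample["value"]
    return dict(dashboard)

global_view = unify(network_feed, app_feed, database_feed)
print(global_view)
```

The point of the sketch is the shape of the output: every group's data lands under the same service key, so the conversation starts from one shared record of the transaction rather than three partial ones.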

In other words, you take the war out of the war room. Each participant can find the right information needed to perform his or her tasks while also sharing that information with their peers so they can do the same.

Implementing this new approach to performance management will be a radical change for many organizations, and there may be initial resistance to overcome as groups worry their individual roles are at risk of marginalization. Again, the ultimate goal is not to eliminate specialized groups within one IT organization; it is to improve the collaboration among those groups. The result is performance management that is far less reactive, no longer forced to wait for a problem to occur before taking action. Universal real-time monitoring can enable IT to anticipate when and where a problem may arise and fix it before the end user or customer even notices. The most productive end users and happiest customers can often be the ones you never hear from, because their experiences are always positive. That kind of silence is golden.

Nik Koutsoukos is CMO, Strategy & Product Leader, at Catchpoint