Take the War Out of the War Room
June 09, 2015

Nik Koutsoukos
Catchpoint

New and more complex business technologies are now being developed so quickly that they are starting to outpace the rate at which IT organizations can effectively monitor the entire IT infrastructure and react to problems. This is particularly true as more enterprises adopt a hybrid model, with some resources managed in the data center and some in cloud or SaaS-based environments. Simultaneously, IT organizations have become increasingly siloed as different personnel develop skill sets specific to different pieces of the IT infrastructure, such as database management, the network and information security.

As a result, the “war room” – where IT personnel gather to diagnose and fix a problem – more often than not devolves into a session of finger pointing and delays. Remedying this situation demands a new approach to managing performance that enables IT to become more proactive instead of reactive, and more collaborative instead of siloed.

Riverbed recently held a webinar on this topic, and one of our presenters was Forrester Vice President and Principal Analyst Jean-Pierre Garbani. He opened his remarks with a statement that nicely summarizes how predictive analytics technologies have radically reshaped how any company does (or should do) business: “Every company board, IT organization and leadership team should assume that there are – or will be – new ways to more efficiently service customers.”

In other words, counting on the luxury of being able to time the development and release of new products, applications or services to slow-moving market trends is a thing of the past. Just ask the taxicab industry. After more than a century of enjoying a monopoly, it suddenly finds itself in a battle for its life against data-driven services like Uber and Lyft. Or consider the examples of Kodak, Blockbuster, Tower Records or Borders for evidence of how quickly a long-established business model can become obsolete.

Today companies can collect massive amounts of data and use predictive analytics technologies to extract invaluable information such as customer buying trends, supply chain capacity and commodity price futures, or to provide customers with data-driven offers. Enterprises are pouring money and energy into creating innovative applications and getting them to market faster, better and cheaper. Agile and DevOps capabilities can reduce release cycles from months to mere days, and the funding for these investments typically comes from spending reductions in infrastructure.

These complexities can quickly overwhelm human abilities and make the job of resolving problems and maintaining systems increasingly difficult and time-consuming. That impacts service quality. Forrester has conducted a number of surveys and found that 56 percent of IT organizations resolve less than 75 percent of application performance problems within 24 hours, and in some cases, those performance issues can linger for months before resolution. Consider as examples outages that affect services like Gmail or Dropbox.

The root of the problem lies in the fact that IT grew up around domains such as the network, systems, applications and databases, and each domain team needed its own domain data to do its job. That has driven a proliferation of domain-centric point tools, which helps each domain group but also means that, for even very simple transactions, domain teams see only part of the transaction, such as packet data or metrics from an app server. With this incomplete visibility, domain teams see different things because their data sets are inconsistent and their analytic approaches differ. That leads to a lack of collaboration, warring tribes, and ultimately conflicting conclusions that inhibit fast time to resolution.

For example, last year Adobe’s move to cloud-based software backfired momentarily when database maintenance resulted in application availability issues. The company’s Creative Cloud service was unavailable for about a day, leaving users unable to access the web versions of apps such as Photoshop and Premiere. In total, the outage was said to have impacted at least a million subscribers. Other Adobe-related products were impacted during the downtime as well, including Adobe's Business Catalyst analytics tool. The company has since implemented procedures to prevent a similar outage from happening again.

This instance highlights the area where companies typically struggle to solve performance issues. Once a problem occurs, it usually doesn’t take long for a frustrated employee or customer to raise it with IT, and once the specific cause is identified, fixing and validating that fix should not take long. Where the delays occur is in the middle of that timeline: the diagnosis, or what Forrester refers to as the “Mean Time to Know” (MTTK).

Because an IT organization is typically divided into independent silos that have little interaction with each other, the diagnosis process cannot be a collaborative effort. The war room where personnel gather to battle the problem becomes a war against each other. Instead of one collaborative effort, each silo uses its own specialized tools to evaluate the issue, and can typically determine only that the fault lies with another group, without knowing which one. So the problem gets passed from group to group, a tedious and time-wasting exercise.

We will always have different, specialized groups within one IT organization to oversee services and applications such as end-user experiences, application monitoring, database monitoring, transaction mapping and infrastructure monitoring. What must change is the reliance on the individual dashboards each group uses to monitor its own domain. The key is to roll all of that reporting information, in real time, into one global dashboard that provides broad domain monitoring capabilities and that can be abstracted and analyzed in a way that focuses on services and transactions. Providing this single source of truth will reconcile technology silos and support better incident and problem management processes.

In other words, you take the war out of the war room. Each participant can find the right information needed to perform his or her tasks while also sharing that information with peers so they can do the same.

Implementing this new approach to performance management will be a radical change for many organizations, and there may be initial resistance to overcome as groups worry their individual roles are at risk of marginalization. Again, the ultimate goal is not to eliminate specialized groups within one IT organization; it is to improve the collaboration among those groups. The result is performance management that is much less reactive and no longer has to wait for a problem to occur before taking action. Universal real-time monitoring can enable IT to anticipate when and where a problem may arise and fix it before the end user or customer even notices it. The most productive end user and happiest customer can often be the ones you never hear from because their experiences are always positive. That kind of silence is golden.

Nik Koutsoukos is CMO, Strategy & Product Leader, at Catchpoint