Monitoring as a Differentiator: Breaking Silos and Building Understanding
March 27, 2017

David Drai
Anodot


Monitoring a business means monitoring an entire business – not just IT or application performance. If businesses truly care about differentiating themselves from the competition, they must approach monitoring holistically. Separate, siloed monitoring systems are quickly becoming a thing of the past.

I see, time and again, cloud monitoring companies working with a myopic focus on infrastructure – a critical mistake. They concentrate on system health but avoid business health like the plague. Although CPU, disk, memory and other infrastructure KPIs are essential to maintaining a healthy system, their coverage is limited and misses the equally crucial component that drives how well a company is operating – its business. Today there is simply no excuse for incomplete monitoring capabilities, and it is more necessary than ever to break out of monitoring silos.

Cloud Monitoring 1.0 and the Evolution of Metrics

Monitoring infrastructure provides some visibility into overall system health by keeping machines up and running – but it is not at all adequate for determining what is occurring on the business side of a company. Infrastructure monitoring is also far too basic to keep up with changes within applications – essentially putting blinders on a company's leadership.

As it stands, infrastructure monitoring tools usually run in conjunction with other internal tools to gain an angle on the business, or analysts rely on Business Intelligence solutions that may be connected to infrastructure monitoring through internal scripts. In most cases, these 1.0-level tools require a great deal of internal development and maintenance, which makes them difficult to scale.

In the past few years, time series metrics have been the main driver of growth in cloud monitoring systems. Normalizing almost all data into a single time series representation has made it possible to offer generic solutions that cover many use cases and very different customers. Given how simple this approach is to adopt, it is not surprising that open source solutions are becoming so widespread among businesses that are beginning to understand the importance of monitoring. The ability to represent all metrics in the same manner, using the same dashboards and the same set of time series functions, has significantly simplified this monitoring method, providing good – but not fully comprehensive – information.
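To make this concrete, here is a minimal sketch of that common representation, in which every measurement – infrastructure or business – reduces to the same (name, timestamp, value) triple. The metric names and fields below are illustrative assumptions, not any particular product's schema.

```python
from dataclasses import dataclass
from time import time

@dataclass
class MetricPoint:
    """One sample in the unified time series representation."""
    name: str         # e.g. "web01.cpu.utilization" or "checkout.completed.count"
    timestamp: float  # seconds since the epoch
    value: float

def record(stream: list, name: str, value: float) -> None:
    """Append a sample; infrastructure and business KPIs share one pipeline."""
    stream.append(MetricPoint(name, time(), value))

series = []
record(series, "web01.cpu.utilization", 73.5)    # infrastructure KPI
record(series, "checkout.completed.count", 42)   # business KPI
# Both now flow through the same dashboards and time series functions.
```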

Today's Challenges of Monitoring Business

One of the main challenges of monitoring business KPIs is that static rules and alerts are too limiting. Static alerts are especially difficult to maintain for metrics that follow trends or seasons, because of those metrics' inherent variability. Even in the simplest cases, it is very difficult to define thresholds for thousands of metrics, since doing so requires the user to know each metric's normal range. For e-commerce companies, the holiday season is always a peak time in sales, and every metric is going to behave "abnormally." It is nearly impossible for large data-driven companies, which monitor so much, to go through and reset the threshold for every single metric – talk about a nightmare.
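To see why, consider this small, self-contained sketch (with hypothetical numbers throughout): a static threshold tuned to off-season behavior misfires on every healthy day of a seasonal peak.

```python
import math
import random

random.seed(0)

def daily_sales(day_of_year: int) -> float:
    """Simulated sales: a weekly cycle, a holiday surge, and some noise."""
    weekly = 1000 + 300 * math.sin(2 * math.pi * day_of_year / 7)
    holiday = 2500 if 330 <= day_of_year <= 360 else 0  # late-year peak season
    return weekly + holiday + random.gauss(0, 50)

STATIC_THRESHOLD = 1500  # chosen by eyeballing "normal" off-season weeks

false_alarms = sum(
    1 for day in range(330, 361) if daily_sales(day) > STATIC_THRESHOLD
)
print(f"Healthy holiday days flagged 'abnormal' by the static rule: {false_alarms} of 31")
```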

Another challenge of monitoring so many metrics is that alerting rules must be defined manually, even though each metric has its own normal range – and yet, for the configuration to be effective, that is exactly what has to be done. Amazon needs to know that "Elf on the Shelf" dolls are going to sell heavily in November and that gift certificates will be sold later in the month.

Cloud Monitoring 2.0: For IT, Applications AND Business

The newest generation of monitoring centralizes all company activity into a single unified solution, rather than separate solutions for IT, applications, and business. This is the holistic understanding that companies have been working toward for so long – the ability to understand every metric separately and together. It is one thing to see an infrastructure anomaly on its own, but being able to contextualize it with its correlated impact on the business affords an entirely new way to problem-solve and to measure the health of a company. Beyond addressing immediate issues, this type of top-down monitoring approach offers tremendous value.
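As a toy illustration of that correlation step – with made-up metric names and timestamps, not any vendor's actual mechanism – anomalies detected in separate infrastructure and business streams can be paired whenever they fall within a short time window of each other:

```python
from datetime import datetime, timedelta

# Hypothetical anomaly feeds: (metric name, time the anomaly was detected).
infra_anomalies = [("db01.disk.latency", datetime(2017, 3, 1, 14, 2))]
business_anomalies = [("checkout.completed.count", datetime(2017, 3, 1, 14, 5))]

WINDOW = timedelta(minutes=10)

def correlate(infra, business, window=WINDOW):
    """Yield (infra metric, business metric, lag) triples inside the window."""
    for i_name, i_ts in infra:
        for b_name, b_ts in business:
            if abs(b_ts - i_ts) <= window:
                yield i_name, b_name, b_ts - i_ts

for i_name, b_name, lag in correlate(infra_anomalies, business_anomalies):
    print(f"{i_name} anomaly followed by {b_name} anomaly after {lag}")
```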

Without a smart mechanism to manage so many rules and alerts, companies are bound to compromise on what they monitor, sacrificing everything else for a few selected metrics. Analysts are not fortune tellers – there is no way to know in advance which metrics are the best ones to monitor. This creates an inevitable delay in the detection of issues, which severely limits how proactive a company can be in the varied business scenarios it faces. It also limits the granularity of the organization's visibility – bringing us back to where we were with Cloud Monitoring 1.0.

Only recently has the implementation of AI in BI enabled companies to solve these monitoring challenges. By automating the ability to differentiate between normal and abnormal behavior (no matter the trend or time of year), businesses finally have a comprehensive, automatic evaluation of anomalies. With the addition of AI to monitoring, companies can differentiate themselves by how quickly they respond to changing conditions: how quickly they find bugs and glitches, how rapidly they respond to customers in crisis, and how swiftly they leverage a business opportunity triggered by a celebrity's viral Instagram post.
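One simple way to automate that normal-versus-abnormal distinction – a stand-in sketch under the assumption of weekly seasonality, not the actual machine learning behind any particular product – is to learn a baseline mean and spread for each seasonal slot (here, day of week) and flag values that stray too far from their slot's baseline:

```python
import random
from collections import defaultdict
from statistics import mean, stdev

def seasonal_baseline(history):
    """history: (day_of_week, value) pairs -> per-slot (mean, std) baseline."""
    slots = defaultdict(list)
    for dow, value in history:
        slots[dow].append(value)
    return {dow: (mean(v), stdev(v)) for dow, v in slots.items() if len(v) > 1}

def is_anomalous(baseline, dow, value, k=3.0):
    """Flag values more than k standard deviations from the slot's baseline."""
    mu, sigma = baseline[dow]
    return abs(value - mu) > k * sigma

# Toy data: Sundays are normally quiet, other days busier; no hand-set thresholds.
random.seed(1)
history = [(d % 7, random.gauss(100 if d % 7 == 0 else 150, 5)) for d in range(70)]
baseline = seasonal_baseline(history)
print(is_anomalous(baseline, 0, 180))  # True: a spike on a normally quiet day
print(is_anomalous(baseline, 1, 152))  # False: within a busy day's normal range
```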

As companies engage with their customers in more ways than ever before, finding ways to break out of monitoring silos is going to be the key that lets them scale successfully and compete with industry giants.

David Drai is CEO and Co-Founder of Anodot.
