Our digital economy is intolerant of downtime. But consumers haven't just come to expect always-on digital apps and services. They also expect continuous innovation, new functionality and lightening fast response times.
Organizations have taken note, investing heavily in teams and tools that supposedly increase uptime and free resources for innovation. But leaders have not realized this "throw money at the problem" approach to monitoring is burning through resources without much improvement in availability outcomes.
The Moogsoft State of Availability Report — which helps engineering teams and leaders uncover insights about availability KPIs, teams and tools — found that businesses are double-investing in monitoring. Organizations spend too much money on too many tools, and teams spend the majority of their days monitoring their monitoring tools.
This over-investment in incident management goes largely unnoticed by management. So does the fact that monitoring cycles siphon resources from the future-driven work that delights customers and keeps engineers engaged.
We identify a few common causes of the spend for less approach here:
1. Sprawling single-domain monitoring tools
In a noble attempt to keep digital apps and services available to end users at all times, business leaders buy tools that monitor their increasingly large and complex IT infrastructures. In theory, these tools should speed fixes to performance-affecting issues by continuously scanning systems and notifying engineers about anomalies.
The problem is: Teams have far too many tools. On average, engineers manage 16 monitoring tools. And that number can creep up to 40 as SLAs increase. Sprawling tools like this are unwieldy and license, management and maintenance overheads are expensive. But the over-investment in monitoring doesn't stop there.
2. Days spend in monitoring cycles
IT monitoring tools should bear the brunt of monitoring itself. In principle, these tools relieve engineers from spending too much time on a fairly tedious task and enable them to deliver what customers want: bigger and better technology.
Unfortunately, teams spend by far the most time monitoring over any other task. Why? Engineers spin their wheels managing single-domain tools that are not integrated cross stack. and produce huge volumes of largely useless data. Teams facing a critical outage or incident waste valuable time investigating data from disparate tools and connecting the dots themselves.
3. Leadership-team misalignment
Business leaders do not see just how much time their teams spend on monitoring, and likely believe they're making sound monitoring investments. Leaders believe their teams spend about the same amount of their time on monitoring as they do on other daily (and often future-driven) responsibilities like automation, cloud transformation and development.
4. Stalling innovation and experimentation
With engineering teams stuck in monitoring cycles, something has to give. And unfortunately, that thing is innovation and experimentation — the very activities that delight customers and engage engineering teams. In other words, not only do organizations over-invest in monitoring, they do so to the detriment of customer experience improvements.
The solution: steps to tech stability
If you are part of an engineering team or team leader, chances are you're facing modern-day monitoring problems. Consider these best practices for breaking wasteful monitoring cycles and building your tech stability:
1. Baseline your tools. Audit your existing tools, understand their utilization and what they cost. Then, you can determine which of these assets advance availability goals and which just create more noise.
2. Consolidate your tools. Hold on to only those monitoring tools that provide value. Otherwise, try to shrink your monitoring tools' footprint to decrease total cost of ownership (TCO) and reduce noise.
3. Implement an artificial intelligence for IT Operations (AIOps) solution. Make your next monitoring investment one that makes engineer's jobs less toilsome, not more. AIOps connects cloud and on-prem monitoring tools, giving engineers a central system of engagement for all monitoring activities. The platform alerts engineers to data anomalies and their root cause and automates the entire incident lifecycle.
4. Pay down your technical debt. With time back on your side, tackle the most relevant tech debt and increase system stability. Free even more time by automating away toil and continue to increase availability with chaos engineering.
5. Invest in the future. With time and money saved, refocus your investments on company-differentiating initiatives.
Monitoring tools are essential to uptime. But monitoring cannot be the only thing teams do — especially when it hinders innovation and experimentation. Leaders must make more informed investments to monitor more effectively. Only then can organizations move from maintaining the customer experience to innovating the customer experience.
You could argue that, until the pandemic, and the resulting shift to hybrid working, delivering flawless customer experiences and improving employee productivity were mutually exclusive activities. Evidence from Catchpoint's recently published Site Reliability Engineering (SRE) industry report suggests this is changing ...
There are many issues that can contribute to developer dissatisfaction on the job — inadequate pay and work-life imbalance, for example. But increasingly there's also a troubling and growing sense of lacking ownership and feeling out of control ... One key way to increase job satisfaction is to ameliorate this sense of ownership and control whenever possible, and approaches to observability offer several ways to do this ...
The need for real-time, reliable data is increasing, and that data is a necessity to remain competitive in today's business landscape. At the same time, observability has become even more critical with the complexity of a hybrid multi-cloud environment. To add to the challenges and complexity, the term "observability" has not been clearly defined ...
Many have assumed that the mainframe is a dying entity, but instead, a mainframe renaissance is underway. Despite this notion, we are ushering in a future of more strategic investments, increased capacity, and leading innovations ...
Most (85%) consumers shop online or via a mobile app, with 59% using these digital channels as their primary holiday shopping channel, according to the Black Friday Consumer Report from Perforce Software. As brands head into a highly profitable time of year, starting with Black Friday and Cyber Monday, it's imperative development teams prepare for peak traffic, optimal channel performance, and seamless user experiences to retain and attract shoppers ...
From staffing issues to ineffective cloud strategies, NetOps teams are looking at how to streamline processes, consolidate tools, and improve network monitoring. What are some best practices that can help achieve this? Let's dive into five ...
On November 1, Taylor Swift announced the Eras Tour ... the whole world is now standing in the same virtual queue, and even the most durable cloud architecture can't handle this level of deluge ...
OpenTelemetry, a collaborative open source observability project, has introduced a new network protocol that addresses the infrastructure management headache, coupled with collector configuration options to filter and reduce data volume ...
A unified view of digital infrastructure is essential for IT teams that must improve the digital user experience while boosting overall organizational productivity, according to a survey of IT managers in the United Arab Emirates (UAE), from Riverbed and market research firm IDC ...
Building the visibility infrastructure to make cloud networks observable is a complex technical challenge. But with careful planning and a few strategic decisions, it's possible to appropriately design, set up and manage visibility solutions for the cloud ...