The ability to ensure that business services meet customer needs has never been more critical or more challenging. End users have ever-higher expectations, as well as more visibility into failure, thanks to social media and technology adoption.
The Data Analysis Challenge
The IT that supports critical business services has grown tremendously in size and complexity as new technology is adopted to meet changing business needs. Many IT organizations are no longer wholly responsible for all the components that business services rely on; they also employ third-party services and content providers that reside outside their firewall. In fact, a study of critical business services for 3,000 enterprises shows that the average service depends on data from more than ten different hosts.
Additionally, applications are becoming increasingly dynamic. Outsourced components and services might be interchanged as part of the normal course of a day. Our study shows that over the course of 24 hours, 42 percent of transactions will depend on services emanating from at least 6 data centers, all invoked directly from the client or consumption point. In 8 percent of transactions, services will be delivered from 30 different data centers or more.
Managing business services and their infrastructures is more difficult than ever. Processing is distributed: it occurs within the data center in physical, virtual and hybrid environments; in shared third-party environments delivering specialized outsourced components; and on increasingly powerful end-user clients. Cloud computing, which promises improved IT efficiency and flexibility as well as simplified service provisioning, also increases IT service complexity.
Traditionally, the approach to business service management has been to leverage a discovery process to populate a configuration management database, which is then used to group various IT components by the business services they support. Data from disparate monitoring tools, typically alert data, is then correlated to help understand how those IT systems support the business service.
However, this approach is fundamentally flawed in modern IT environments. These techniques are not designed to address the constant change that occurs across the entire service delivery chain and are less useful in cases of highly shared infrastructure.
In today’s dynamic IT environments, setting thresholds for the various monitoring points in the infrastructure becomes practically impossible. When thresholds are set manually, they will be either too generous to pick up performance issues or so stringent that the monitoring solutions fire a sea of alerts. A new approach is required to ensure that IT can meet constantly changing business needs.
Bringing Metrics and Business Services Together
Most IT environments have more monitoring data than they know what to do with, but few if any of these metrics can report on what really matters: how well the core business services are being supported. Ultimately, stakeholders need enough relevant information to take action before the business is impacted. The key is identifying irregular patterns and abnormal behavior of the overall business service or its underlying components.
Relevant metrics should be tied to how business success (or failure) is measured. Examples of measurable business outcomes include the number of impacted users, up-to-the-minute revenue, conversion rates, number of orders, and number of page views.
More importantly, these metrics should not be viewed in isolation. They need to be viewed in the context of all of the more technical IT metrics so that ‘leading indicators’ can be identified – internal conditions and combinations of factors that may lead to a later business impact if not corrected.
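One way to look for a leading indicator is to check whether a technical metric correlates with a business metric a few intervals later. The following is a minimal sketch under stated assumptions, not a description of any particular product: the metric names, the sample values, and the use of a lagged Pearson correlation are all illustrative choices.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead(it_metric, business_metric, max_lag=5):
    """Return (lag, r) for the lag at which the IT metric correlates most
    strongly with the business metric observed `lag` intervals later."""
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        xs = it_metric[:len(it_metric) - lag]   # earlier IT samples
        ys = business_metric[lag:]              # later business samples
        r = pearson(xs, ys)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


# Hypothetical per-minute samples: orders dip two minutes after queue depth spikes.
queue_depth = [5, 5, 6, 5, 20, 25, 30, 8, 5, 5, 6, 5]
orders = [90, 90, 90, 90, 88, 90, 60, 50, 40, 84, 90, 90]

lag, r = best_lead(queue_depth, orders)
print(f"strongest relationship at lag={lag} minutes (r={r:.2f})")
```

With this made-up data the strongest relationship appears at a lag of two minutes, and it is negative: a rise in queue depth precedes a drop in orders. Correlation alone does not establish cause, so a result like this is a candidate for investigation rather than a conclusion.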
Understanding performance and usage patterns and establishing a "normal" behavior pattern or profile is essential in detecting subtle anomalies. Predictive analytics provides insight into which conditions in a highly complex IT environment should be considered normal and acceptable and, in contrast, which events and conditions may lead to service level degradation. It is also vital that these metrics be source agnostic – in that they can be collected from existing monitoring tools and leveraged in the context of end user performance.
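As an illustration of baselining, the sketch below derives "normal" from a trailing window of recent samples and flags values that stray too far from it, instead of relying on a hand-set threshold. It is a deliberately simple stand-in for predictive analytics; the window size, the three-standard-deviation limit, and the sample response times are assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=12, z_limit=3.0):
    """Return indices of samples that deviate from the trailing-window
    baseline by more than z_limit standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > z_limit:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in milliseconds, with one spike at index 14.
response_ms = [200, 205, 198, 202, 201, 199, 203, 200,
               204, 197, 202, 201, 200, 203, 480, 202]

print(find_anomalies(response_ms))
```

Because the baseline moves with the data, the same code adapts to a service whose normal response time is 200 ms and to one whose normal is 2 seconds. A production profile would also need to account for daily and weekly usage patterns, which this sketch ignores.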
“What-if” scenarios can help organizations identify areas where IT resources can be used to address abnormal situations or improve the business service. Predictive analytics capabilities can be made even more powerful by leveraging the aggregate performance data of an entire customer base. This insight, which we call “Collective Intelligence,” can feed real-time health and performance data to a supplier catalog.
This information allows an organization to look beyond its walls by gauging the overall performance of a third-party supplier that it shares with other customers and quickly identify whether the fault lies with the supplier.
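The reasoning can be sketched as follows: if most other customers of the same supplier are also seeing a slowdown, the supplier is the likely cause; if only one customer is, the problem is probably local. The function, the data layout, the 1.5x slowdown factor, and the 50 percent share limit below are hypothetical and chosen only for illustration.

```python
def classify_supplier_fault(observations, my_id,
                            slowdown_factor=1.5, share_limit=0.5):
    """Classify a slowdown seen by `my_id` for one shared supplier.

    observations maps customer id -> (baseline_ms, current_ms).
    Returns "healthy", "supplier", "local", or "unknown".
    """
    degraded = {
        cid for cid, (baseline, current) in observations.items()
        if current > baseline * slowdown_factor
    }
    if my_id not in degraded:
        return "healthy"
    others = [cid for cid in observations if cid != my_id]
    if not others:
        return "unknown"  # no other customers to compare against
    share = sum(1 for cid in others if cid in degraded) / len(others)
    return "supplier" if share >= share_limit else "local"


# Hypothetical measurements for one third-party supplier.
widespread = {"me": (100, 400), "c1": (120, 390), "c2": (90, 300), "c3": (110, 115)}
isolated = {"me": (100, 400), "c1": (120, 125), "c2": (90, 95), "c3": (110, 115)}

print(classify_supplier_fault(widespread, "me"))  # supplier
print(classify_supplier_fault(isolated, "me"))    # local
```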
These capabilities can be further extended to perform ‘what-if’ scenarios such as:
What if I change my supplier mix?
What if I move IT services to the cloud?
What if I get an unexpected surge in traffic?
Organizations can leverage analytics as well as a supplier catalog to make intelligent decisions on how to optimize the entire application delivery chain. This can include changes to components that are under the enterprise’s control (e.g. improving resources on a particular VM), as well as leveraging the supplier catalog and price/performance comparisons to ensure an optimal solution. For example, the mix of content delivery networks could be adjusted based on factors such as geographic location, traffic volumes, performance and cost of the service.
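A price/performance comparison of this kind could look like the sketch below, which picks, for each region, the cheapest supplier that meets a latency target and falls back to the fastest one when none does. The catalog structure, supplier names, and figures are invented for the example; a real supplier catalog would carry many more attributes, such as traffic volumes and contract terms.

```python
def choose_cdn_mix(catalog, max_latency_ms):
    """Pick one supplier per region from a supplier catalog.

    catalog maps region -> list of offers, each a dict with the keys
    "supplier", "latency_ms" and "cost_per_gb".
    """
    mix = {}
    for region, offers in catalog.items():
        eligible = [o for o in offers if o["latency_ms"] <= max_latency_ms]
        if eligible:
            # Cheapest offer among those that meet the latency target.
            best = min(eligible, key=lambda o: o["cost_per_gb"])
        else:
            # Nothing meets the target: take the fastest available offer.
            best = min(offers, key=lambda o: o["latency_ms"])
        mix[region] = best["supplier"]
    return mix


# Hypothetical catalog entries.
catalog = {
    "eu": [
        {"supplier": "cdn-a", "latency_ms": 80, "cost_per_gb": 0.05},
        {"supplier": "cdn-b", "latency_ms": 120, "cost_per_gb": 0.03},
    ],
    "apac": [
        {"supplier": "cdn-a", "latency_ms": 210, "cost_per_gb": 0.06},
        {"supplier": "cdn-c", "latency_ms": 180, "cost_per_gb": 0.09},
    ],
}

print(choose_cdn_mix(catalog, max_latency_ms=150))
```

Rerunning the function with a different latency target or an updated catalog is one simple way to answer a "what if I change my supplier mix?" question.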
If organizations truly want to support key business processes with IT services, they need to first understand how these systems support business needs and then optimize the entire service delivery chain to support these business outcomes. An approach that starts with business outcomes and works back to correlate how all the IT metrics relate to meeting that outcome will bring success. It is also no longer good enough to be fast at fixing problems – it is now vital to be able to prevent them as well.
About Imad Mouline
Imad Mouline is Chief Technology Officer (CTO) of Compuware's APM Solution. He is a veteran of software architecture and R&D and a recognized expert in web application architecture, development and performance management. His areas of expertise include Cloud Computing, Software-as-a-Service, and mobile applications. As Compuware's CTO of APM, Mouline leads the expansion of the company's product portfolio and market presence. Imad is a frequent speaker at various user conferences and technology events (e.g., Velocity, All About the Cloud, Interop Las Vegas and Think Tank). He has also participated in executive conferences such as the InfoWorld CTO Forum and serves on the advisory board for the Cloud Connect conference.