The ability to ensure that business services meet customer needs has never been more critical or more challenging. End-users have increasingly higher expectations, as well as more visibility into failure, thanks to social media and technology adoption.
The Data Analysis Challenge
The IT that supports critical business services has grown tremendously in size and complexity as new technology is adopted to meet changing business needs. Many IT organizations are no longer wholly responsible for all the components that business services rely on and employ third-party services and content providers that reside outside their firewall. In fact, a study of critical business services for 3,000 enterprises shows that the average service depends on data from more than ten different hosts.
Additionally, applications are becoming increasingly dynamic. Outsourced components and services might be interchanged as part of the normal course of a day. Our study shows that over the course of 24 hours, 42 percent of transactions will depend on services emanating from at least 6 data centers, all invoked directly from the client or consumption point. In 8 percent of transactions, services will be delivered from 30 different data centers or more.
Managing business services and their infrastructures is more difficult than ever. Processing is distributed, occurring within the data center in physical, virtual and hybrid environments; in shared third-party environments delivering specialized outsourced components; and on the increasingly more powerful end-user clients. Cloud computing, which promises improved IT efficiency and flexibility as well as simplified service provisioning, also increases IT service complexity.
Traditionally, the approach to business service management has been to leverage a discovery process to populate a configuration management database, which is then used to group various IT components by the business services they support. Data from disparate monitoring tools, typically alert data, is then correlated to help understand how those IT systems support the business service.
However, this approach is fundamentally flawed in modern IT environments. These techniques are not designed to address the constant change that occurs across the entire service delivery chain and are less useful in cases of highly shared infrastructure.
In today’s dynamic IT environments, setting thresholds for the various monitoring points in the infrastructure becomes practically impossible. When thresholds are set manually, they will either be too generous to pick up performance issues, or so stringent resulting in a sea of alerts being fired by the monitoring solutions. A new approach is required to ensure that IT can meet constantly changing business needs.
Bringing Metrics and Business Services Together
Most IT environments have more monitoring data than they know what to do with, but few if any of these metrics can report on what really matters - how the core business services are being supported. Ultimately, stakeholders need to have enough relevant information to be able to take action before the business is impacted. The key is identifying irregular patterns and abnormal behavior of the overall business service or its underlying components.
Relevant metrics should be tied to how business success (or failure) is measured. Examples of measureable business outcomes include the number of impacted users, up-to-the-minute revenue, conversion rates, number of orders, and number of page views.
More importantly, these metrics should not be viewed in isolation. They need to be viewed in the context of all of the more technical IT metrics so that ‘leading indicators’ can be identified – internal conditions and combinations of factors that may lead to a later business impact if not corrected.
Understanding performance and usage patterns and establishing a "normal" behavior pattern or profile is essential in detecting subtle anomalies. Predictive analytics provides insight into which conditions in a highly complex IT environment should be considered normal and acceptable and, in contrast, which events and conditions may lead to service level degradation. It is also vital that these metrics be source agnostic – in that they can be collected from existing monitoring tools and leveraged in the context of end user performance.
“What-if” scenarios can help organizations identify areas where IT resources can be used to address abnormal situations or improve the business service. Predictive analytics capabilities can be made even more powerful by leveraging the aggregate performance data of an entire customer base. This insight, which we call “Collective Intelligence,” can feed real-time health and performance data to a supplier catalog.
This information allows an organization to look beyond its walls by gauging the overall performance of a third-party supplier that it shares with other customers and quickly identify whether the fault lies with the supplier.
These capabilities can be further extended to perform ‘what-if’ scenarios such as:
What if I change my supplier mix?
What if I move IT services to the cloud?
What if I get an unexpected surge in traffic?
Organizations can leverage analytics as well as a supplier catalog to make intelligent decisions on how to optimize the entire application delivery chain. This can include changes to components that are under the enterprise’s control (e.g. improving resources on a particular VM), as well as leveraging the supplier catalog and price/performance comparisons to ensure an optimal solution. For example, the mix of content delivery networks could be adjusted based on factors such as geographic location, traffic volumes, performance and cost of the service.
If organizations truly want to support key business processes with IT services, they need to first understand how these systems support business needs and then optimize the entire service delivery chain to support these business outcomes. An approach that starts with business outcomes and works back to correlate how all the IT metrics relate to meeting that outcome will bring success. It is also no longer good enough to be fast at fixing problems – it is now vital to be able to prevent them as well.
About Imad Mouline
Imad Mouline is Chief Technology Officer (CTO) of Compuware's APM Solution. He is a veteran of software architecture and R&D and a recognized expert in web application architecture, development and performance management. His areas of expertise include Cloud Computing, Software-as-a-Service, and mobile applications. As Compuware's CTO of APM, Mouline leads the expansion of the company's product portfolio and market presence. Imad is a frequent speaker at various user conferences and technology events (e.g., Velocity, All About the Cloud, Interop Las Vegas and Think Tank). He has also participated in executive conferences such as the InfoWorld CTO Forum and serves on the advisory board for the Cloud Connect conference.
The requirements of an APM tool are now much more complex than they've ever been. Not only do they need to trace a user transaction across numerous microservices on the same system, but they also need to happen pretty fast ...
Performance monitoring is an old problem. As technology has advanced, we've had to evolve how we monitor applications. Initially, performance monitoring largely involved sending ICMP messages to start troubleshooting a down or slow application. Applications have gotten much more complex, so this is no longer enough. Now we need to know not just whether an application is broken, but why it broke. So APM has had to evolve over the years for us to get there. But how did this evolution take place, and what happens next? Let's find out ...
There are some IT organizations that are using DevOps methodology but are wary of getting bogged down in ITSM procedures. But without at least some ITSM controls in place, organizations lose their focus on systematic customer engagement, making it harder for them to scale ...
If you have deployed a Java application in production, you've probably encountered a situation where the application suddenly starts to take up a large amount of CPU. When this happens, application response becomes sluggish and users begin to complain about slow response. Often the solution to this problem is to restart the application and, lo and behold, the problem goes away — only to reappear a few days later. A key question then is: how to troubleshoot high CPU usage of a Java application? ...
Operations are no longer tethered tightly to a main office, as the headquarters-centric model has been retired in favor of a more decentralized enterprise structure. Rather than focus the business around a single location, enterprises are now comprised of a web of remote offices and individuals, where network connectivity has broken down the geographic barriers that in the past limited the availability of talent and resources. Key to the success of the decentralized enterprise model is a new generation of collaboration and communication tools ...
To better understand the AI maturity of businesses, Dotscience conducted a survey of 500 industry professionals. Research findings indicate that although enterprises are dedicating significant time and resources towards their AI deployments, many data science and ML teams don't have the adequate tools needed to properly collaborate on, build and deploy AI models efficiently ...
Digital transformation, migration to the enterprise cloud and increasing customer demands are creating a surge in IT complexity and the associated costs of managing it. Technical leaders around the world are concerned about the effect this has on IT performance and ultimately, their business according to a new report from Dynatrace, based on an independent global survey of 800 CIOs, Top Challenges for CIOs in a Software-Driven, Hybrid, Multi-Cloud World ...
APM tools are your window into your application's performance — its capacity and levels of service. However, traditional APM tools are now struggling due to the mismatch between their specifications and expectations. Modern application architectures are multi-faceted; they contain hybrid components across a variety of on-premise and cloud applications. Modern enterprises often generate data in silos with each outflow having its own data structure. This data comes from several tools over different periods of time. Such diversity in sources, structure, and formats present unique challenges for traditional enterprise tools ...
Today's organizations clearly understand the value of digital transformation and its ability to spark innovation. It's surprising that fewer than half of organizations have undertaken a digital transformation project. Workfront has identified five of the top challenges that IT teams face in digital transformation — and how to overcome them ...