AIOps-Fueled Digital Experiences: What it Takes to Win the RACE
April 24, 2019

Viki Paige
Broadcom

Across industries and markets, your competitors are in a race to deliver innovative, consistently optimized digital experiences. Increasingly, this is the race that will separate the market victors from the rest.

While optimizing service levels is critical in this endeavor, it's getting more challenging to do every day. Here are two key reasons:

Complexity. Most enterprise-class business services now rely not only on traditional systems, including on-premises mainframes and distributed platforms, but on a plethora of new, dynamic technologies, such as containers, cloud delivery models, virtual and software-defined components, and more.

Scale. The volume, variety, and velocity of data that needs to be managed, correlated, and analyzed continue to grow dramatically. In the wake of initiatives like multi-cloud deployments, microservices development, and Internet of Things (IoT) implementations, teams continue to see explosive growth in the operational data being generated. Ultimately, your internal team members simply can't keep pace.

To understand the changing nature of complexity and scale, consider the explosive growth in per-host metrics associated with the move to containers. Traditionally, there would be around 150 metrics per host to track, with around 100 relating to the operating system and 50 to an application. Contrast this with a container-based implementation, where there will be 50 metrics per container and 50 metrics per orchestrator on the host. It's quite common to have a cluster running upwards of 100 containers on top of two underlying hosts. As opposed to a traditional implementation where running two hosts would require the monitoring of 300 metrics, in a container-based implementation, there would be over 10,000 metrics to track.
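The arithmetic behind those figures can be sketched in a few lines. This is a rough illustration, not a sizing tool, and it rests on two assumptions that the text above does not spell out: that the host operating system metrics are still collected in the container case, and that the "upwards of 100 containers" figure applies to each of the two hosts. The function names are made up for the example.

```python
# Per-unit metric counts quoted in the article.
OS_METRICS_PER_HOST = 100
APP_METRICS_PER_HOST = 50
METRICS_PER_CONTAINER = 50
ORCHESTRATOR_METRICS_PER_HOST = 50


def traditional_metrics(hosts: int) -> int:
    """Metrics to track when each host runs one application directly."""
    return hosts * (OS_METRICS_PER_HOST + APP_METRICS_PER_HOST)


def container_metrics(hosts: int, containers_per_host: int) -> int:
    """Metrics to track in a container-based deployment.

    Assumption: host OS metrics are still tracked alongside the
    orchestrator and per-container metrics.
    """
    per_host = (
        OS_METRICS_PER_HOST
        + ORCHESTRATOR_METRICS_PER_HOST
        + containers_per_host * METRICS_PER_CONTAINER
    )
    return hosts * per_host


if __name__ == "__main__":
    print(traditional_metrics(2))     # 300
    print(container_metrics(2, 100))  # 10300
    print(container_metrics(2, 50))   # 5300
```

With 100 containers on each host, the total lands just above 10,000. If the 100 containers are instead spread across both hosts, the same counts give 5,300, which is still more than 17 times the traditional figure.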

How the R.A.C.E. is Won

To deliver optimized user experiences and contend with the explosive growth in data, complexity, and user demands, your IT teams need to leverage artificial intelligence for IT operations (AIOps) capabilities. To win, your teams need to R.A.C.E. by harnessing four capabilities: Remediation, Aggregation, Correlation, and Experience.

Remediation

You need comprehensive, contextual automated remediation. Once an issue has been identified, whether predictively or through automated root cause analysis, your IT teams need comprehensive, intelligent capabilities that can automatically execute the remediation tasks required in a complex, dynamic enterprise environment. To ensure success, AIOps platforms need to provide scalable, flexible, and easy-to-use automation that can be aligned with your fast-changing business and technology environments.

Aggregation

To be successful, your IT teams need an AIOps platform that offers broad, efficient aggregation. The platform needs to ingest structured and unstructured data and combine it in a single, resilient data lake. It should be able to ingest data from a broad range of monitoring, management, analytics, and visualization tools and offer support for metric, alarm, log, topology, text, and API data. Once ingested, the data must then be normalized so it can be managed and analyzed uniformly, regardless of the source. Finally, this data must be retained over the long term so it can be used for extensive historical and trend analysis.
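To make the normalization step concrete, here is a minimal sketch of mapping records from two different tools onto one common shape. Everything in it is an assumption made for illustration: the `Event` schema, the field names, and the input formats are hypothetical and do not describe any particular product's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical common schema; field names are illustrative only."""
    source: str          # which tool produced the record
    kind: str            # "metric", "alarm", "log", ...
    entity: str          # host, container, or service the record is about
    timestamp: datetime  # always stored as timezone-aware UTC
    payload: dict


def normalize_metric(raw: dict) -> Event:
    # Assumed input shape: {"host", "name", "value", "ts" (epoch seconds)}
    return Event(
        source="metrics-tool",
        kind="metric",
        entity=raw["host"],
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        payload={"name": raw["name"], "value": float(raw["value"])},
    )


def normalize_alarm(raw: dict) -> Event:
    # Assumed input shape: {"device", "severity", "raised_at" (ISO 8601 with offset)}
    return Event(
        source="alarm-tool",
        kind="alarm",
        entity=raw["device"],
        timestamp=datetime.fromisoformat(raw["raised_at"]).astimezone(timezone.utc),
        payload={"severity": raw["severity"].lower()},
    )
```

Once both record types share an entity name and a UTC timestamp, they can be stored together and queried uniformly, regardless of which tool produced them.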

Correlation

When performance issues or downtime occur, your IT teams may struggle to determine why. While a single issue may be the culprit, large numbers of redundant or false alerts may be generated, making it difficult for administrators to filter through the noise and identify the issue that needs to be addressed. To combat these challenges, you need timely, targeted insights that enable fast, automated root cause analysis. AIOps platforms therefore need to provide machine-learning-driven intelligence that can automatically identify the probable root cause. To support this machine learning, these platforms must consume and correlate data from multiple architectural layers. This correlation removes the complexity of assimilating data from different technologies and domains, so organizations can benefit from truly unified visibility.
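The idea of correlating across layers can be illustrated with a deliberately simple, rule-based stand-in. This is not machine learning and not how any specific AIOps platform works. It assumes a hypothetical dependency map (`depends_on`) is already known, and it flags as probable root causes the alerting components whose direct dependencies are all healthy.

```python
from __future__ import annotations


def probable_root_causes(
    alerting: set[str],
    depends_on: dict[str, set[str]],
) -> set[str]:
    """Return alerting entities with no alerting direct dependency.

    An alert on a component is treated as a downstream symptom when
    something it directly depends on is alerting too.
    """
    return {
        entity
        for entity in alerting
        if not (depends_on.get(entity, set()) & alerting)
    }


if __name__ == "__main__":
    # Hypothetical topology: the API uses the database, which uses storage.
    topology = {
        "checkout-api": {"orders-db"},
        "orders-db": {"storage-array"},
    }
    alerts = {"checkout-api", "orders-db", "storage-array"}
    print(probable_root_causes(alerts, topology))  # {'storage-array'}
```

Three alerts collapse to one candidate here. A real platform would also have to weigh timing, alert history, and incomplete topology data, which is where the machine learning described above comes in.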

Experience

In the end, your users interact with services, not infrastructure. However, when operators see that a particular device or system is experiencing issues, it may be difficult to determine how or if the issue is affecting business services. That's why it's vital that AIOps platforms deliver service-level visibility. Platforms need to offer capabilities for mapping issues to associated services, so your IT teams can intelligently prioritize troubleshooting and remediation efforts based on which issues will have the biggest potential business impact. For example, if two issues arise and administrators can see that one is affecting a payroll service that isn't being run currently, and another is hitting an e-commerce service that runs 24/7 and accounts for the bulk of the company's revenues, they can prioritize their efforts accordingly.
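The payroll versus e-commerce example can be sketched as a small ranking function. The `Service` fields, the revenue weights, and the scoring rule are all hypothetical and chosen only to mirror the scenario above. A real platform would derive impact from much richer service models.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Service:
    name: str
    revenue_share: float  # hypothetical weight between 0 and 1
    running_now: bool     # is the service in use at the moment?


def impact(service: Service) -> float:
    """Score a service's current business impact (illustrative rule)."""
    return service.revenue_share if service.running_now else 0.0


def prioritize(issues: dict[str, Service]) -> list[str]:
    """Order issue IDs by the impact of the service each one affects."""
    return sorted(issues, key=lambda issue: impact(issues[issue]), reverse=True)


if __name__ == "__main__":
    issues = {
        "issue-1": Service("payroll", revenue_share=0.0, running_now=False),
        "issue-2": Service("e-commerce", revenue_share=0.8, running_now=True),
    }
    print(prioritize(issues))  # ['issue-2', 'issue-1']
```

The issue hitting the always-on e-commerce service sorts first, while the one affecting the idle payroll service can wait.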

To learn more about how Broadcom can help you on your AIOps journey, be sure to visit our AIOps page.

Viki Paige is Director, AIOps Product Marketing, at Broadcom.