My latest best practices report on AIOps, titled A CIO's Guide to AIOps, has just been published. What started out as an AIOps use cases document morphed into an AIOps emerging trends document. However, when I sought feedback from the CIOs and CTOs on our research panel, they suggested that the report also include best practices and guidance on how to strategize AI and IT operations in a broader sense.
This 30-page report is now available to all Constellation Research subscribers. The report has five major sections:
1. What is AIOps?
2. Benefits of AIOps for CIOs (or any enterprise)
3. AIOps core use cases
4. Recommendations and best practices for the CIO
5. And, more importantly, the gotchas and my final thoughts
With the COVID-19 pandemic forcing every business to move online, the majority of enterprises have had to accelerate the maturation of their digital operations. Out of business necessity, every surviving enterprise devised a way out of the crisis by adding people, processes, and technology in whatever approach was most cost-effective yet still offered a quick way to sustain the business through the pandemic. Consequently, IT and digital operations have become an integral part of every enterprise. IT leaders now face massive challenges in staying efficient because they have:
■ Added too many tools and have become siloed
■ Increased complexity
■ Collected more data than they can handle
■ Lost knowledgeable IT resources
The time has come for IT leaders to reimagine their IT and make it more efficient. IT is finally starting to turn on itself the technology it has been proliferating across enterprises. One such solution set is artificial intelligence for IT operations (AIOps). The report gives leaders an idea of what to look for in an AI solution as they retool and mature their digital operations.
Start with the Main Use Cases for AIOps
AIOps is more than just bolting an AI/machine learning (ML) engine on top of some of the existing monitoring, logging, observability, or IT service management tools. Its goal is to provide better collaboration between siloed teams, faster time to identify and resolve incidents (mean time to resolution, or MTTR), and the ability to identify and resolve the root cause of the incident so the issue will not happen again. It is also about more than just operations. AIOps can and should include support, security, development, ITSM, business stakeholders, incident management, and observability.
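Since MTTR is the yardstick for much of this, here is a minimal sketch of how it can be computed from incident records. The record layout (a detected timestamp paired with a resolved timestamp) and the sample values are my own assumptions for illustration, not data from the report.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average of (resolved - detected) across a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incident records: (detected_at, resolved_at)
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 10, 30)),   # 90 minutes
    (datetime(2022, 3, 4, 14, 0), datetime(2022, 3, 4, 14, 30)),  # 30 minutes
    (datetime(2022, 3, 9, 22, 0), datetime(2022, 3, 10, 0, 0)),   # 120 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:20:00
```

Tracking this number before and after an AIOps rollout is one simple way to check whether the tooling is actually shortening incidents.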
I have identified seven core use cases for AIOps based on my conversations with many practitioners. There are other fringe use cases that are sometimes executed as part of an AIOps project, but for an enterprise to consider a true AIOps solution, it should at least consider the use cases outlined below.
Do AIOps Right
Enterprises can't succeed in a post-pandemic digital world without AIOps (or without mature digital operations), given the volume of IT operations data produced. Start with some of the core use cases and add the rest as needed. There is no need to boil the ocean and try to execute all of them from the get-go. To scale on a consistent basis, achieve revenue goals and operational efficiency targets, and meet compliance requirements, enterprises need automation and AI at scale.
Enterprises are facing an exploding volume of IT operations data, customer demand for five-nines service availability, a technical resource crunch and high prices caused by the Great Resignation wave, a knowledge gap created by tribal knowledge walking out as baby boomers retire, and workload volume, fatigue, and long hours that induce stress and mental health issues for technical teams. They have a hard decision to make: either continue to run the business as is by throwing more bodies at the problem, or use AI tools to improve the efficiency of the processes.
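To put the five-nines (99.999%) availability demand in perspective, the arithmetic below converts an availability target into a yearly downtime budget. This is a rough sketch that assumes a 365-day year; the function name is mine.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (5 means 99.999%)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR * 10 ** -nines

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes of downtime per year")
```

At five nines the budget is a little over five minutes a year, which leaves almost no room for manual triage.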
A properly implemented AIOps solution should find critical incidents as soon as they happen (sometimes even before), identify the root cause with minimal manual intervention, and either alert the right personnel at the right time or, via IT automation capabilities, make the application truly self-healing.
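As a toy illustration of that detect-then-respond flow, the sketch below flags metric samples that deviate sharply from recent history and then picks between automated remediation and paging a person. It is a deliberately simple stand-in: real AIOps products use far richer models, and the z-score rule, the window size, the threshold, the service names, and the runbook names are all assumptions of mine.

```python
from statistics import mean, stdev

def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples more than `threshold` standard deviations
    away from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: no spread to score against, so skip
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

def choose_action(service, runbooks):
    """Self-heal when a runbook exists for the service; otherwise page on-call."""
    if service in runbooks:
        return f"run:{runbooks[service]}"
    return f"page:{service}-on-call"

# Hypothetical latency samples (ms) and runbook catalog
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
runbooks = {"checkout": "restart-pods"}

for i in find_anomalies(latency_ms):
    print(i, latency_ms[i], choose_action("checkout", runbooks))
```

With this sample data, only the 250 ms spike is flagged, and because a runbook exists for the hypothetical checkout service the sketch chooses automation over paging.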
If you are a CIO/CTO and struggling with this issue, I would love to talk to you. I would love you to be part of my growing panel of IT executives that I speak to regularly and share notes with. More importantly, let me know if I missed anything in this report so I can do a follow-up report.
Do you have thoughts, suggestions, or opposing views to my assessments?
What are the common pitfalls you see with your customers or your enterprise implementations?
Do you use AIOps for a use case that I haven't covered?
Have you faced an issue while implementing AIOps that is not listed in the report?
Did you derive a benefit that is not listed in the document?
Let me know. Please reach out to me. I look forward to engaging with you.