How AI Can Turbocharge Your Observability Practice

Mimi Shalash
Splunk

AI has transformed technologies, workflows and entire industries, reshaping how people scale performance analysis. Organizations are seeing that AI can dramatically strengthen innovation and employee productivity by automating manual tasks and quickly extracting valuable insights. Enterprise adoption shows no signs of slowing: global AI tool users are expected to reach 729 million by 2030, up from 314 million in 2024.

AI's Growing Impact on Observability

As AI strengthens product innovation and technology functions, it is also reshaping the observability space. Observability, a practice ITOps and engineering teams use to improve digital resilience by lowering the cost of unplanned downtime, provides greater visibility across data, workflows and infrastructure as a whole. Just because a server is happy doesn't mean customers are happy. Observability helps translate technical stability into customer satisfaction and business success, and AI amplifies this by driving continuous improvement at scale.

Defining what good looks like can be challenging for customers, and it takes time and effort. For example, developers often rely on historical data to determine whether an API call should take 10 or 100 milliseconds, then observe performance and set alerts based on manual thresholds. With AI, developers can automate these tasks by analyzing data at scale to detect patterns and predict optimal performance, lifting the burden from teams.
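To make the idea of replacing manual thresholds concrete, here is a minimal sketch in Python. It is not a description of any vendor's product: it assumes a simple statistical baseline (mean plus k standard deviations of historical latency) as a stand-in for the pattern detection described above. The function names, the sample data and the default k=3.0 are all hypothetical.

```python
import statistics

def dynamic_threshold(history_ms, k=3.0):
    """Derive an alert threshold from historical latencies.

    Assumption: 'normal' is the historical mean plus k population
    standard deviations. Real systems typically use richer models.
    """
    mean = statistics.fmean(history_ms)
    stdev = statistics.pstdev(history_ms)
    return mean + k * stdev

def is_anomalous(latency_ms, history_ms, k=3.0):
    """Return True when a new observation exceeds the learned threshold."""
    return latency_ms > dynamic_threshold(history_ms, k)

# Hypothetical latencies (in milliseconds) for one API call.
history = [12, 11, 13, 12, 14, 11, 12, 13, 12, 11]

print(is_anomalous(13, history))   # within the usual range
print(is_anomalous(100, history))  # far outside the usual range
```

The point of the sketch is that the threshold comes from the data rather than from a number someone picked by hand, so it moves as the service's normal behavior changes.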

Reduce Noise Through AIOps

AIOps, or artificial intelligence for IT operations, is a common way that AI is integrated into observability and a natural next step in mature practices. The main goals of AIOps are to accelerate detection, investigation and response times, increasing efficiency and reducing costs. It achieves this by applying machine learning models to intelligently group otherwise noisy alerts from different tools. For example, integrated ML allows teams to identify anomalies across multiple third-party systems and spot potential downstream impacts, such as increased CPU usage and database latency, that might not otherwise have crossed manual alert thresholds.
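As a rough illustration of alert grouping, the Python sketch below merges alerts from different tools into one incident when they fire close together in time. This is a deliberately simple rule-based stand-in, not the machine learning models described above. The Alert fields, the tool names and the 120-second window are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    source: str       # tool that raised the alert (hypothetical names)
    message: str

def group_alerts(alerts, window_s=120.0):
    """Group alerts that fire within window_s seconds of the previous alert.

    Assumption: alerts close together in time belong to the same incident.
    """
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window_s:
            groups[-1].append(alert)   # extend the current incident
        else:
            groups.append([alert])     # start a new incident
    return groups

alerts = [
    Alert(0, "apm", "checkout latency high"),
    Alert(45, "infra", "CPU usage rising"),
    Alert(90, "db", "query latency high"),
    Alert(3600, "infra", "disk usage warning"),
]

incidents = group_alerts(alerts)
for incident in incidents:
    print([a.message for a in incident])
```

Here the first three alerts collapse into a single incident and the fourth stands alone, so a responder sees two items instead of four. An ML-based approach would also weigh signals such as topology and message similarity instead of time alone.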

Surface Insights and Accelerate Investigations Through AI Assistants

Another way organizations can strengthen their observability practice is by incorporating AI assistants. By embedding generative AI into workflows, ITOps and engineering teams can reduce the learning curve for non-expert users and troubleshoot faster. Natural language processing (NLP) addresses key challenges such as the lack of context for troubleshooting and slow root cause analysis, which is often delayed by reliance on tribal knowledge. AI assistants, with intuitive commands and a low barrier to entry, can now answer environment-specific questions, ranging from "How many services are running?" to "What was the highest response time yesterday on the checkout service at the world's leading T-shirt company?" This broadens access, speeds up troubleshooting and drives more efficient decision-making.

Predict and Mitigate Downtime

AI not only saves time but also reduces costs. The impact of unplanned downtime goes beyond immediate financial costs: it has a lasting effect on a company's shareholder value, brand reputation, innovation velocity and customer trust. Research has shown that 40% of Chief Marketing Officers (CMOs) say downtime impacts customer lifetime value (CLV) and damages reseller and/or partner relationships.

By leveraging AI, companies can proactively minimize downtime and ultimately protect their bottom line. Organizations rely on digital platforms that handle millions of transactions daily, and performance depends on teams that can adjust resources dynamically, preventing issues before they impact the business.

For example, when identifying recurring patterns of performance degradation linked to high call center volume, AI models can help forecast when the system is likely to experience strain that could lead to customer churn and frustration. With the right insights at the right time, teams can redistribute workloads or fine-tune application configurations before issues occur.
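One way to picture this kind of forecast is the Python sketch below. It fits a straight line relating call center volume to response time, then estimates latency at an expected future volume. It assumes a simple linear relationship, which real workloads may not follow. The sample data, the 250 ms limit and the function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_latency(call_volume, slope, intercept):
    """Estimate response time (ms) at a given call volume."""
    return slope * call_volume + intercept

# Hypothetical history: calls per hour and observed response time in ms.
calls = [100, 200, 300, 400, 500]
latency_ms = [120, 140, 160, 180, 200]

slope, intercept = fit_line(calls, latency_ms)

# Hypothetical expectation: 800 calls per hour during an upcoming peak.
forecast_ms = predict_latency(800, slope, intercept)
at_risk = forecast_ms > 250  # hypothetical latency limit

print(f"forecast: {forecast_ms:.0f} ms, at risk: {at_risk}")
```

If the forecast exceeds the limit, the team has a signal to redistribute workloads or tune configuration before the peak arrives rather than during it.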

Complement Human Thinking

AI has a profound ability to complement human decision-making by delivering unparalleled speed and precision. However, it lacks the common sense and nuanced judgment that only human intelligence can provide. For ITOps and engineering teams, a single decision can have a big impact on observability outcomes and ripple into the business. To keep decision-making strategic, these teams can treat AI as a partner: AI accelerates insights, while human reasoning ensures those insights are applied with context.

In summary, AI's ability to rapidly analyze vast amounts of data, detect anomalies and automate tasks is not only transforming observability, but also the people and processes that make up the practice. While the future holds many possibilities, one thing is clear: as AI becomes a core pillar of observability best practices, it will redefine how we ensure resiliency.

Mimi Shalash is Observability Advisor at Splunk, a Cisco company
