The APM Blog

Harsh Gulati
Infosys

Seamless shopping is a basic demand of today's boundaryless consumer — one with little patience for friction, limited tolerance for disconnected experiences and minimal hesitation in switching brands. Customers expect intuitive, highly personalized experiences and the ability to move effortlessly across physical and digital channels within the same journey. Failure to deliver can cost dearly ...

Jisu Dasgupta

In many organizations, IT still operates as a reactive service provider. Systems are managed through fragmented tools, teams focus heavily on operational metrics, and business leaders often see IT as a necessary cost center rather than a strategic partner. Even well-run ITIL environments can struggle to bridge the gap between operational excellence and business impact. This is where the concept of ITIL+ comes in ...

Anath Bandhu Chatterjee

A payment gateway fails at 2 AM. Thousands of transactions hang in limbo. Post-mortems reveal failures cascading across dozens of services, each technically sound in isolation. The diagnosis takes hours. The fix requires coordinated deployments across teams ...

Carmen Li
Compute Exchange

For years, infrastructure teams have treated compute as a relatively stable input. Capacity was provisioned, costs were forecasted, and performance expectations were set based on the assumption that identical resources behaved identically. That mental model is starting to break down. AI infrastructure is no longer behaving like static cloud capacity. It is increasingly behaving like a market ...

Michelle Abdow
Market Mentors

Outages aren't new. What's new is how quickly they spread across systems, vendors, regions and customer workflows. The moment that performance degrades, expectations escalate fast. In today's always-on environment, an outage isn't just a technical event. It's a trust event ...

Sunil Thamatam

One of the earliest lessons I learned from architecting throughput-heavy services is that simplicity wins repeatedly: fewer moving parts, loosely coupled execution (fewer synchronous calls), and precise timing metering. You want data and decisions to travel the shortest possible path. The goal is to build a system where every strategy and each line of code (contention is the key metric) complements the decision trees ...

Dennis Perpetua
Kyndryl

Kyndryl's 2025 Readiness Report revealed that 61% of global business and technology leaders report increasing pressure from boards and regulators to prove AI's ROI. As the technology evolves and expectations continue to rise, leaders are compelled to generate and prove impact before scaling further. This will lead to a decisive turning point in 2026 ...

Sameer Dixit
Persistent Systems

If you work with AI, you know this story. A model performs well during testing, looks great in early reviews, works perfectly in production and then slowly loses relevance after operating for a while. Everything on the surface looks perfect: pipelines are running, predictions or recommendations are error-free, data quality checks show green; yet outcomes don't match the ground reality. This pattern often repeats across enterprise AI programs. Take, for example, a mid-sized retail banking and wealth-management firm with heavy investments in AI-powered risk analytics, fraud detection and personalized credit-decisioning systems. The model worked well for a while, but as transactions increased, false positives rose by 18% ...

Shamus McGillicuddy

Today, technology buyers suffer not from a lack of information but from an abundance of it. They need a trusted partner to help them navigate this information environment ...

Christopher Gardner
O'Reilly Media

My latest title for O'Reilly, The Rise of Logical Data Management, was an eye-opener for me. I'd never heard of "logical data management," even though it's been around for several years, but it makes some extraordinary promises, like the ability to manage data without having to first move it into a consolidated repository, which changes everything. Now, with the demands of AI and other modern use cases, logical data management is on the rise, so it's "new" to many. Here, I'd like to introduce you to it and explain how it works ...

Derek Ashmore
Asperitas

Three practices (chaos testing, incident retrospectives, and AIOps-driven monitoring) are transforming platform teams from reactive responders into proactive builders of resilient, self-healing systems. The evolution is not just technical; it's cultural. The modern platform engineer isn't just maintaining infrastructure. They're product owners designing for reliability, observability, and continuous improvement ...

Nazy Fouladirad
Tevora

The more technology businesses invest in, the more potential attack surfaces they have that can be exploited. Without the right continuity plans in place, the disruptions caused by these attacks can bring operations to a standstill and cause irreparable damage to an organization. It's essential to take the time now to ensure your business has the right tools, processes, and recovery initiatives in place to weather any type of IT disaster that comes up. Here are some effective strategies you can follow to achieve this ...

Chandra Rao
Techwave

The biggest change in Cloud Managed Services 2.0 is how it unites domains that once operated in isolation. CloudOps, FinOps, DevOps, SecOps, and AIOps now work as a single, cohesive team instead of separate departments competing for resources and priorities. This matters because modern businesses operate at a pace that leaves traditional methods behind ...

Vijay Pahuja
Cox Automotive

When you build a distributed system with microservices, you embrace flexibility and scalability. But you also open the door to unexpected failures. Networks drop packets. Databases become slow. Code bugs slip through testing. Fault injection lets you surface those hidden weak spots before they surprise your users in production. By deliberately introducing failures into your system, you learn its breaking points, you build confidence in your recovery paths, and you make resilience part of your design rather than an afterthought ...

Scott Effler
Bridgenext

AI agents are already transforming the enterprise ... But while the models are advancing fast, most enterprise systems still aren't ready for agent-to-agent AI. The reason is simple but consequential: the environments we've built don't support autonomous action ...

Pete Goldin
APMdigest

In Part 12, the final installment in the series, the experts present some final predictions about AI's future impact on APM and Observability ...

Pete Goldin
APMdigest

What's in the future for APM and Observability? The experts have some ideas, and some of them even contradict each other. In the final installments of this series, the experts present their visions of the future for APM, Observability and beyond ...

Pete Goldin
APMdigest

AI plays a transformative role in both APM and observability by turning raw data into actionable insights, enabling faster, more accurate detection and resolution of issues ...

Pete Goldin
APMdigest

The story of the evolution of Observability to encompass APM and other IT performance management capabilities would not be complete without discussing the monumental impact of open source ...

Pete Goldin
APMdigest

So after all this discussion, what do the experts say about whether you need APM, observability or both? In today's complex digital landscape, organizations need both APM and Observability to not only react to issues but to anticipate and mitigate them proactively, ensuring robust performance and resilience ...
