No Escalations ≠ No Work: Why Visibility in DevOps Matters More Now That AI Is Accelerating Everything

Alka Malik
Ivanti

The quietest week your engineering team has ever had might also be its best.

No alarms going off. No escalations. No frantic Teams or Slack threads at 2 a.m. Everything humming along exactly as it should. And somewhere in a leadership meeting, someone looks at the metrics dashboard, sees a flat line of incidents and says: "Seems like things are pretty calm over there. Do we really need all those people?"

And there we go. It's the corporate equivalent of, "The medicine is working, so I don't need to take it anymore."

I've spent many years in engineering, and this pattern keeps repeating. The better a DevOps team gets at preventing problems, the more invisible their work becomes. And invisible work is dangerously easy to undervalue.

That was already true before AI entered the picture. Now that 90% of developers report using AI at work — a 14% jump over last year, according to Google's 2025 DORA research — the pace of change is accelerating, the volume of code is increasing and the gap between what teams are doing and what leadership can see is growing wider by the month.

Quiet Systems Are Expensive to Maintain

There's a misconception that a lack of escalations means a lack of effort. The opposite is almost always true. When production is stable, it's because someone spotted a memory leak before it cascaded. Someone else automated a failover that fired at 3 a.m. without waking anyone up. A third person spent two weeks refactoring a deployment pipeline so releases stopped breaking on Fridays.

None of that shows up in an incident report. None of it triggers a heroic war room. And none of it gets the same organizational attention as the DevOps team that spent 72 hours recovering from an outage, even though preventing the outage was harder.

What Makes AI-Assisted Teams Succeed

DORA's 2025 research identifies seven foundational capabilities that determine whether AI adoption helps or hurts an organization. Two stand out for engineering leaders managing the invisible-work problem.

A clear and communicated AI stance

When organizations establish and socialize explicit policies on how developers are expected and permitted to use AI tools, the research found — with high confidence — that AI's positive effect on individual effectiveness and organizational performance is amplified, and friction decreases. Without that clarity, developers either hold back out of fear of overstepping or use AI in ways they shouldn't. Neither is productive.

As I told my engineering leadership team recently: our mandate is non-negotiable. We must accelerate execution and productivity without compromising reliability, scalability, or security. But that mandate only works if every developer knows exactly which tools are sanctioned, what the guardrails look like and where the boundaries are. Ambiguity kills adoption.

A quality internal platform

This is a headline-worthy finding: data shows that the positive effect of AI adoption on organizational performance depends on the quality of the internal platform. When platform quality is low, AI adoption has a negligible effect on organizational performance. When platform quality is high, the effect is strong and positive. Gartner's January 2026 Platform Engineering Maturity Model reinforces this, noting that platform engineering is now foundational for speed, consistency, governance and AI readiness. Their data shows 44% of software engineering leaders report skills gaps specifically in AI, platform engineering and security.

This is why we're investing in treating our platform as a product — with a defined roadmap, developer personas and structured feedback loops — rather than a collection of tools that somebody maintains on the side.

How to Make Invisible Impact Visible

If your best work is prevention — and increasingly, if your best work involves knowing how to deploy AI effectively within a complex delivery system — you need a measurement strategy that captures it. Start with a simple, leader-owned operating rhythm:

1. Weekly: Review a small set of SLI/SLO error-budget signals and the top anomalies so you can ask "what did we catch early, before customers felt it?"

2. Monthly: Inspect trends with engineering and platform leads to connect delivery speed to stability (and agree on one improvement focus).

3. Quarterly: Tie reliability outcomes to customer and business health, then fund the highest-leverage preventative work in the roadmap.

4. Start here: Pick one customer-critical service, define 1–3 SLIs, and publish an SLO dashboard that leadership reviews on a fixed calendar cadence.
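As a rough illustration of what the weekly error-budget review could look at, here is a minimal Python sketch for a request-based availability SLI. The `SliWindow` shape, the function name and the sample numbers are all hypothetical; a real team would pull these counts from its own monitoring system.

```python
from dataclasses import dataclass


@dataclass
class SliWindow:
    """Request counts for one service over one review window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def error_budget_report(window: SliWindow, slo_target: float) -> dict:
    """Summarize how much of the error budget a window consumed.

    slo_target is a fraction, e.g. 0.999 for a 99.9% availability SLO.
    """
    if window.total_requests <= 0:
        raise ValueError("window must contain at least one request")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    # SLI: share of requests that succeeded in the window.
    sli = 1 - window.failed_requests / window.total_requests
    # Error budget: failures the SLO tolerates for this much traffic.
    allowed_failures = (1 - slo_target) * window.total_requests
    budget_used = window.failed_requests / allowed_failures
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_used": budget_used,  # 1.0 means the budget is exhausted
        "slo_met": sli >= slo_target,
    }


# Illustrative numbers only: 2M requests, 500 failures, 99.9% target.
report = error_budget_report(
    SliWindow(total_requests=2_000_000, failed_requests=500),
    slo_target=0.999,
)
print(f"SLI {report['sli']:.4%}, budget used {report['budget_used']:.0%}")
```

A week that ends with most of the budget unspent is the "what did we catch early?" conversation in numeric form: the unspent share is what prevention protected.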

SLOs and SLIs tied to customer health

Once you have delivery baselines, service level objectives and service level indicators anchor your team's performance to something leadership already cares about: customer experience. When your SLO dashboard shows 99.95% availability over the last quarter, that number reflects hundreds of small interventions that kept it there. Tie SLIs to business metrics wherever possible. Latency on checkout flow. Error rates on API calls from your biggest customers. Response time on authentication. These make proactive work legible to people who don't read deployment logs.
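To make a figure like 99.95% concrete, it helps to translate it into minutes. The sketch below does that arithmetic, assuming a 90-day quarter and a time-based availability measure; both function names are made up for illustration.

```python
def allowed_downtime_minutes(availability_target: float, window_days: int) -> float:
    """Minutes of full unavailability an availability SLO permits in a window."""
    window_minutes = window_days * 24 * 60
    return (1 - availability_target) * window_minutes


def budget_remaining(availability_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Fraction of the downtime budget still unspent (negative if overspent)."""
    budget = allowed_downtime_minutes(availability_target, window_days)
    return 1 - downtime_minutes / budget


# Assumption: a quarter is treated as 90 days.
quarter_budget = allowed_downtime_minutes(0.9995, 90)
print(f"Allowed downtime per quarter: {quarter_budget:.1f} minutes")
```

Under these assumptions, 99.95% over a quarter leaves roughly 64.8 minutes of total downtime, which is a number a leadership review can reason about more easily than a string of nines.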

More simply: anomaly detection is the early-warning layer that makes prevention show up in weekly reviews. It surfaces weak signals before customers feel them and turns "nothing happened" into a measurable outcome.
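One simple way to surface weak signals is a rolling z-score: compare each new sample with the mean and spread of the samples just before it. The Python sketch below shows that technique only; it is not a claim about any particular monitoring product, and the window size, threshold and latency values are illustrative assumptions.

```python
from statistics import mean, stdev


def find_anomalies(samples: list[float], window: int = 12,
                   threshold: float = 3.0) -> list[int]:
    """Return indexes whose value sits more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all counts as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical p95 latency samples in milliseconds, one every five minutes.
latency_ms = [120, 118, 122, 121, 119, 120, 123, 117, 121, 120, 119, 122,
              121, 180, 120]
print(find_anomalies(latency_ms))  # [13]: the 180 ms spike
```

Each flagged index that gets investigated and resolved before an SLO is breached is a candidate entry for the weekly "caught early" list.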

Fire Prevention > Firefighting

In my experience, engineering culture seems to have a hero problem. We celebrate the person who stayed up all night fixing a production outage. We rarely celebrate the person who spent a quiet Tuesday tuning alerts so the outage never happened.

This isn't abstract. Gartner research indicates that 87% of businesses experience revenue decreases for every hour of downtime. The proactive work that prevents those hours from happening is worth real money. But if your promotion criteria and performance reviews only capture incident response, you're incentivizing the wrong behavior. You're telling your team: let things break, then fix them heroically.

A few concrete shifts help:

  • Include prevention metrics in performance reviews. How many incidents did this person's work prevent? What reliability improvements did they drive? How did their automation reduce manual toil for the team?
  • Make proactive work a first-class citizen in sprint planning. Reliability engineering, observability improvements and documentation shouldn't be "tech debt" items that get deprioritized every cycle. Build them into the plan with the same weight as feature work.
  • Report on what didn't happen. Sounds counterintuitive, but framing quarterly reviews around "here's what our monitoring caught before it reached customers" is powerful. It puts the invisible work into a context that leadership understands.
  • Measure AI's actual delivery impact, not just its perceived productivity boost. Given the gap between how productive developers feel when using AI and what the delivery metrics show, track both. Perception data is valuable. So are cycle time, change failure rate and rework rate. If those diverge, you have a coaching opportunity (as opposed to a tool problem).
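The last shift above can be sketched in a few lines of Python: compute change failure rate and rework rate for two periods and check whether they moved against the perception data. The `Deployment` record, the way "rework" is flagged and the survey score are all hypothetical; substitute whatever definitions your delivery tooling already uses.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """One production change (hypothetical record shape)."""
    caused_incident: bool  # needed a rollback, hotfix or incident response
    is_rework: bool        # unplanned deploy made to fix a user-facing defect


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that led to a failure in production."""
    if not deploys:
        raise ValueError("need at least one deployment")
    return sum(d.caused_incident for d in deploys) / len(deploys)


def rework_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that were unplanned fixes."""
    if not deploys:
        raise ValueError("need at least one deployment")
    return sum(d.is_rework for d in deploys) / len(deploys)


def perception_diverges(perceived_gain: float, before: list[Deployment],
                        after: list[Deployment]) -> bool:
    """True when developers report a gain but failure or rework rates got worse."""
    worse = (change_failure_rate(after) > change_failure_rate(before)
             or rework_rate(after) > rework_rate(before))
    return perceived_gain > 0 and worse


# Illustrative data: deploy volume doubles, but so does the rework share.
before = ([Deployment(True, False)] * 2 + [Deployment(False, True)] * 1
          + [Deployment(False, False)] * 17)
after = ([Deployment(True, False)] * 6 + [Deployment(False, True)] * 4
         + [Deployment(False, False)] * 30)

# perceived_gain is a made-up survey delta; positive means "we feel faster".
print(perception_diverges(perceived_gain=0.3, before=before, after=after))
```

A `True` result here would point to the coaching conversation described above, not to a verdict on the tooling.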

Your Quiet Dashboard Is Really Shouting at You

A flat incident graph and an empty escalation queue aren't signs that your DevOps team has too little to do. They're evidence that your team is doing exactly the right things, and doing them well.

The engineering leader's job is to make that evidence visible. Not to justify headcount or pad reports, but because the work of prevention deserves the same organizational recognition as the work of response. And as AI accelerates the pace of delivery (generating more code, shipping more changes and creating more surface area for things to go wrong), the value of prevention rises while attribution gets harder. Teams need to prevent problems before they reach customers, and to measure that work in ways leadership can actually see.

Google CEO Sundar Pichai's mentor, the late Bill Campbell, used to ask him one question every week: "What ties did you break this week?" As engineering leaders, maybe we should be asking ourselves a different version of that question: What fires did we prevent this week — and can we prove it?

The answer makes a difference. Make sure it's on record.

Alka Malik is SVP of Engineering at Ivanti.
