
The quietest week your engineering team has ever had might also be its best.
No alarms going off. No escalations. No frantic Teams or Slack threads at 2 a.m. Everything humming along exactly as it should. And somewhere in a leadership meeting, someone looks at the metrics dashboard, sees a flat line of incidents and says: "Seems like things are pretty calm over there. Do we really need all those people?"
And there we go. It's the corporate equivalent of, "The medicine is working, so I don't need to take it anymore."
I've spent many years in engineering, and this pattern keeps repeating. The better a DevOps team gets at preventing problems, the more invisible their work becomes. And invisible work is dangerously easy to undervalue.
That was already true before AI entered the picture. Now that 90% of developers report using AI at work — a 14% jump over last year, according to Google's 2025 DORA research — the pace of change is accelerating, the volume of code is increasing and the gap between what teams are doing and what leadership can see is growing wider by the month.
Quiet Systems Are Expensive to Maintain
There's a misconception that a lack of escalations means a lack of effort. The opposite is almost always true. When production is stable, it's because someone spotted a memory leak before it cascaded. Someone else automated a failover that fired at 3 a.m. without waking anyone up. A third person spent two weeks refactoring a deployment pipeline so releases stopped breaking on Fridays.
None of that shows up in an incident report. None of it triggers a heroic war room. And none of it gets the same organizational attention as the DevOps team that spent 72 hours recovering from an outage — even though preventing the outage was harder.
What Makes AI-Assisted Teams Succeed
DORA's 2025 research identifies seven foundational capabilities that determine whether AI adoption helps or hurts an organization. Two stand out for engineering leaders managing the invisible-work problem.
A clear and communicated AI stance
When organizations establish and socialize explicit policies on how developers are expected and permitted to use AI tools, the research found — with high confidence — that AI's positive effect on individual effectiveness and organizational performance is amplified, and friction decreases. Without that clarity, developers either hold back out of fear of overstepping or use AI in ways they shouldn't. Neither is productive.
As I told my engineering leadership team recently: our mandate is non-negotiable. We must accelerate execution and productivity without compromising reliability, scalability or security. But that mandate only works if every developer knows exactly which tools are sanctioned, what the guardrails look like and where the boundaries are. Ambiguity kills adoption.
A quality internal platform
This is a headline-worthy finding: data shows that the positive effect of AI adoption on organizational performance depends on the quality of the internal platform. When platform quality is low, AI adoption has a negligible effect on organizational performance. When platform quality is high, the effect is strong and positive. Gartner's January 2026 Platform Engineering Maturity Model reinforces this, noting that platform engineering is now foundational for speed, consistency, governance and AI readiness. Their data shows 44% of software engineering leaders report skills gaps specifically in AI, platform engineering and security.
This is why we're investing in treating our platform as a product — with a defined roadmap, developer personas and structured feedback loops — rather than a collection of tools that somebody maintains on the side.
How to Make Invisible Impact Visible
If your best work is prevention — and increasingly, if your best work involves knowing how to deploy AI effectively within a complex delivery system — you need a measurement strategy that captures it. Start with a simple, leader-owned operating rhythm:
1. Weekly: Review a small set of SLI/SLO error-budget signals and the top anomalies so you can ask, "What did we catch early, before customers felt it?"
2. Monthly: Inspect trends with engineering and platform leads to connect delivery speed to stability (and agree on one improvement focus).
3. Quarterly: Tie reliability outcomes to customer and business health, then fund the highest-leverage preventative work in the roadmap.
4. Start here: Pick one customer-critical service, define 1–3 SLIs and publish an SLO dashboard that leadership reviews on a recurring calendar (a minimal error-budget sketch follows this list).
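To make that last step concrete, here's a minimal sketch of what "define 1–3 SLIs and publish an SLO dashboard" can look like as data. The service, SLI names and targets below are hypothetical; the point is that an error budget is just arithmetic over counts your monitoring already collects.

```python
from dataclasses import dataclass

@dataclass
class Slo:
    name: str           # e.g., "checkout availability" (hypothetical)
    target: float       # fraction of good events required, e.g., 0.9995
    window_events: int  # total events observed in the review window
    bad_events: int     # events that violated the SLI in that window

    @property
    def error_budget(self) -> float:
        """Fraction of events allowed to fail under the target."""
        return 1.0 - self.target

    @property
    def budget_consumed(self) -> float:
        """Share of the error budget already spent this window (1.0 = exhausted)."""
        allowed_bad = self.error_budget * self.window_events
        return self.bad_events / allowed_bad if allowed_bad else 0.0

# Hypothetical weekly review for one customer-critical service.
slos = [
    Slo("checkout availability", target=0.9995, window_events=1_200_000, bad_events=240),
    Slo("checkout latency < 800 ms", target=0.99, window_events=1_200_000, bad_events=9_000),
]

for slo in slos:
    print(f"{slo.name}: {slo.budget_consumed:.0%} of error budget consumed")
```

Numbers like these turn "nothing happened this week" into "we spent 40% of the checkout error budget and caught it before customers did."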
SLOs and SLIs tied to customer health
Once you have delivery baselines in place, service level objectives and service level indicators anchor your team's performance to something leadership already cares about: customer experience. When your SLO dashboard shows 99.95% availability over the last quarter, that number reflects hundreds of small interventions that kept it there. Tie SLIs to business metrics wherever possible. Latency on the checkout flow. Error rates on API calls from your biggest customers. Response time on authentication. These make proactive work legible to people who don't read deployment logs.
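For a sense of scale, the arithmetic behind that 99.95% target is worth showing leadership explicitly. Assuming a 90-day quarter:

```python
# Downtime allowed by a 99.95% availability SLO over a 90-day quarter.
quarter_minutes = 90 * 24 * 60                     # 129,600 minutes
downtime_budget = (1 - 0.9995) * quarter_minutes   # ~64.8 minutes
print(f"{downtime_budget:.1f} minutes of downtime budget for the whole quarter")
```

That's roughly an hour of slack across three months. Every prevented incident is a withdrawal from that budget that never happened, which is exactly the story the dashboard should tell.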
Anomaly detection is the early-warning layer that makes prevention show up in those weekly reviews. It surfaces weak signals before customers feel them and turns "nothing happened" into a measurable outcome.
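As a sketch of that early-warning layer, here is about the simplest detector there is: flag any point that drifts more than a few standard deviations from a rolling baseline. Real systems use richer models and tuned thresholds; the latency samples below are made up for illustration.

```python
from collections import deque
from statistics import mean, stdev

def rolling_anomalies(values, window=20, threshold=3.0):
    """Yield (index, value) pairs that sit more than `threshold` standard
    deviations away from the rolling mean of the previous `window` points."""
    history = deque(maxlen=window)
    for i, v in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                yield i, v
        history.append(v)

# Hypothetical p95 latency samples (ms); the spike at the end is the weak signal.
latencies = [210, 215, 208, 212, 209, 214, 211, 213, 207, 216,
             210, 212, 215, 209, 211, 213, 208, 214, 212, 210, 390]
for i, v in rolling_anomalies(latencies):
    print(f"sample {i}: {v} ms looks anomalous")
```

The interesting output isn't the spike itself; it's the record that you saw it before anyone filed a ticket.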
Fire Prevention > Firefighting
In my experience, engineering culture has a hero problem. We celebrate the person who stayed up all night fixing a production outage. We rarely celebrate the person who spent a quiet Tuesday tuning alerts so the outage never happened.
This isn't abstract. Gartner research indicates that 87% of businesses experience revenue decreases for every hour of downtime. The proactive work that prevents those hours from happening is worth real money. But if your promotion criteria and performance reviews only capture incident response, you're incentivizing the wrong behavior. You're telling your team: let things break, then fix them heroically.
A few concrete shifts help:
- Include prevention metrics in performance reviews. How many incidents did this person's work prevent? What reliability improvements did they drive? How did their automation reduce manual toil for the team?
- Make proactive work a first-class citizen in sprint planning. Reliability engineering, observability improvements and documentation shouldn't be "tech debt" items that get deprioritized every cycle. Build them into the plan with the same weight as feature work.
- Report on what didn't happen. Sounds counterintuitive, but framing quarterly reviews around "here's what our monitoring caught before it reached customers" is powerful. It puts the invisible work into a context that leadership understands.
- Measure AI's actual delivery impact, not just its perceived productivity boost. Given the gap between how productive developers feel when using AI and what the delivery metrics show, track both. Perception data is valuable. So are cycle time, change failure rate and rework rate. If those diverge, you have a coaching opportunity (as opposed to a tool problem); a minimal sketch of those delivery metrics follows this list.
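To keep that comparison honest, the delivery side can come straight from your deployment records. A minimal sketch, assuming each record carries a flag for whether the change caused a failure and a commit-to-production lead time (the field names are hypothetical, and lead time stands in for whichever cycle-time definition your team uses):

```python
from statistics import median

# Hypothetical deployment records pulled from your CI/CD system.
deployments = [
    {"service": "checkout", "caused_failure": False, "lead_time_hours": 6.5},
    {"service": "checkout", "caused_failure": True,  "lead_time_hours": 30.0},
    {"service": "auth",     "caused_failure": False, "lead_time_hours": 4.0},
    {"service": "auth",     "caused_failure": False, "lead_time_hours": 9.5},
]

change_failure_rate = sum(d["caused_failure"] for d in deployments) / len(deployments)
median_lead_time = median(d["lead_time_hours"] for d in deployments)

print(f"Change failure rate: {change_failure_rate:.0%}")     # 25%
print(f"Median lead time:    {median_lead_time:.1f} hours")  # 8.0
```

If these numbers drift in the wrong direction while self-reported AI productivity climbs, that's the coaching conversation.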
Your Quiet Dashboard Is Really Shouting at You
A flat incident graph and an empty escalation queue aren't signs that your DevOps team has too little to do. They're evidence that your team is doing exactly the right things — and doing them well.
The engineering leader's job is to make that evidence visible. Not to justify headcount or pad reports, but because the work of prevention deserves the same organizational recognition as the work of response. And as AI accelerates the pace of delivery — generating more code, shipping more changes and creating more surface area for things to go wrong — the value of prevention rises while attribution gets harder. Preventing problems before they reach customers, and measuring that prevention in ways leadership can actually see, has never been more important or more valuable.
Google CEO Sundar Pichai's mentor, the late Bill Campbell, used to ask him one question every week: "What ties did you break this week?" As engineering leaders, maybe we should be asking ourselves a different version of that question: What fires did we prevent this week — and can we prove it?
The answer makes a difference. Make sure it's on record.