Downtime

5 Takeaways from the Observability Forecast for Media and Entertainment

June 02, 2026

New Relic surveyed IT and engineering leaders from the media and entertainment (M&E) sector to understand what's working — and where challenges persist with their observability practices. The findings reveal how M&E organizations are navigating rising platform complexity, audience expectations, and AI-driven change. Below are five takeaways that stand out ...

Signs It May Be Time to Reassess Your IT Infrastructure Strategy

May 26, 2026

Many organizations assumed their infrastructure strategy was settled. It had been implemented, optimized and built into long-term plans. Recent changes in technology and vendor consolidation are forcing a second look. Cloud outages and licensing changes have exposed how much dependency exists on a small number of platforms. As a result, organizations are reevaluating whether those decisions still hold up under current conditions ...

Alert Fatigue Is No Longer a Morale Problem, It's a Reliability Risk and a System Failure

May 20, 2026

For years, production operations teams have treated alert fatigue as a quality-of-life problem: something that makes on-call rotations miserable but isn't considered a direct contributor to outages. That framing doesn't capture how these systems fail, and we now have data to show why. More importantly, it's now clear alert fatigue is a symptom of a deeper issue: production systems have outgrown the current operational approaches ...

Almost Half of AI-Generated Code Fails in Production

May 15, 2026

Until AI-powered engineering tools have live visibility of how code behaves at runtime, they cannot be trusted to autonomously ensure reliable systems, according to the State of AI-Powered Engineering Report 2026 from Lightrun. The report reveals that a large volume of manual work is required when AI-generated code is deployed: 43% of AI-generated code requires manual debugging in production, even after passing QA or staging tests. Furthermore, an average of three manual redeploy cycles is required to verify a single AI-suggested code fix in production ...

Reliability Is the New Bottleneck of Innovation

May 12, 2026

Today's modern systems are not what they once were. Organizations now rely on distributed systems, event-driven workflows, hybrid and multi-cloud environments and continuous delivery pipelines. While each adds flexibility, it also introduces new, often invisible failures. Development speed is no longer the primary bottleneck of innovation. Reliability is ...

5 Takeaways from the Observability Forecast for Retail and eCommerce

May 11, 2026

Seeing is believing, or in this case, seeing is understanding, according to New Relic's 2025 Observability Forecast for Retail and eCommerce report. Retailers who want to provide exceptional customer experiences while improving IT operations efficiency are leaning on observability ... Here are five key takeaways from the report ...

The SRE Report 2026: Reliability Is Being Redefined

May 07, 2026

Reliability is no longer proven by uptime alone, according to The SRE Report 2026 from LogicMonitor. In the AI era, it is experienced through speed, consistency, and user trust, and increasingly judged by business impact. As digital services grow more complex and AI systems move into production, traditional monitoring approaches are struggling to keep pace, increasing the need for AI-first observability that spans applications, infrastructure, and the Internet ...

Why AI Is the Differentiator for Operationally Resilient Organizations

May 05, 2026

In the world of digital-first business, there is no tolerance for service outages. Businesses know that outages are the quickest way to lose money and customers. For smaller organizations, unplanned downtime could even force the business to close ... A new study from PagerDuty, The State of AI-First Operations, reveals that companies actively incorporating AI into operations now view operational resilience as a growth driver rather than a cost center. But how are they achieving it? ...

AI Scale Is Outpacing Infrastructure - and IT Leaders Are Running Out of Time

April 21, 2026

AI workloads require an enormous amount of computing power ... What's also becoming abundantly clear is just how quickly AI's computing needs are leading to enterprise systems failure. According to Cockroach Labs' State of AI Infrastructure 2026 report, enterprise systems are much closer to failure than their organizations realize. The report ... suggests AI scale could cause widespread failures in as little as one year — making it a clear risk for business performance and reliability.

What 15 Years of Building Payment Systems Taught Me About Microservices That Nobody Talks About

April 02, 2026

A payment gateway fails at 2 AM. Thousands of transactions hang in limbo. Post-mortems reveal failures cascading across dozens of services, each technically sound in isolation. The diagnosis takes hours. The fix requires coordinated deployments across teams ...

Organizations Can Lose $1M+ Per Hour During Unplanned Disruptions

March 27, 2026

The financial stakes of extended service disruption have made operational resilience a top priority, according to the 2026 State of AI-First Operations Report from PagerDuty. According to survey findings, 95% of respondents believe their leadership understands the competitive advantage that can be gained from reducing incidents and speeding recovery ...

Payment System Failures Put Canadian Businesses at Financial and Reputational Risk

March 20, 2026

Payment disruption is placing growing pressure on Canadian businesses. An estimated $7.6 billion in retail and hospitality sales is at risk each year due to payment system failures. A new collaborative report by FreedomPay, Dynatrace and Retail Economics reveals Canadians will wait just six minutes during a service outage before abandoning a purchase. However, the average outage lasts 67 minutes, leaving businesses susceptible to significant financial losses and potential damage to consumer trust and loyalty ...

AI Agents Are Building Databases. Who's Governing the Changes?

March 18, 2026

AI agents are starting to do something that used to be slow by design. They are creating databases, spinning up branches, and iterating on the data layer as part of the build loop. You can argue about the exact percentages in any one report, but the direction is unmistakable. The database is moving from foundational infrastructure to active surface area for modern applications, and that shift is going to collide with how most enterprises still control change ...

Enterprise Resilience: Understanding the Shift from Static to Dynamic

March 09, 2026

Resilience can no longer be defined by how quickly an organization recovers from an incident or disruption. The effectiveness of any resilience strategy is dependent on its ability to anticipate change, operate under continuous stress, and adapt confidently amid uncertainty ...

2026 Will Force Enterprises to Rethink the Cloud's "Always On" Myth

February 24, 2026

2025 was the year everybody finally saw the cracks in the foundation. If you were running production workloads, you probably lived through at least one outage you could not explain to your executives without pulling up a diagram and a whiteboard ...

Crisis Communications: When the Outage Hits, Your Communications Can't Be "Investigating"

February 13, 2026

Outages aren't new. What's new is how quickly they spread across systems, vendors, regions and customer workflows. The moment that performance degrades, expectations escalate fast. In today's always-on environment, an outage isn't just a technical event. It's a trust event ...

Turning Foresight into Resilience: Reclaiming Prevention in the Age of Exposure

February 04, 2026

Cloudflare's disruption illustrates how quickly a single provider's issue cascades into widespread exposure. Many organizations don't fully realize how tightly their systems are coupled to third-party services, or how quickly availability and security concerns align when those services falter ... You can't avoid these dependencies, but you can understand them ...

Payment Outages Threaten $44.4 Billion in US Retail and Hospitality Sales Annually

January 23, 2026

Payment system failures are putting $44.4 billion in US retail and hospitality sales at risk each year, underscoring how quickly disruption can derail day-to-day trading, according to research conducted by Dynatrace ... The findings show that payment failures are no longer isolated incidents, but part of a recurring operational challenge that disrupts service, damages customer trust, and negatively impacts revenue ...

2026 Observability Predictions - Part 8

December 18, 2025

In APMdigest's 2026 Observability Predictions Series, industry experts offer predictions on how Observability and related technologies will evolve and impact business in 2026. Part 8 covers outages, downtime and availability ...

The Silent Threat to Retailers' Biggest Quarter: Outages and AI Blind Spots

December 12, 2025

AI continues to be the top story across the industry, but a big test is coming up as retailers make the final preparations before the holiday season starts. Will new AI-powered features help load up Santa's sleigh this year? Or are early adopters in for unpleasant surprises in the form of unexpectedly high costs, poor performance, or even service outages? ...

Agentic Remediation: Capitalizing on the New Era of Database Observability

December 04, 2025

Developers building AI applications are not just looking for fault patterns after deployment; they must detect issues quickly during development and have the ability to prevent issues after going live. Unfortunately, traditional observability tools can no longer meet the needs of AI-driven enterprise application development. AI-powered detection and auto-remediation tools designed to keep pace with rapid development are now emerging to proactively manage performance and prevent downtime ...

When Dashboards Say "Green" But Customers See Red: Why Digital Experience Still Fails at the Last Mile

December 02, 2025

For many retail brands, peak season is the annual stress test of their digital infrastructure. It's also when technical dashboards often glow green, yet customer feedback, digital experience frustration, and conversion trends tell a different story entirely. Over the past several years, we've seen the same pattern across retail, financial services, travel, and media: internal application performance metrics fail to capture the true experience of users connecting over local broadband, mobile carriers, and congested networks using multiple devices across geographies ...

Outages Aren't the Enemy, Complacency Is

November 25, 2025

Three practices, chaos testing, incident retrospectives, and AIOps-driven monitoring, are transforming platform teams from reactive responders into proactive builders of resilient, self-healing systems. The evolution is not just technical; it's cultural. The modern platform engineer isn't just maintaining infrastructure. They're product owners designing for reliability, observability, and continuous improvement ...

Cybersecurity Awesomeness Podcast: Cloudflare Outage

November 21, 2025

Chris Steffen and Ken Buckler from EMA discuss the Cloudflare outage and what availability means in the technology space ...

MEAN TIME TO INSIGHT Podcast - Episode 19: The AWS Outage

October 30, 2025

In MEAN TIME TO INSIGHT Episode 19, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA explains the cause of the AWS outage in October ...
