Downtime
In APMdigest's 2026 Observability Predictions Series, industry experts offer predictions on how Observability and related technologies will evolve and impact business in 2026. Part 8 covers outages, downtime and availability ...
AI continues to be the top story across the industry, but a big test is coming up as retailers make the final preparations before the holiday season starts. Will new AI powered features help load up Santa's sleigh this year? Or are early adopters in for unpleasant surprises in the form of unexpected high costs, poor performance, or even service outages? ...
Developers building AI applications are not just looking for fault patterns after deployment; they must detect issues quickly during development and have the ability to prevent issues after going live. Unfortunately, traditional observability tools can no longer meet the needs of AI-driven enterprise application development. AI-powered detection and auto-remediation tools designed to keep pace with rapid development are now emerging to proactively manage performance and prevent downtime ...
For many retail brands, peak season is the annual stress test of their digital infrastructure. It's also when often technical dashboards glow green, yet customer feedback, digital experience frustration, and conversion trends tell a different story entirely. Over the past several years, we've seen the same pattern across retail, financial services, travel, and media: internal application performance metrics fail to capture the true experience of users connecting over local broadband, mobile carriers, and congested networks using multiple devices across geographies ...
Three practices, chaos testing, incident retrospectives, and AIOps-driven monitoring, are transforming platform teams from reactive responders into proactive builders of resilient, self-healing systems. The evolution is not just technical; it's cultural. The modern platform engineer isn't just maintaining infrastructure. They're product owners designing for reliability, observability, and continuous improvement ...
Chris Steffen and Ken Buckler from EMA discuss the Cloudflare outage and what availability means in the technology space ...
In MEAN TIME TO INSIGHT Episode 19, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA explains the cause of the AWS outage in October ...
Collaboration tools have become the backbone of modern business ... Yet despite this central role, collaboration performance remains one of the most poorly monitored aspects of enterprise IT. The issue isn't a lack of investment in tooling. Most organizations have performance dashboards, application uptime metrics, and usage analytics. What they often lack is insight into the actual experience users have when trying to collaborate in real time ...
AI can be a critical part of the IT puzzle by helping to accelerate incident response, reduce downtime and keep customers happy. These gains can shift executive attitudes, as leaders come to see AI agents not just as experimental tools, but as reliable partners in mission-critical situations. It's no surprise, then, that 81% of IT and business executives now trust AI agents to take action during a crisis ...
New Relic's 2025 Observability Forecast ... found that with a median annual cost of high-impact IT outages reaching $76 million, organizations are investing in AI-strengthened observability to detect and resolve issues faster. Here are 5 key takeaways from this year's report ...
Executive trust in AI agents and reliance on AI across business operations is growing, according to the PagerDuty AI Resilience Survey — 81% of executives trust AI agents to take action on the company's behalf during a crisis, such as a service outage or security event ...
The observability landscape has transformed dramatically over the past decade. What began as traditional application performance monitoring (APM) has evolved into something more sophisticated and deeply essential to business operations. As we look at where the industry is headed, three themes have emerged that will define the future of how organizations monitor and manage their digital infrastructure ...
Adequately preventing and responding to disruptions has never been more important — or more possible. The growing ubiquity of AI has introduced more automated workstreams and increased productivity, while simultaneously creating a greater need for better data management. As customer expectations increasingly align with always-on services, the ability to prevent and recover from disruptions has direct ties to a business's bottom line ...
A major architectural shift is underway across enterprise networks, according to a new global study from Cisco. As AI assistants, agents, and data-driven workloads reshape how work gets done, they're creating faster, more dynamic, more latency-sensitive, and more complex network traffic. Combined with the ubiquity of connected devices, 24/7 uptime demands, and intensifying security threats, these shifts are driving infrastructure to adapt and evolve ...
The development of banking apps was supposed to provide users with convenience, control and piece of mind. However, for thousands of Halifax customers recently, a major mobile outage caused the exact opposite, leaving customers unable to check balances, or pay bills, sparking widespread frustration. This wasn't an isolated incident ... So why are these failures still happening? ...
The prevention of data center outages continues to be a strategic priority for data center owners and operators. Infrastructure equipment has improved, but the complexity of modern architectures and evolving external threats presents new risks that operators must actively manage, according to the Data Center Outage Analysis 2025 from Uptime Institute ...
Telecommunications is expanding at an unprecedented pace ... But progress brings complexity. As WanAware's 2025 Telecom Observability Benchmark Report reveals, many operators are discovering that modernization requires more than physical build outs and CapEx — it also demands the tools and insights to manage, secure, and optimize this fast-growing infrastructure in real time ...
Misaligned architecture can lead to business consequences, with 93% of respondents reporting negative outcomes such as service disruptions, high operational costs and security challenges ...
Overall outage frequency and the general level of reported severity continue to decline, according to the Outage Analysis 2025 from Uptime Institute. However, cyber security incidents are on the rise and often have severe, lasting impacts ...
Lose your data and the best case scenario is, well, you know the word — but at worst, it is game over. And so World Backup Day has traditionally carried a very simple yet powerful message for businesses: Backup. Your. Data ...
IT outages, caused by poor-quality software updates, are no longer rare incidents but rather frequent occurrences, directly impacting over half of US consumers. According to the 2024 Software Failure Sentiment Report from Harness, many now equate these failures to critical public health crises ...
Service disruptions remain a critical concern for IT and business executives, with 88% of respondents saying they believe another major incident will occur in the next 12 months, according to a study from PagerDuty ...
E-commerce is set to skyrocket with a 9% rise over the next few years ... To thrive in this competitive environment, retailers must identify digital resilience as their top priority. In a world where savvy shoppers expect 24/7 access to online deals and experiences, any unexpected downtime to digital services can lead to significant financial losses, damage to brand reputation, abandoned carts with designer shoes, and additional issues ...
In APMdigest's 2025 Predictions Series, industry experts offer predictions on how Observability and related technologies will evolve and impact business in 2025. Part 6 covers cloud, the edge and IT outages ...
Technology leaders will invest in AI-driven customer experience (CX) strategies in the year ahead as they build more dynamic, relevant and meaningful connections with their target audiences ... As AI shifts the CX paradigm from reactive to proactive, tech leaders and their teams will embrace these five AI-driven strategies that will improve customer support and cybersecurity while providing smoother, more reliable service offerings ...