
Android WebView Caused a Google App Crash: How to Avoid a Similar Outage

James Smith
SmartBear

On March 22, Android users around the globe suddenly saw notifications on their devices saying that apps had stopped running. Critical apps such as Gmail, Google Pay, Amazon, Yahoo and certain banking apps couldn't be opened, causing widespread concern among consumers. Google later revealed that the cause was a bug in Android System WebView. Some users were able to work around the issue by manually uninstalling the latest update while waiting for Google to release a fix. Although the issue was ultimately resolved by relying on affected consumers to update manually, major crashes and painful manual workarounds can leave a lasting negative impression on users and damage a brand's reputation.

Software bugs are inevitable in code, so engineering teams don't realistically need to aim for 100% error-free software. However, they should have pre-production quality assurance measures in place that act as a safety net for situations like this. These tools provide comprehensive error diagnostics and actionable insights that allow software engineers to prioritize the bugs creating the most damaging user experience. Even giants like Google and Facebook still experience lapses in this process, but it is a critical step in delivering consistent, quality software.


Post-Mortem Evaluation: Breaking Down App Stability Data from the Crash

At the start of the Android app outage, Bugsnag app stability data showed four times the usual volume of Android errors registered within one day, indicating significant impact across the Android user base. The WebView bug caused approximately 75% of the crashes in the leading Android projects monitored. These projects saw around 40 times more crashes than in the same period of the previous week, and the worst-affected projects saw 200 times more.

Additionally, an estimated 2 million users were impacted across all monitored apps. Overall application stability in Android applications dropped by at least 2%, and the worst-affected projects saw a 10% decrease in app stability scores, meaning 1 in 10 Android customers were experiencing a crash.
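To make the stability arithmetic concrete, here is a minimal sketch in Python. It assumes a stability score is the percentage of sessions that ended without a crash. The function name and the session counts are hypothetical illustrations, not figures from the Bugsnag data above.

```python
def stability_score(total_sessions: int, crashed_sessions: int) -> float:
    """Return the percentage of sessions that ended without a crash."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    if not 0 <= crashed_sessions <= total_sessions:
        raise ValueError("crashed_sessions must be between 0 and total_sessions")
    return 100.0 * (total_sessions - crashed_sessions) / total_sessions


# Hypothetical numbers: a normal day versus the day of an outage.
baseline = stability_score(total_sessions=100_000, crashed_sessions=500)
during_outage = stability_score(total_sessions=100_000, crashed_sessions=10_500)

# Drop in stability, expressed in percentage points.
drop = baseline - during_outage
print(f"baseline={baseline:.1f}% outage={during_outage:.1f}% drop={drop:.1f} points")
```

With these made-up counts, roughly one extra session in ten ends in a crash, which is the kind of drop described for the worst-affected projects.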

It's also worth noting that this Android WebView error was caused by a Native Development Kit (NDK) error, which can only be detected if your crash reporting supports NDK crash detection and that detection is enabled. App stability monitoring is critical in situations like this. Some systems enable NDK monitoring automatically, while others require you to opt in, so make sure NDK error detection is available by default.

Best Practices To Protect Your Apps from Similar Outages

Given that it was an operating system component at fault in this scenario, there is not a lot development teams could have done to prevent applications from crashing in this situation. However, there are many other types of serious app outages that can be prevented by implementing best practices and defensive programming. Below are some proactive steps engineering teams can take to protect their applications from similar problems that may impact application stability:

1. Monitor for Stability Issues in Production

This is critical for engineering teams to gain immediate visibility into crashes and spikes in errors. Not only can engineering react quickly to fix issues, but monitoring also supports impact analysis, which gives support and customer success teams clear guidance so they can handle customer communications with confidence. Configure team notifications and incident management integrations to quickly align the team and deal with business-critical issues.
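As a rough illustration of spotting a spike, the sketch below compares the current crash count against the count from the same period in the previous week and flags an alert when the ratio crosses a threshold. The function names and the default threshold of 4.0 are assumptions chosen for the example. A real monitoring tool would supply its own alerting rules.

```python
def crash_spike_ratio(current_count: int, baseline_count: int) -> float:
    """Ratio of current crashes to the same period one week earlier."""
    # Treat an empty baseline as 1 to avoid dividing by zero.
    return current_count / max(baseline_count, 1)


def should_alert(current_count: int, baseline_count: int,
                 threshold: float = 4.0) -> bool:
    """Return True when crashes have grown by at least `threshold` times."""
    return crash_spike_ratio(current_count, baseline_count) >= threshold


# Hypothetical counts for one project.
print(should_alert(current_count=4_000, baseline_count=100))  # 40x the baseline
print(should_alert(current_count=120, baseline_count=100))    # 1.2x the baseline
```

Comparing against the same period of the previous week, rather than the previous hour, keeps normal daily traffic patterns from triggering false alarms.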

2. Track Application Freezes

This will give the team visibility into whether certain features are the root cause of any ANRs (Application Not Responding errors) being captured. You can track application freezes by using the stack trace to see which line of code was running when the application froze and set off the ANR. Stack trace information identifies where in the program the error occurs so that it can be fixed.
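One way to turn captured ANR stack traces into a list of suspect features is to group them by the first frame that belongs to your own code. The sketch below assumes each trace is a list of frame strings with the most recent call first. The package name `com.example.shop` and the sample traces are hypothetical.

```python
from collections import Counter

APP_PACKAGE = "com.example.shop"  # hypothetical application package


def first_app_frame(stack_trace, app_package=APP_PACKAGE):
    """Return the topmost frame that belongs to the app, or None."""
    for frame in stack_trace:
        if frame.startswith(app_package):
            return frame
    return None


def anr_hotspots(anr_reports, app_package=APP_PACKAGE):
    """Count ANRs by their first in-app frame, most frequent first."""
    counts = Counter()
    for trace in anr_reports:
        frame = first_app_frame(trace, app_package)
        counts[frame or "<no app frame>"] += 1
    return counts.most_common()


# Hypothetical ANR traces, most recent call first.
reports = [
    ["android.os.MessageQueue.nativePollOnce",
     "com.example.shop.cart.CartSync.upload"],
    ["java.net.SocketInputStream.read",
     "com.example.shop.cart.CartSync.upload"],
    ["android.database.sqlite.SQLiteConnection.nativeExecute",
     "com.example.shop.search.IndexBuilder.rebuild"],
]

for frame, count in anr_hotspots(reports):
    print(count, frame)
```

Grouping by the first in-app frame, instead of the very top of the stack, points at the feature that triggered the freeze and not the framework call it happened to be waiting in.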

3. A/B Test New Features

This will help teams understand how certain features are impacting application stability before releasing them to production. You should also always phase the rollouts and test features with a small group of users before releasing to your entire user base.
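A phased rollout can be sketched as deterministic bucketing: each user is hashed into one of 100 buckets, and a feature is enabled only for buckets below the current rollout percentage. The function names, the feature name, and the user IDs below are hypothetical. This is one common approach, not a description of any particular feature-flag product.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Enable the feature for roughly `rollout_percent` percent of users."""
    return rollout_bucket(user_id, feature) < rollout_percent


# Hypothetical rollout: start with 5% of users, then widen to 50%.
users = [f"user-{n}" for n in range(1_000)]
early = [u for u in users if is_enabled(u, "new-checkout", 5)]
wider = [u for u in users if is_enabled(u, "new-checkout", 50)]
print(len(early), len(wider))
```

Because the bucket depends only on the user ID and the feature name, a user who gets the feature at 5% keeps it when the rollout widens, so stability data from the early group stays comparable over time.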

The Key Takeaway

Because consumers rely heavily on mobile apps to navigate day-to-day life, application stability is absolutely critical, especially in today's relentlessly competitive environment. Difficult-to-prevent system errors like the Android System WebView crash highlight the importance of minimizing preventable errors with defensive programming and better handling of malformed data.

The silver lining of outages like this is that they draw attention to the dire need for good software design and process. They show where software engineering teams need to introduce new best practices or fine-tune existing ones.

James Smith is SVP of the Bugsnag Product Group at SmartBear

