Avoiding Digital Disaster: How the US Census Can Deliver a Smooth Digital Experience

April 23, 2020

Tal Weiss

OverOps

Most of us have felt the effects of a failing application at some point. Sometimes the impact of a software outage is trivial: we are temporarily unable to book a flight for vacation, we are forced to call a customer service rep rather than use an online portal, or we can't watch an on-demand movie. These experiences are frustrating, but they are inconvenient rather than catastrophic.

But what about when the application failure has much larger implications? For example, the site you use to refill important prescriptions malfunctions and delays access to medication, your banking app fails on the day rent is due, or, in the recent case of the Iowa caucus app, a software outage affects your ability to participate in our nation's democratic process.

As more organizations go digital, including critical government agencies, reliable software is paramount. The recent Iowa caucus voting app failure is a case study in software testing and delivery mistakes, with many lessons for organizations, like the US Census Bureau, that are tasked with building and managing mission-critical applications.

The 2020 US Census, which is collecting responses beginning April 1, has the potential to significantly influence fair political representation, the allocation of vital federal funds, and more. This year marks the first time in our nation’s history that participants have the option to fill out an online questionnaire rather than mailing in their responses. While this is an exciting digital milestone for the US Census Bureau, experience tells us that the course of digital transformation rarely does run smooth.

To ensure the accuracy of results that will impact our nation for the next decade, the census software needs to operate seamlessly. But considering what happened with the Iowa caucus, how confident are we that the census app isn’t going to fail?

Below are two key takeaways from the Iowa caucus app disaster that should serve as valuable lessons, not only for the IT team supporting the US Census Bureau but for any engineering team tasked with delivering a mission-critical application with minimal room for error.

Takeaway #1: Test Early. Test Often.

There is a rule in software known as the Rule of Ten: the cost of finding and fixing a software defect increases roughly tenfold with each later stage of the software delivery lifecycle in which it is caught. When pushing out an important new release that will be highly trafficked and highly visible, the more proactive you can be about preventing errors from reaching production, the better. In the case of the US Census app, responses are only collected within a short window, so any unexpected production issues could waste precious minutes or hours.
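To make the arithmetic behind the Rule of Ten concrete, here is a minimal sketch in Python. The stage names, the base cost of 1, and the exact factor of 10 per stage are illustrative assumptions rather than figures from any study; real cost curves vary by project.

```python
# Illustrative model of the Rule of Ten.
# ASSUMPTION: the stage names and the strict 10x-per-stage factor are
# hypothetical simplifications chosen for this example.
STAGES = ["requirements", "design", "development", "testing", "production"]


def relative_fix_cost(stage: str, base_cost: float = 1.0, factor: float = 10.0) -> float:
    """Return the modeled cost of fixing a defect first caught at `stage`."""
    # Each stage after the first multiplies the cost by `factor`.
    return base_cost * factor ** STAGES.index(stage)


for stage in STAGES:
    print(f"{stage:<12} {relative_fix_cost(stage):>8,.0f}x")
```

Under these assumptions, a defect that would cost one unit to fix at the requirements stage costs 10,000 units once it reaches production, which is the argument for catching it as early as possible.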

To address this, many organizations are starting to understand the merits of adopting a "Shift Left" approach to quality. By increasing quality measures taken in the development and testing phases of software delivery, you can significantly reduce the odds of production issues. Of course, you can’t fully anticipate all potential production failure scenarios, but the more testing you do up front, the more confidence you’ll have in your release, rather than relying solely on production monitoring.

Takeaway #2: Automate, Automate, Automate

Because writing code remains a very human-driven process (AI has yet to pass a coding "Turing test"), companies will need to automate how they test, deliver, and operate their software to ensure speed and reliability.

A decade ago, when Test-Driven Development (TDD) was just starting to gain traction, it promised to improve productivity and quality. Since then, release cycles have shortened, CI/CD is no longer a buzzword, and new companies that develop pipeline automation products have matured enough to go public.

Building on the points above, testing is more relevant than ever, but when moving fast is table stakes, relying on traditional tests alone in your shift-left strategy is no longer an option. Building in automated quality gates and feedback loops allows for thorough, fast testing that doesn’t hold up release timelines. This can be done by running a variety of automated testing methods within your CI/CD pipeline, such as static and dynamic code analysis.
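As a rough illustration of an automated quality gate, the Python sketch below evaluates a build report against a few thresholds and lists the reasons a release should be blocked. The `BuildReport` fields, the 80% coverage threshold, and the `evaluate_gate` function are hypothetical names and values invented for this example; a real pipeline would populate the report from its own test runner and code analysis tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BuildReport:
    """Hypothetical summary produced by earlier pipeline steps."""
    failed_tests: int
    coverage_percent: float
    critical_static_findings: int


def evaluate_gate(report: BuildReport, min_coverage: float = 80.0) -> List[str]:
    """Return the reasons the build fails the gate (empty list means pass)."""
    reasons = []
    if report.failed_tests > 0:
        reasons.append(f"{report.failed_tests} test(s) failed")
    if report.coverage_percent < min_coverage:
        reasons.append(
            f"coverage {report.coverage_percent:.1f}% is below {min_coverage:.1f}%"
        )
    if report.critical_static_findings > 0:
        reasons.append(
            f"{report.critical_static_findings} critical static analysis finding(s)"
        )
    return reasons


# Example: tests pass, but coverage and static analysis block the release.
report = BuildReport(failed_tests=0, coverage_percent=72.5, critical_static_findings=1)
for reason in evaluate_gate(report):
    print("GATE FAILED:", reason)
```

A CI job could run a check like this after the test and analysis steps and stop the release whenever the list of reasons is not empty, so the feedback reaches developers without anyone reviewing reports by hand.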

Further, even with a sophisticated testing pipeline, the occasional error will inevitably reach production, and your ability to detect, troubleshoot, and recover quickly will make all the difference to your users. Developers are great at writing code but inherently limited in their ability to foresee where it will break down later. For this reason, and because of the massive volume of operational data and noise that high-scale environments produce, the task of detecting software issues and gathering information about them in production should be automated. The 30% of time and resources traditionally allocated to manual identification, routing, and reproduction of issues during the software delivery lifecycle will most likely become a thing of the past.
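One simple way to picture automated detection is to group production errors by a fingerprint and flag the ones that have not been seen before. The sketch below is an assumption-laden illustration, not a description of any particular product: the `ErrorEvent` fields, the fingerprint of exception type plus code location, and the sample events are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set, Tuple

Fingerprint = Tuple[str, str]


class ErrorEvent(NamedTuple):
    """Hypothetical shape of one error captured in production."""
    exception_type: str
    location: str  # for example "module.function"
    message: str


def fingerprint(event: ErrorEvent) -> Fingerprint:
    # Messages often contain variable data, so they are left out of the key.
    return (event.exception_type, event.location)


def summarize(
    events: Iterable[ErrorEvent], known: Set[Fingerprint]
) -> Tuple[Counter, Set[Fingerprint]]:
    """Count events per fingerprint and return the fingerprints not seen before."""
    counts = Counter(fingerprint(e) for e in events)
    new = set(counts) - known
    return counts, new


# Fingerprints already seen in earlier releases (illustrative data).
known = {("TimeoutError", "forms.submit")}
events = [
    ErrorEvent("TimeoutError", "forms.submit", "request 17 timed out"),
    ErrorEvent("TimeoutError", "forms.submit", "request 42 timed out"),
    ErrorEvent("KeyError", "forms.validate", "missing field 'household_size'"),
]
counts, new = summarize(events, known)
print("new error types:", sorted(new))
```

Grouping this way turns thousands of raw log lines into a short list of distinct problems, and the set of new fingerprints gives a team a starting point for routing issues introduced by the latest release.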

As the US Census Bureau takes this high-stakes step toward innovation, it’s my sincere hope that they’ve been able to put some of these methodologies and tools in place. The more proactive you can be about ensuring quality, and the more tasks you can automate throughout the process, the less you will have to fear when it’s time to put your software to the true test — your users.

Tal Weiss is Co-Founder and CTO of OverOps
