Data Engineers Spend 2 Days Per Week Firefighting Bad Data
September 13, 2022

Data professionals spend 40% of their time evaluating or checking data quality, and poor data quality impacts 26% of their companies' revenue, according to The State of Data Quality 2022, a report commissioned by Monte Carlo and conducted by Wakefield Research.

The survey found that 75% of participants take four or more hours to detect a data quality incident, and about half said it takes an average of nine hours to resolve the issue once identified. Worse, 58% said the total number of incidents has increased somewhat or greatly over the past year, often as a result of more complex pipelines, bigger data teams, greater volumes of data, and other factors.

Today, the average organization experiences about 61 data-related incidents per month, each of which takes an average of 13 hours to identify and resolve. That adds up to about 793 hours per month, per company (61 incidents × 13 hours).

However, 61 represents only the incidents known to respondents; incidents that go undetected would push the true figure higher.

"In the mid-2010s, organizations were shocked to learn that their data scientists were spending about 60% of their time just getting data ready for analysis," said Barr Moses, Monte Carlo CEO and co-founder. "Now, even with more mature data organizations and advanced stacks, data teams are still wasting 40% of their time troubleshooting data downtime. Not only is this wasting valuable engineering time, but it's also costing precious revenue and diverting attention away from initiatives that move the needle for the business. These results validate that data reliability is one of the biggest and most urgent problems facing today's data and analytics leaders."

Nearly half of respondent organizations most often measure data quality by the number of customer complaints their company receives, highlighting the ad hoc (and reputation-damaging) nature of this important element of modern data strategy.

The Cost of Data Downtime

"Garbage in, garbage out" aptly describes the impact data quality has on data analytics and machine learning. If the data is unreliable, so are the insights derived from it.

In fact, on average, respondents said bad data impacts 26% of their revenue. This validates and supplements other industry studies that have uncovered the high cost of bad data. For example, Gartner estimates poor data quality costs organizations an average of $12.9 million every year.

Nearly half said business stakeholders are affected most of the time, or all the time, by data issues the data team fails to catch.

In fact, according to the survey, respondents who ran at least three different types of data tests (for distribution, schema, volume, null, or freshness anomalies, as sketched below) at least once a week suffered fewer data incidents on average (46) than respondents with a less rigorous testing regime (61). However, testing alone was insufficient: stronger testing showed no significant correlation with a reduced impact on revenue or stakeholders.
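For readers less familiar with these test categories, here is a minimal sketch of what such checks might look like in Python with pandas. The table, column names, and thresholds are hypothetical illustrations, not taken from the report.

```python
# A minimal sketch of the five test categories the survey names, assuming a
# pandas DataFrame loaded from a hypothetical orders table. Column names and
# thresholds are illustrative only.
import pandas as pd

def run_basic_quality_tests(df: pd.DataFrame) -> dict:
    """Return a pass/fail flag for each test category."""
    results = {}

    # Schema: the columns we expect are all present.
    expected_cols = {"order_id", "amount", "created_at"}
    results["schema"] = expected_cols.issubset(df.columns)

    # Volume: the row count falls within an expected range.
    results["volume"] = 1_000 <= len(df) <= 1_000_000

    # Null: no more than 1% of primary keys are missing.
    results["null"] = df["order_id"].isna().mean() <= 0.01

    # Freshness: the newest record landed within the last 24 hours.
    latest = pd.to_datetime(df["created_at"], utc=True).max()
    results["freshness"] = (pd.Timestamp.now(tz="UTC") - latest) < pd.Timedelta(hours=24)

    # Distribution: the mean order amount stays within its historical band.
    results["distribution"] = 20.0 <= df["amount"].mean() <= 80.0

    return results
```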

"Testing helps reduce data incidents, but no human being is capable of anticipating and writing a test for every way data pipelines can break. And if they could, it wouldn't be possible to scale across their always changing environment," said Lior Gavish, Monte Carlo CTO and co-founder. "Machine learning-powered anomaly monitoring and alerting through data observability can help teams close these coverage gaps and save data engineers' time."
