Data downtime — periods of time when an organization's data is missing, wrong or otherwise inaccurate — nearly doubled year over year (1.89x), according to the State of Data Quality report from Monte Carlo.
The Wakefield Research survey, which was commissioned by Monte Carlo and polled 200 data professionals in March 2023, found that three critical factors contributed to this increase in data downtime; they are combined in a rough arithmetic sketch after the list below:
■ A rise in monthly data incidents, from 59 in 2022 to 67 in 2023.
■ 68% of respondents reported an average time to detection of four hours or more for data incidents, up from 62% of respondents in 2022.
■ A 166% increase in average time to resolution, which rose to 15 hours per incident across respondents.
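As a rough illustration of how these three factors compound, the Python sketch below multiplies incident count by the combined detection and resolution time. The formula and the detection-hour inputs are assumptions made for illustration, not figures or methodology from the report, which describes detection time only as the share of teams above or below four hours.

```python
# Illustrative only: assumes data downtime ~= incidents x (hours to detect + hours to resolve).
# The incident counts and the 15-hour resolution figure come from the survey; the 2022
# resolution value is backed out of the reported 166% increase, and detection hours are
# hypothetical placeholders.

def monthly_data_downtime(incidents: float, detect_hours: float, resolve_hours: float) -> float:
    """Approximate hours of data downtime per month under the assumed formula."""
    return incidents * (detect_hours + resolve_hours)

resolve_2023 = 15.0                 # survey figure: average time to resolution in 2023
resolve_2022 = resolve_2023 / 2.66  # implied by the reported 166% increase (about 5.6 hours)

# Detection hours below are hypothetical; the survey only reports the share of teams
# taking four hours or more to detect an incident (62% in 2022, 68% in 2023).
downtime_2022 = monthly_data_downtime(59, detect_hours=4.0, resolve_hours=resolve_2022)
downtime_2023 = monthly_data_downtime(67, detect_hours=5.0, resolve_hours=resolve_2023)

print(f"2022: ~{downtime_2022:.0f} h/month, 2023: ~{downtime_2023:.0f} h/month, "
      f"ratio {downtime_2023 / downtime_2022:.2f}x")
```

The printed ratio depends entirely on the assumed detection hours; the point is simply that incidents, detection time, and resolution time multiply, so modest increases in each compound into a much larger rise in total downtime.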
More than half of respondents reported that 25% or more of revenue was subject to data quality issues. The average percentage of impacted revenue jumped to 31%, up from 26% in 2022. Additionally, an astounding 74% reported business stakeholders identify issues first "all or most of the time," up from 47% in 2022.
These findings suggest data quality remains among the biggest problems facing data teams, with bad data having more severe repercussions on an organization's revenue and data trust than in years prior.
The survey also suggests data teams are making a tradeoff between data downtime and the amount of time spent on data quality as their datasets grow.
For instance, organizations with fewer tables reported spending less time on data quality than their peers with more tables, but their average times to detection and resolution were comparatively higher. Conversely, organizations with more tables reported lower average times to detection and resolution, but spent a greater share of their team's time to achieve them.
■ Respondents that spent more than 50% of their time on data quality had more tables (average 2,571) compared to respondents that spent less than 50% of their time on data quality (average 208).
■ Respondents that took less than 4 hours to detect an issue had more tables (average 1,269) than those who took longer than 4 hours to detect an issue (average 346).
■ Respondents that took less than 4 hours to resolve an issue had more tables (average 1,172) than those who took longer than 4 hours to resolve an issue (average 330).
"These results show teams having to make a lose-lose choice between spending too much time solving for data quality or suffering adverse consequences to their bottom line," said Barr Moses, CEO and co-founder of Monte Carlo. "In this economic climate, it's more urgent than ever for data leaders to turn this lose-lose into a win-win by leveraging data quality solutions that will lower BOTH the amount of time teams spend tackling data downtime and mitigating its consequences. As an industry, we need to prioritize data trust to optimize the potential of our data investments."
The survey revealed additional insights on the state of data quality management, including:
■ 50% of respondents reported data engineering is primarily responsible for data quality, compared to:
- 22% for data analysts
- 9% for software engineering
- 7% for data reliability engineering
- 6% for analytics engineering
- 5% for the data governance team
- 3% for non-technical business stakeholders
■ Respondents averaged 642 tables across their data lake, lakehouse, or warehouse environments.
■ Respondents reported having an average of 24 dbt models, and 41% reported having 25 or more dbt models.
■ Respondents averaged 290 manually written tests across their data pipelines.
■ The number one reason for launching a data quality initiative was that the data organization identified data quality as a need (28%), followed by a migration or modernization of the data platform or systems (23%).
"Data testing remains data engineers' number one defense against data quality issues — and that's clearly not cutting it," said Lior Gavish, Monte Carlo CTO and Co-Founder. "Incidents fall through the cracks, stakeholders are the first to identify problems, and teams fall further behind. Leaning into more robust incident management processes and automated, ML-driven approaches like data observability is the future of data engineering at scale."