Datadog announced the general availability of Data Jobs Monitoring, a new product that helps data platform teams and data engineers detect problematic Spark and Databricks jobs anywhere in their data pipelines, remediate failed and long-running jobs faster, and optimize overprovisioned compute resources to reduce costs.
Data Jobs Monitoring immediately surfaces specific jobs that need optimization and reliability improvements while enabling teams to drill down into job execution traces so that they can correlate their job telemetry to their cloud infrastructure for fast debugging.
“When data pipelines fail, data quality is impacted, which can hurt stakeholder trust and slow down decision making. Long-running jobs can lead to spikes in cost, making it critical for teams to understand how to provision the optimal resources,” said Michael Whetten, VP of Product at Datadog. “Data Jobs Monitoring helps teams do just that by giving data platform engineers full visibility into their largest, most expensive jobs to help them improve data quality, optimize their pipelines and prioritize cost savings.”
Data Jobs Monitoring helps teams to:
- Detect job failures and latency spikes: Out-of-the-box alerts immediately notify teams when jobs have failed or are running beyond automatically detected baselines, so that the problems can be addressed before they hurt the end-user experience. Recommended filters surface the most important issues affecting job and cluster health, so that those issues can be prioritized.
- Pinpoint and resolve erroneous jobs faster: Detailed trace views show teams exactly where a job failed in its execution flow so they have the full context for faster troubleshooting. Multiple job runs can be compared to one another to expedite root cause analysis and identify trends and changes in run duration, Spark performance metrics, cluster utilization and configuration.
- Identify opportunities for cost savings: Resource utilization and Spark application metrics help teams identify ways to lower compute costs for overprovisioned clusters and optimize inefficient job runs.
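To make the idea of a job "running beyond automatically detected baselines" concrete, here is a minimal illustrative sketch in Python. It is not Datadog's algorithm, which the announcement does not describe. It assumes a baseline built from the median of past run durations plus a multiple of the median absolute deviation, and the function names, the `k` parameter and the sample durations are all hypothetical.

```python
from statistics import median


def duration_baseline(history_s, k=3.0):
    """Hypothetical baseline: median of past durations plus k times the
    median absolute deviation (MAD). Durations are in seconds."""
    med = median(history_s)
    mad = median(abs(d - med) for d in history_s)
    return med + k * mad


def is_running_long(current_s, history_s, k=3.0):
    """Return True when the current run has exceeded the baseline."""
    return current_s > duration_baseline(history_s, k)


# Made-up durations of five earlier runs of the same job, in seconds.
history = [600, 620, 590, 610, 605]

threshold = duration_baseline(history)
long_run = is_running_long(900, history)
normal_run = is_running_long(615, history)
```

A median-based baseline is used in this sketch because a single unusually slow past run shifts it less than it would shift a mean-based one. A production system would likely also account for input size, seasonality and cluster configuration.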
Data Jobs Monitoring is now generally available.