
Grafana Labs announced new features tailored for Kubernetes platform teams using Grafana Cloud.
Kubernetes Monitoring is a solution for proactive monitoring and troubleshooting within the fully managed Grafana Cloud observability platform. It provides a single pane of glass across your Kubernetes infrastructure, offering out-of-the-box visualizations of user data, cost monitoring insights, preconfigured alerts, alert rules, recording rules, and AI/ML predictions to simplify tasks that have been historically complex and time-consuming to achieve with just kubectl.
The solution’s real-time visibility, alerting, and troubleshooting capabilities for Kubernetes environments help organizations like Beeswax efficiently manage and optimize their container-based infrastructure. James Wojewoda, Lead Site Reliability Engineer at Beeswax, said, “Kubernetes Monitoring on Grafana Cloud enables our engineers to have native monitoring. No longer do they have to reach out to our SRE team. Instead, they just click a button on the Grafana Cloud integrations tab, navigate to the out-of-the-box dashboard, and see all the information — CPU usage, logs, metrics — they need to solve the problem themselves. It’s so simple, helps us spot issues fast, and saves us all a lot of custom development time.”
The latest updates include:
- Contextual root cause analysis for Kubernetes-based applications: At ObservabilityCON in September, Grafana Labs announced a suite of contextual root cause analysis workflows in Grafana Cloud. These workflows leverage AI/ML to automate root cause analysis in users’ Kubernetes environments, simplifying troubleshooting and reducing MTTR. Historically, diagnosing issues in complex Kubernetes-based applications has been a time-consuming task, requiring manual correlation of data from both application and infrastructure layers. By automatically identifying and correlating anomalies across these layers, this technology provides teams with contextualized insights, enabling faster and more accurate issue resolution.
- Availability in AWS Marketplace: Kubernetes Monitoring is now available in AWS Marketplace, so AWS users can deploy the solution in a single step from the AWS console or through the AWS Command Line Interface.
- Improved operational monitoring with Sift investigations: Kubernetes Monitoring also streamlines operational monitoring with Sift investigations by instantly identifying critical deployment issues. The platform’s intelligent alerting system automatically detects deployments with insufficient ready replicas and quickly pinpoints problematic deployments and pods requiring restart. This enhanced visibility not only reduces MTTR but also significantly improves overall system reliability.
- Enhanced historical visibility: While traditional kubectl commands offer limited historical data retention and require manual intervention, Kubernetes Monitoring in Grafana Cloud provides automatic, comprehensive historical tracking of pods, nodes, and clusters—even after deletion or recreation. This enables SREs to efficiently troubleshoot post-deployment issues, optimize resource allocation, and conduct thorough incident analyses without the complexity of maintaining additional infrastructure or orchestrating multiple tools.
- Troubleshooting enhancements: Kubernetes Monitoring has added troubleshooting tools that help users find deleted objects, zoom in on a graph to narrow the time range for further analysis, and jump to Clusters, Nodes, and workloads directly from the homepage.
- New views: One of the most popular features within Kubernetes Monitoring, cost monitoring, now provides a 90-day view of total compute cost, average cost per Pod, and average Pod count in the Cost overview tab. In addition, Kubernetes Monitoring now allows users to view energy data for their Kubernetes infrastructure components.
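To make the "insufficient ready replicas" condition from the Sift investigations item concrete, the following is a minimal sketch of the underlying comparison. It is not Grafana's implementation. It assumes each deployment is a plain dictionary shaped like a Kubernetes Deployment object, with the standard `spec.replicas` and `status.readyReplicas` fields; the function name `under_replicated` and the sample data are hypothetical.

```python
from typing import Any


def under_replicated(deployments: list[dict[str, Any]]) -> list[tuple[str, int, int]]:
    """Return (namespace/name, ready, desired) for each deployment short of ready replicas.

    Hypothetical helper for illustration only; input dicts mimic Deployment objects.
    """
    flagged = []
    for deployment in deployments:
        meta = deployment.get("metadata", {})
        # Kubernetes defaults spec.replicas to 1 when it is not set.
        desired = deployment.get("spec", {}).get("replicas", 1)
        # status.readyReplicas is omitted from the object when it is zero.
        ready = deployment.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            name = f"{meta.get('namespace', 'default')}/{meta.get('name', 'unknown')}"
            flagged.append((name, ready, desired))
    return flagged


# Made-up sample data: one degraded deployment, one healthy, one with no ready pods.
sample = [
    {"metadata": {"namespace": "shop", "name": "checkout"},
     "spec": {"replicas": 3}, "status": {"readyReplicas": 1}},
    {"metadata": {"namespace": "shop", "name": "cart"},
     "spec": {"replicas": 2}, "status": {"readyReplicas": 2}},
    {"metadata": {"namespace": "shop", "name": "search"},
     "spec": {"replicas": 2}, "status": {}},
]

for name, ready, desired in under_replicated(sample):
    print(f"{name}: {ready}/{desired} replicas ready")
```

In practice an alert rule would evaluate the same comparison continuously against collected metrics rather than against a static list.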
In addition, Grafana Labs continues to lean into its big tent philosophy while strengthening its commitment to the open source Kubernetes community by leading the development of the Kubernetes Monitoring Helm chart—a powerful open source solution for collecting comprehensive telemetry data from Kubernetes clusters. The chart enables the collection of metrics, logs, traces, and profiles, with capabilities for local processing and flexible data routing to the backend database of your choosing. While optimized for Grafana Cloud, it integrates seamlessly with open source databases including Mimir, Loki, Tempo, Pyroscope, and more. The upcoming 2.0 release will bring many improvements including the ability to send data to multiple destinations, built-in service integrations, and a simplified configuration experience.