
Grafana Labs announced new enhancements to help address the challenges of Kubernetes monitoring, along with updates on its contributions to the open source ecosystem.
Kubernetes Monitoring in Grafana Cloud makes it easier to monitor Kubernetes clusters by allowing users to visualize and analyze key metrics related to Kubernetes environments, track resource usage, and gain insights into the behavior of their applications within Kubernetes. New updates offer seamless setup and deployment, quick and easy issue management, fine-grained cost monitoring, and more:
- Deploy and scale more easily: Users no longer have to manually configure Grafana Agent or Grafana Agent Operator to collect infrastructure data. With the new Grafana Kubernetes Monitoring Helm chart, it’s easier to send metrics, logs, events, traces, and cost metrics to Grafana Cloud, and the chart can be customized for your specific needs. In addition, IBM Cloud is now a configurable option as part of the out-of-the-box Helm chart.
- Respond to alerts quicker: You can now respond to and troubleshoot Kubernetes alerts without leaving the context of Grafana Kubernetes Monitoring. You can start your troubleshooting either through the "Pods in trouble" section on the home page or the Alerts page. The updated Alerts page provides a centralized location to view all alerts related to your Kubernetes infrastructure and the applications running within it. From here, you can see graphs showing alerts by cluster and namespace, as well as by alert severity, making it easier to filter and drill down into issues to resolve them more quickly.
- Get a 360° view of your infrastructure data: The Kubernetes monitoring interface now provides a comprehensive overview and detailed analysis of your cluster's health and performance. The main page offers a snapshot of critical issues, displaying graphs for clusters, nodes, pods, and containers, regardless of your cloud provider or Kubernetes distribution. In addition, the new time picker facilitates historical data analysis, allowing for the examination of resource usage over selected time frames, which is crucial for addressing inefficiencies and managing costs.
- Monitor, predict, and optimize resource usage: The new Summary views for clusters, nodes, workloads, and namespaces now correlate CPU, memory, and storage usage, aiding in performance troubleshooting and identifying underutilized resources. Predictive features, enabled by the Machine Learning plugin, provide forecasts for CPU and memory usage in each component's insights view to optimize resource allocation. Together, these features enable a robust approach to managing Kubernetes clusters, ensuring efficient resource use and cost-effectiveness.
- Keep on top of Kubernetes costs: Kubernetes Monitoring provides cost monitoring tools that allow you to correlate resource usage with cost attribution. With the new Pod and Container detail pages, users can now see a breakdown of costs on a per-container or per-pod basis.
- Bring your own tools: Kubernetes Monitoring is now listed in the AWS Marketplace as an EKS add-on. In addition, ClickHouse, InfluxDB, and Presto integrations are now available to use with Kubernetes Monitoring and provide out-of-the-box dashboards, alerts, and recording rules for an easy start to service observability.
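As a rough sketch of the Helm-based deployment described above, installing the Kubernetes Monitoring chart typically follows the standard Helm workflow below. The release name, `cluster.name`, and destination values shown here are illustrative assumptions; consult the chart's documentation and its `values.yaml` for the authoritative configuration schema.

```shell
# Add Grafana's Helm repository (standard Helm workflow)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install the Kubernetes Monitoring chart into its own namespace.
# The values keys below are a hypothetical sketch; verify them against
# the chart's values.yaml before using in a real cluster.
helm install grafana-k8s-monitoring grafana/k8s-monitoring \
  --namespace monitoring --create-namespace \
  --set cluster.name=my-cluster \
  --set externalServices.prometheus.basicAuth.username="$GRAFANA_CLOUD_USER" \
  --set externalServices.prometheus.basicAuth.password="$GRAFANA_CLOUD_TOKEN"
```

Because the chart is customizable, the same settings can be kept in a values file and passed with `-f values.yaml` instead of repeated `--set` flags, which is easier to review and version-control.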
Additional Kubernetes Updates
- Robust Application Observability integration: Monitor the application layer running on your Kubernetes infrastructure at the pod and workload level. Identify root causes quickly without the complexity of manually correlating infrastructure health with application performance. Never lose context, and easily navigate between the Application Observability and Kubernetes Monitoring apps in Grafana Cloud.
- Kubernetes support in Beyla: Grafana Labs’ 2024 Observability Survey found that eBPF is one of the technologies respondents are most excited about. Now, with full Kubernetes support added to Beyla, Grafana Labs’ open source eBPF-based auto-instrumentation tool, Beyla can incorporate Kubernetes metadata into the telemetry it generates, allowing users to group and filter by deployment, namespace, cluster, and other parameters. With this update, the Grafana Beyla configuration now “understands” Kubernetes semantics to provide a more fine-grained selection of services to instrument.
- Grafana Operator migration: Grafana Operator, the open source Kubernetes operator that helps you manage your Grafana instances within and outside of Kubernetes, will now officially be managed by Grafana Labs. Moving the operator under the Grafana Labs umbrella will help ensure seamless compatibility with Grafana Cloud, foster a more focused and collaborative community effort, drive better documentation, and keep up with cutting-edge feature development.
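To illustrate the Kubernetes-aware Beyla configuration mentioned above, a config along the following lines enables Kubernetes metadata decoration and narrows instrumentation to selected workloads. The exact keys (`attributes.kubernetes.enable`, the `k8s_*` discovery selectors) and the example namespace and deployment names are assumptions to be verified against the Beyla documentation:

```yaml
# Hypothetical Beyla configuration sketch -- verify keys against the Beyla docs.
attributes:
  kubernetes:
    enable: true            # decorate telemetry with pod, namespace, and deployment metadata

discovery:
  services:
    # Instrument only workloads matching these Kubernetes selectors
    # ("checkout" and "frontend" are placeholder names)
    - k8s_namespace: checkout
    - k8s_deployment_name: frontend
```

This is what the announcement means by Beyla “understanding” Kubernetes semantics: services are selected by Kubernetes-level identity rather than only by port or executable name.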
“Grafana Labs’ history with OpenTelemetry can be traced back to its predecessor projects, OpenCensus and OpenTracing. However, our investment in the project has only increased over time as we identify areas that make sense for our users,” said Juraci Paixão Kröhling, Principal Software Engineer at Grafana Labs and OpenTelemetry Governing Board Member. “With metrics, and more recently logging, marked as stable in OpenTelemetry, we’re seeing the project gain momentum among Grafana users as more people are coupling it with Prometheus as the backend. Because we want to be where our users are, our goal is to continue to make it easier for them to get value from OpenTelemetry.”
The Latest
In live financial environments, capital markets software cannot pause for rebuilds. New capabilities are introduced as stacked technology layers to meet evolving demands while systems remain active, data keeps moving, and controls stay intact. AI is no exception, and its opportunities are significant: accelerated decision cycles, compressed manual workflows, and more effective operations across complex environments. The constraint isn't the models themselves, but the architectural environments they enter ...
Like most digital transformation shifts, organizations often prioritize productivity and leave security and observability struggling to keep pace. This usually translates to both the mass implementation of new technology and fragmented monitoring and observability (M&O) tooling. In the era of AI and varied cloud architecture, a disparate observability function can be dangerous. IT teams will lack a complete picture of their IT environment, making it harder to diagnose issues and increasing mean time to resolve (MTTR). In fact, according to recent data from the SolarWinds State of Monitoring & Observability Report, 77% of IT personnel said the lack of visibility across their on-prem and cloud architecture was an issue ...
In MEAN TIME TO INSIGHT Episode 23, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses the NetOps labor shortage ...
Technology management is evolving, and in turn, so is the scope of FinOps. The FinOps Foundation recently updated their mission statement from "advancing the people who manage the value of cloud" to "advancing the people who manage the value of technology." This seemingly small change solidifies a larger evolution: FinOps practitioners have organically expanded to be focused on more than just cloud cost optimization. Today, FinOps teams are largely — and quickly — expanding their job descriptions, evolving into a critical function for managing the full value of technology ...
Enterprises are under pressure to scale AI quickly. Yet despite considerable investment, adoption continues to stall. One of the most overlooked reasons is vendor sprawl ... In reality, no organization deliberately sets out to create sprawling vendor ecosystems. More often, complexity accumulates over time through well-intentioned initiatives, such as enterprise-wide digital transformation efforts, point solutions, or decentralized sourcing strategies ...
Nearly every conversation about AI eventually circles back to compute. GPUs dominate the headlines while cloud platforms compete for workloads and model benchmarks drive investment decisions. But underneath that noise, a quieter infrastructure challenge is taking shape. The real bottleneck in enterprise AI is not processing power; it is the ability to store, manage, and retrieve the relentless volumes of data that AI systems generate, consume, and multiply ...
The 2026 Observability Survey from Grafana Labs paints a vivid picture of an industry maturing fast, where AI is welcomed with careful conditions, SaaS economics are reshaping spending decisions, complexity remains a defining challenge, and open standards continue to underpin it all ...
The observability industry has an evolving relationship with AI. We're not skeptics, but it's clear that trust in AI must be earned ... In Grafana Labs' annual Observability Survey, 92% said they see real value in AI surfacing anomalies before they cause downtime. Another 91% endorsed AI for forecasting and root cause analysis. So while the demand is there, customers need it to be trustworthy, as the survey also found that the practitioners most enthusiastic about AI are also the most insistent on explainability ...
In the modern enterprise, the conversation around AI has moved past skepticism toward a stage of active adoption. According to our 2026 State of IT Trends Report: The Human Side of Autonomous AI, nearly 90% of IT professionals view AI as a net positive, and this optimism is well-founded. We are seeing agentic AI move beyond simple automation to actively streamlining complex data insights and eliminating the manual toil that has long hindered innovation. However, as we integrate these autonomous agents into our ecosystems, the fundamental DNA of the IT role is evolving ...
AI workloads require an enormous amount of computing power ... What's also becoming abundantly clear is just how quickly AI's computing needs are leading to enterprise systems failure. According to Cockroach Labs' State of AI Infrastructure 2026 report, enterprise systems are much closer to failure than their organizations realize. The report ... suggests AI scale could cause widespread failures in as little as one year — making it a clear risk for business performance and reliability.