
Grafana Labs announced new Grafana Cloud capabilities designed for Kubernetes platform teams seeking to reduce cloud costs and gain more unified monitoring experiences across their entire cloud native infrastructure.
“Kubernetes leveled up platform engineering and redefined how global distributed teams could access shared infrastructure – but teams have to use multiple different platforms to cover the full breadth of cost monitoring, system health, incident management and related K8s infrastructure concerns,” said Tom Wilkie, Grafana Labs CTO and CNCF Governing Board member. “We believe that Grafana Cloud, our fully managed offering that makes it easier to get started with observability and includes a generous forever-free tier, gives platform teams more insight under one roof than any other observability tool for Kubernetes environments.”
With Kubernetes Monitoring, the solution introduced to the fully managed Grafana Cloud observability platform last year, users can automatically ship metrics to Grafana Cloud after installing the Grafana Agent into one or more Kubernetes clusters. Once this connection is made, Grafana Cloud users have out-of-the-box access to their Kubernetes metrics, logs, and events via prebuilt dashboards and alerts.
The latest updates include:
- Cost monitoring: This feature, which leverages the CNCF sandbox project OpenCost, allows platform teams to measure infrastructure spend on Kubernetes deployments – breaking down costs to nodes, persistent volumes, and load balancers – across multi-cloud environments. Cost monitoring shows your AWS, GCP, and Azure environment costs alongside suggestions for resource areas where you can optimize for savings, such as CPUs, RAM, and more. For more about tracking cloud costs, check out the KubeCon session Where's Your Money Going? The Beginner's Guide to Measuring Kubernetes Costs. Grafana Labs engineers Mark Poko and JuanJo Ciarlante will discuss Grafana Labs' journey toward cost observability and lessons learned in optimizing cloud spend.
- Out-of-the-Box Kubernetes Traces: Grafana Cloud is experimenting with collecting traces from Kubernetes clusters. The data can then be sent to Grafana Tempo for visualization. Rather than jumping between different Kubernetes infrastructure components to find out “what happened” in complex incident resolution scenarios, Grafana Cloud would allow platform teams to trace specific Kubernetes events from start to finish with a simple agent install.
- Kubernetes Monitoring landing page: Grafana Cloud’s new Kubernetes Monitoring landing page further reduces context switching for platform teams by bringing all of the most pressing issues you might have in your Kubernetes infrastructure to the surface automatically, in a single, predefined view. From pods in trouble (either crashlooping or not starting correctly), to nodes that have memory or disk pressure, to persistent volumes above 90 percent capacity, Grafana Cloud’s Kubernetes Monitoring makes intelligent inferences that identify problem areas before they bring systems down.
- Simplified Helm installation: Grafana Cloud’s new Helm installation makes it easy to install the Kubernetes Monitoring solution and get started scraping Kubernetes metrics, logs, and traces. It’s open source, any platform team can run it with the Grafana Agent, and it ships with basic configurations for what you want it to include. Kubernetes Monitoring is compatible with ArgoCD, Prometheus, Terraform, OTel Collector, Windows Exporter, and Ansible.
- Easy monitoring of services running on your Kubernetes fleet: Kubernetes Monitoring in Grafana Cloud includes out-of-the-box integrations that come with prebuilt dashboards, rules, and alerts for Aerospike, Apache ActiveMQ, Cilium, CoreDNS, etcd, NGINX, GitLab, Apache Kafka, CockroachDB, Apache Cassandra, PostgreSQL, and MySQL. Grafana Cloud has bundled all of these integrations into a single solution themed for various monitoring use cases. If you have an application running in Kubernetes, you can also see where your application lives within your Kubernetes fleet – whether on AWS, Google, OpenShift, or any other common Kubernetes distribution.
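To make the landing-page checks described above more concrete, here is a minimal Python sketch of that kind of triage: flagging crashlooping or non-starting pods, nodes under memory or disk pressure, and persistent volumes above 90 percent capacity. This is not Grafana Cloud's implementation. The `Pod`, `Node`, and `Volume` types and the `find_issues` function are hypothetical names invented for illustration. Only the condition strings (`CrashLoopBackOff`, `MemoryPressure`, `DiskPressure`) are standard Kubernetes terms.

```python
from dataclasses import dataclass, field

# Threshold taken from the text: volumes "above 90 percent capacity".
PV_USAGE_THRESHOLD = 0.90

# Standard Kubernetes node conditions that signal resource pressure.
PRESSURE_CONDITIONS = {"MemoryPressure", "DiskPressure"}


@dataclass
class Pod:  # hypothetical, simplified view of a pod
    name: str
    waiting_reason: str = ""  # e.g. "CrashLoopBackOff"; empty when running


@dataclass
class Node:  # hypothetical, simplified view of a node
    name: str
    conditions: set = field(default_factory=set)  # active condition names


@dataclass
class Volume:  # hypothetical, simplified view of a persistent volume
    name: str
    used_bytes: int
    capacity_bytes: int


def find_issues(pods, nodes, volumes):
    """Return a flat list of human-readable problems, pods first."""
    issues = []
    for pod in pods:
        if pod.waiting_reason == "CrashLoopBackOff":
            issues.append(f"pod {pod.name}: crashlooping")
        elif pod.waiting_reason:
            issues.append(f"pod {pod.name}: not starting ({pod.waiting_reason})")
    for node in nodes:
        # Report only pressure conditions, in a stable order.
        for cond in sorted(node.conditions & PRESSURE_CONDITIONS):
            issues.append(f"node {node.name}: {cond}")
    for vol in volumes:
        if vol.capacity_bytes > 0:
            usage = vol.used_bytes / vol.capacity_bytes
            if usage > PV_USAGE_THRESHOLD:
                issues.append(f"volume {vol.name}: above 90% capacity")
    return issues
```

In a real cluster these inputs would come from Kubernetes metrics and events rather than hand-built objects. The sketch only shows how a single predefined view can reduce several signals to one list of problem areas.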
Continued contributions to CNCF open source projects
- Deeper OpenTelemetry and Prometheus integrations: Grafana Labs is the only company leading in contributions to both Prometheus and OpenTelemetry. One main area of focus has been interoperability between the two projects. Now that OpenTelemetry Metrics is stable, it has gained traction among users, and more people are pairing OTel with Prometheus as the backend. Over the last year and a half, the Prometheus working group, which includes Grafana Labs' Goutham Veeramachaneni, has been improving the usability of Prometheus with OpenTelemetry, including adding native OTLP ingestion in Prometheus.
- Continuous profiling for OpenTelemetry: Grafana Labs Engineering Director Ryan Perry is working with the community to integrate continuous profiling into the OpenTelemetry project. At KubeCon, Perry’s session – A Tale of Two Flamegraphs: Unlocking Performance Insights in a Diverse Application Landscape – will trace the evolution of performance profiling as a key “fourth pillar” in observability (adding a new dimension beyond metrics, logs, and traces), and provide an update on the efforts of Grafana Labs and other OpenTelemetry contributors to enable optimizing applications across diverse programming languages and platforms.