Grafana Labs Announces New Grafana Cloud Capabilities

Grafana Labs announced new Grafana Cloud capabilities designed for Kubernetes platform teams seeking to reduce cloud costs and gain more unified monitoring experiences across their entire cloud native infrastructure.

“Kubernetes leveled up platform engineering and redefined how global distributed teams could access shared infrastructure – but teams have to use multiple different platforms to cover the full breadth of cost monitoring, system health, incident management and related K8s infrastructure concerns,” said Tom Wilkie, Grafana Labs CTO and CNCF Governing Board member. “We believe that Grafana Cloud, our fully managed offering that makes it easier to get started with observability and includes a generous forever-free tier, gives platform teams more insight under one roof than any other observability tool for Kubernetes environments.”

With Kubernetes Monitoring, the solution introduced to the fully managed Grafana Cloud observability platform last year, users can automatically ship metrics to Grafana Cloud after installing the Grafana Agent into one or more Kubernetes clusters. Once this connection is made, Grafana Cloud users have out-of-the-box access to their Kubernetes metrics, logs, and events via prebuilt dashboards and alerts.

The latest updates include:

- Cost monitoring: This feature, which leverages the CNCF sandbox project OpenCost, allows platform teams to measure infrastructure spend on Kubernetes deployments – breaking down costs to nodes, persistent volumes, and load balancers – across multi-cloud environments. Cost monitoring shows your AWS, GCP, and Azure environment costs alongside suggestions for resource areas where you can optimize for savings, such as CPUs, RAM, and more. For more about tracking cloud costs, check out the KubeCon session Where's Your Money Going? The Beginner's Guide to Measuring Kubernetes Costs. Grafana Labs engineers Mark Poko and JuanJo Ciarlante will discuss Grafana Labs' journey toward cost observability and lessons learned in optimizing cloud spend.

- Out-of-the-Box Kubernetes Traces: Grafana Cloud is experimenting with scraping traces from Kubernetes clusters. The data can then be sent to Grafana Tempo for visualization. Rather than jumping between different Kubernetes infrastructure components to find out “what happened” in complex incident resolution scenarios, Grafana Cloud would allow platform teams to trace specific Kubernetes events from start to finish with a simple agent install.

- Kubernetes Monitoring landing page: Grafana Cloud’s new Kubernetes Monitoring landing page further reduces context switching for platform teams by bringing all of the most pressing issues you might have in your Kubernetes infrastructure to the surface automatically, in a single, predefined view. From pods in trouble (either crashlooping or not starting correctly), to nodes that have memory or disk pressure, to persistent volumes above 90 percent capacity, Grafana Cloud’s Kubernetes Monitoring makes intelligent inferences that identify problem areas before they bring systems down.

- Simplified Helm installation: Grafana Cloud’s new Helm installation makes it easy to install the Kubernetes Monitoring solution and start scraping Kubernetes metrics, logs, and traces. It is open source, any platform team can run it with the Grafana Agent, and it ships with basic configurations for the components you choose to include. Kubernetes Monitoring is compatible with ArgoCD, Prometheus, Terraform, OTel Collector, Windows Exporter, and Ansible.

- Easy monitoring of services running on your Kubernetes fleet: Kubernetes Monitoring in Grafana Cloud includes out-of-the-box integrations that come with prebuilt dashboards, rules, and alerts for Aerospike, Apache ActiveMQ, Cilium, CoreDNS, etcd, NGINX, GitLab, Apache Kafka, CockroachDB, Apache Cassandra, PostgreSQL, and MySQL. Grafana Cloud has bundled all of these integrations into a single solution themed for various monitoring use cases. If you have an application running in Kubernetes, you can also see where it lives within your Kubernetes fleet – whether on AWS, Google, OpenShift, or any other common Kubernetes distribution.
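
The kinds of checks the landing page is described as surfacing (crashlooping or non-starting pods, nodes under memory or disk pressure, persistent volumes above 90 percent capacity) can be illustrated with a minimal Python sketch. This is not how Grafana Cloud implements the feature: the `Pod`, `Node`, and `Volume` classes and the `find_issues` function are hypothetical names invented for illustration, and the inputs are assumed to have already been collected from the cluster. Only the status strings (`CrashLoopBackOff`, `Pending`, `MemoryPressure`, `DiskPressure`) are standard Kubernetes terms.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified views of cluster objects (not a real client API).
@dataclass
class Pod:
    name: str
    phase: str = "Running"      # Kubernetes pod phase, e.g. "Pending"
    waiting_reason: str = ""    # e.g. "CrashLoopBackOff"; empty if not waiting


@dataclass
class Node:
    name: str
    # Maps a node condition type to True when that condition is active.
    conditions: dict = field(default_factory=dict)


@dataclass
class Volume:
    name: str
    used_bytes: int
    capacity_bytes: int


# Threshold taken from the article: volumes above 90 percent capacity.
PV_USAGE_THRESHOLD = 0.90


def find_issues(pods, nodes, volumes):
    """Return a flat list of human-readable problem descriptions."""
    issues = []
    for pod in pods:
        if pod.waiting_reason == "CrashLoopBackOff":
            issues.append(f"pod {pod.name}: crashlooping")
        elif pod.phase == "Pending":
            issues.append(f"pod {pod.name}: not starting")
    for node in nodes:
        for condition in ("MemoryPressure", "DiskPressure"):
            if node.conditions.get(condition):
                issues.append(f"node {node.name}: {condition}")
    for vol in volumes:
        # Skip volumes with unknown capacity to avoid dividing by zero.
        if vol.capacity_bytes > 0:
            if vol.used_bytes / vol.capacity_bytes > PV_USAGE_THRESHOLD:
                issues.append(f"volume {vol.name}: above 90 percent capacity")
    return issues
```

Calling `find_issues` with lists of these objects returns one line per detected problem, which is roughly the shape of a single predefined view; a real system would derive the same signals from metrics and events rather than in-memory objects.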

Continued contributions to CNCF open source projects

- Deeper OpenTelemetry and Prometheus integrations: Grafana Labs is the only company that is a leading contributor to both Prometheus and OpenTelemetry. One main area of focus has been interoperability between the two projects. Now that OpenTelemetry Metrics is stable, it has gained traction among users, and more people are pairing OTel with Prometheus as the backend. Over the last year and a half, the Prometheus working group, which includes Grafana Labs' Goutham Veeramachaneni, has been improving the usability of Prometheus with OpenTelemetry, including adding native OTLP ingestion in Prometheus.

- Continuous profiling for OpenTelemetry: Grafana Labs Engineering Director Ryan Perry is working with the community to integrate continuous profiling into the OpenTelemetry project. At KubeCon, Perry’s session – A Tale of Two Flamegraphs: Unlocking Performance Insights in a Diverse Application Landscape – will trace the evolution of performance profiling as a key “fourth pillar” in observability (adding a new dimension beyond metrics, logs, and traces), and provide an update on the efforts of Grafana Labs and other OpenTelemetry contributors to enable optimizing applications across diverse programming languages and platforms.
