Checkmk 2.4 Integrates OpenTelemetry and Synthetic Testing

Checkmk today announced the release of version 2.4, introducing powerful new features designed to enhance full-stack monitoring. With integrated OpenTelemetry metrics and synthetic testing, IT teams gain end-to-end visibility across all layers of the IT stack, from infrastructure and applications to end-user experience. These capabilities enable faster, more proactive issue resolution and significantly reduce mean time to resolution (MTTR). Version 2.4 also introduces features such as quick setup for cloud workload monitoring and a redesigned Notification Hub, which reduce administrative overhead and lighten the load on overburdened IT teams. With this release, Checkmk addresses two of the biggest challenges in modern IT: rising system complexity and the ongoing shortage of skilled professionals.

Checkmk’s integration of OpenTelemetry allows IT teams to look inside their applications and monitor performance, availability, and potential failure points directly from the application code, all within a single platform. The built-in OpenTelemetry collector ingests data directly or via Prometheus endpoints, translates it into actionable metrics, and maps them to the relevant hosts. This provides clear visibility into not just what is failing, but where and why, enabling faster root cause analysis and targeted fixes, even for previously unidentified issues.
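To make the ingest-translate-map flow more concrete, here is a minimal Python sketch of the general idea: it parses a simplified subset of the Prometheus text exposition format and groups the samples by a host label. This is an illustration only, not Checkmk's collector. The function name `group_samples_by_host`, the choice of `instance` as the host label, and the sample metrics are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Simplified subset of the Prometheus text exposition format:
#   metric_name{label="value",...} 12.5
# Escaped quotes inside label values are not handled in this sketch.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def group_samples_by_host(text, host_label="instance", default_host="unknown"):
    """Return {host: {metric_name: value}} for the given exposition text.

    The host_label default ("instance") is an assumption for illustration;
    the source does not say which attribute Checkmk uses for host mapping.
    """
    hosts = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        host = labels.get(host_label, default_host)
        hosts[host][name] = float(raw_value)
    return dict(hosts)


# Hypothetical scrape output from two application instances.
EXAMPLE = """# HELP http_requests_total Total requests
# TYPE http_requests_total counter
http_requests_total{instance="web01",job="shop"} 1027
http_request_duration_seconds{instance="web01",job="shop"} 0.25
http_requests_total{instance="web02",job="shop"} 3
"""

if __name__ == "__main__":
    for host, metrics in group_samples_by_host(EXAMPLE).items():
        print(host, metrics)
```

Grouping by a single label keeps the sketch short; a real collector would also need to handle metric types, timestamps, and samples that carry no host information.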

Checkmk 2.4 also introduces enhanced synthetic monitoring capabilities, making it easier for teams to create tests that simulate user behavior and assess availability, performance, and functionality from the end-user perspective. First introduced in version 2.3, synthetic testing is now fully integrated into the Checkmk interface. Test robots that simulate user behavior can be uploaded via the web UI and centrally configured and managed. These managed robots can be cloned, customized, and automatically deployed to Linux or Windows test nodes using the Checkmk Agent Bakery. New features also support synthetic testing in isolated offline environments, and KPI monitoring allows teams to track and analyze individual process steps within each test.

Checkmk 2.4 introduces a range of enhancements that boost usability, increase automation, and improve efficiency while reducing administrative overhead.

Highlights include:

  • Quick Setup: Cloud monitoring in minutes - Checkmk’s new Quick Setup feature streamlines and accelerates cloud monitoring configuration across AWS, Azure, and GCP. A guided, step-by-step process handles complex setup tasks in the background and verifies system connections, enabling administrators to achieve full cloud visibility quickly and reliably.
  • Notification Hub: Simplified alert configuration - The new Notification Hub streamlines the configuration, management, and fine-tuning of alerting workflows through an intuitive interface and improved user guidance. Key settings are centralized and accessible with just a few clicks, while real-time status messages and troubleshooting tips help users stay informed and respond quickly. Usability features such as search, slide-outs, and drop-down menus make setup more efficient. A newly added guided mode walks beginners through the configuration process step-by-step — saving time and reducing the risk of misconfiguration.
  • Dynamic host management: Automated control of Kubernetes clusters - In dynamic environments like Kubernetes or virtualized systems, hosts are constantly being created and removed. Checkmk detects these changes in real time, automatically adds new hosts to the monitoring system, and reliably removes those that no longer exist. Designed for maximum scalability, the dynamic host management feature ensures stable, high-performance monitoring — even with hundreds of changes per minute.
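The add-and-remove behavior described for dynamic host management can be pictured as a reconciliation between the hosts currently monitored and the hosts currently discovered. The Python sketch below shows that set-difference idea under stated assumptions: `reconcile_hosts` and the pod names are hypothetical, and nothing here reflects how Checkmk implements the feature internally.

```python
def reconcile_hosts(monitored, discovered):
    """Compare monitored hosts with freshly discovered ones.

    Returns (to_add, to_remove) as sorted lists:
      to_add    - discovered hosts that are not yet monitored
      to_remove - monitored hosts that no longer exist
    Hypothetical helper for illustration only.
    """
    monitored, discovered = set(monitored), set(discovered)
    to_add = sorted(discovered - monitored)
    to_remove = sorted(monitored - discovered)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical snapshot: pod-a was deleted, pod-c was created.
    to_add, to_remove = reconcile_hosts({"pod-a", "pod-b"}, {"pod-b", "pod-c"})
    print("add:", to_add)
    print("remove:", to_remove)
```

Computing both differences from a single snapshot keeps each reconciliation pass idempotent: running it again with unchanged inputs yields the same lists.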

