Gremlin announced the launch of Reliability Intelligence, an AI-driven solution for analyzing and remediating reliability concerns in modern, complex systems.
Through a combination of automated fault injection experiments, continuous resilience analysis, and a Model Context Protocol (MCP) server for LLM integration, Gremlin's Reliability Intelligence is designed to decrease downtime and improve performance for online businesses.
"The Gremlin team has been managing complex online systems for decades – we know that you can't just throw LLMs at the hard engineering problems involved with building and maintaining business-critical systems," said Kolton Andrus, CEO of Gremlin. "Reliability Intelligence provides actionable recommendations based on a deep understanding of your systems architecture and its dependencies across various cloud providers and 3rd party services."
Highlights of Gremlin's Reliability Intelligence include:
- Experiment Analysis: While automated testing has been part of Gremlin for years, engineers previously had to analyze results and compare them to expected behavior manually. Experiment Analysis compares test results against expected behavior based on past performance, detects anomalous behavior during the test, and uncovers why a test fails.
- Recommended Remediation: By leveraging industry best practices and system behavior from millions of tests, Gremlin provides engineers with specific recommended actions after a failed test. These actions guide the user in resolving issues, which can include anything from adjusting code to fine-tuning observability alerts.
- MCP Server: Explore your data with Gremlin's MCP server integration. Connect your favorite LLM to query data, uncover insights, and create custom dashboards.
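To make the Experiment Analysis idea concrete, here is a minimal sketch of comparing test results against a baseline built from past runs. It is not Gremlin's implementation, which the announcement does not describe. The function name `find_anomalies`, the z-score threshold, and the latency figures are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value, z_score) for observed samples far from the baseline.

    Hypothetical helper: a plain z-score check, used here as a stand-in
    for whatever analysis a real product performs.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # Flat baseline: treat any deviation at all as anomalous.
        return [(i, v, float("inf")) for i, v in enumerate(observed) if v != mu]
    anomalies = []
    for i, v in enumerate(observed):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, v, z))
    return anomalies


# Assumed data: request latency (ms) from past healthy runs and from a
# fault injection experiment. The values are invented for this example.
baseline_latency_ms = [120, 118, 125, 122, 119, 121, 123, 120]
experiment_latency_ms = [121, 124, 310, 119]

anomalies = find_anomalies(baseline_latency_ms, experiment_latency_ms)
for index, value, z in anomalies:
    print(f"sample {index}: {value} ms deviates from baseline (z = {z:.1f})")
```

In this sketch only the 310 ms sample is flagged, since the others sit within normal variation of the past runs. A production system would likely account for trends, seasonality, and multiple metrics rather than a single threshold.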