With input from industry experts — both analysts and vendors — this 8-part blog series will explore what is driving the convergence of observability and security, the challenges and advantages, and how it may transform the IT landscape.
One reason observability and security make a good pairing is that the traditional telemetry signals — metrics, logs, and traces — help maintain both performance and security.
"The convergence of security and observability is happening throughout the observability landscape, and telemetry pipelines are enabling organizations to make that happen," explains Buddy Brewer, Chief Product Officer at Mezmo. "Security engineers, developers, and SREs use telemetry pipelines to access telemetry data effectively and efficiently. Many are also adopting standards like OpenTelemetry to ease their data ingestion woes and allow teams across the organization to use standardized data and break down silos."
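To make the "standardized data" idea concrete, here is a minimal, hypothetical sketch (not Mezmo's pipeline or any real product API) of a pipeline stage that maps vendor-specific fields onto OpenTelemetry-style attribute names, so SREs and security engineers query the same field names. The `normalize` helper and the raw-event shape are illustrative assumptions.

```python
# Hypothetical sketch: normalize raw telemetry into a shared, OTel-style
# schema so SRE and security teams query the same field names.
def normalize(raw: dict) -> dict:
    """Map vendor-specific fields onto OpenTelemetry-style attribute keys."""
    return {
        "timestamp": raw["ts"],
        "severity_text": raw.get("level", "INFO").upper(),
        "body": raw["msg"],
        # Semantic-convention-style keys useful to both SREs and SecOps:
        "client.address": raw.get("src_ip"),
        "http.response.status_code": raw.get("status"),
        "enduser.id": raw.get("user"),
    }

event = normalize({"ts": "2024-05-01T12:00:00Z", "level": "warn",
                   "msg": "login failed", "src_ip": "203.0.113.7",
                   "status": 401, "user": "alice"})
print(event["client.address"])  # same field name whichever team queries it
```

The point of the normalization step is organizational, not technical: once both teams agree on one attribute vocabulary, the same pipeline output feeds dashboards and security detections alike.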
Brewer cites a recent ESG report showing that metrics, logs, and traces account for 86% of application data by volume. He maintains that this data is essential for SecOps teams to understand what parts of an application are working properly, identify errors, and determine how to address those errors. The same report shows that 69% of SecOps teams regularly or continuously access data from these three sources.
"Traditional application performance signals help SecOps by serving as a proof point that you are watching for outlier issues, for example, you are able to see and flag when something doesn't look right in your system," says Jam Leomi, Lead Security Engineer at Honeycomb. "This outlier data is surfaced in real-time using observability tools and can serve as an early indicator that something malicious is going on."
"There are emerging use cases for issues such as Kubernetes security or CSPM, where there does seem to be a big advantage to adding security capabilities to the traditional three pillars of logs, metrics and traces for observability," says Asaf Yigal, CTO of Logz.io. "Whether you have ops-type teams that can act on that data themselves or use it as a better informed stream of data to channel to their dedicated security teams, the reality is that cloud apps and infrastructure are so complex and fast moving, security has to be part of the picture for everyone involved."
Leomi of Honeycomb adds that the convergence of tools can help distinguish between performance and security issues, saying, "While a lot of the data surfaced in observability tools can look like an average system bottleneck or performance issue, applying the security lens to it could bring to light potential indicators of a security event."
Colin Fallwell, Field CTO of Sumo Logic, agrees: "Many security incidents impact operations. For example, one can expect serious performance degradation to occur in a DDoS attack. Telemetry like tracing and logging data is naturally going to carry header information from web requests, IP information, and much, much more. Metrics are the canaries in the coal mine and serve as an early warning that something is wrong or trending out of the norm. All this data is valuable to security use cases as well. Deep application visibility, and deviations from the norm on authentication, access, processing, and DB access are table stakes for operations and highly valuable to SecOps. Consider how valuable this data is to security teams when trying to understand the impact and blast radius of security events."
Performance signals give technologists a detailed look into the health of their applications — if there is a bottleneck, the signals can help locate where it is occurring and why, adds Joe Byrne, VP of Technology Strategy and CTO Adviser at Cisco AppDynamics. "For SecOps teams, detecting potential security threats before an attack is crucial, so having real-time insight into applications' performance would benefit them. SecOps teams can leverage observability tools to determine if any performance delays are due to vulnerabilities or security threats, allowing them to take immediate action to achieve resolution."
Let's look at each type of performance signal individually.
Log analytics tools have been serving cybersecurity teams for years, says Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA). "Logs are a record of what happened on a device or piece of software. Real time analysis will point to ongoing security incidents and forensic analysis will help security teams reconstruct an incident."
Logs are time-stamped records of events that provide a detailed history of application behavior; they can be used for troubleshooting issues, identifying performance bottlenecks, and detecting security threats, notes Roger Floren, Principal Product Manager at Red Hat.
"It's all about the logs to some extent — it always has been and always will be," says Yigal from Logz.io. "Consider that the SIEM — the virtual nervous system of the modern security ecosystem, for decades now — is a centralized repository for security data, and its primary job has always been to consume and provide analysis on top of mountains of log data. And this is telemetry running the full gamut from ITOps logs to security data coming in from other purpose-built security tooling. So, there's that: you have to maintain visibility and analysis into your log data, and it's a foundational element of security practices."
Ajit Sancheti, GM of Falcon LogScale at CrowdStrike, outlines the history: "DevOps, ITOps and SecOps teams need to be able to access different types of data for a variety of use cases, such as investigating threats, debugging network issues, maximizing application performance and much more. In the past, this meant that these individual teams would deploy siloed monitoring, SIEM and log management tools. Additionally, many of the log management tools on the market lacked the scale to centrally collect and store all logs and allow large numbers of users to simultaneously access and query this data."
"Today, organizations are finally able to log security and observability data in one place," Sancheti continues. "This is due to innovations like index-free logging architectures, which enable organizations to ingest a petabyte of data per day (or more)."
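The kind of security analysis both real-time and forensic log tooling performs can be reduced to a toy example. The sketch below, using only the Python standard library, counts failed authentication attempts per source IP; the log format and the two-failure threshold are assumptions for illustration, not any product's behavior.

```python
import re
from collections import Counter

# Assumed log format for illustration; real auth logs vary by system.
LOG_LINES = [
    "2024-05-01T12:00:01Z auth FAIL user=alice ip=203.0.113.7",
    "2024-05-01T12:00:03Z auth FAIL user=alice ip=203.0.113.7",
    "2024-05-01T12:00:05Z auth FAIL user=bob ip=198.51.100.9",
    "2024-05-01T12:00:08Z auth OK   user=alice ip=203.0.113.7",
]

PATTERN = re.compile(r"auth (FAIL|OK)\s+user=(\S+) ip=(\S+)")

failures = Counter()
for line in LOG_LINES:
    m = PATTERN.search(line)
    if m and m.group(1) == "FAIL":
        failures[m.group(3)] += 1

# Flag IPs with repeated failures -- a simple signal for real-time
# alerting or after-the-fact forensic reconstruction.
suspects = [ip for ip, n in failures.items() if n >= 2]
print(suspects)  # ['203.0.113.7']
```

A SIEM does this at vastly larger scale and with correlation across sources, but the underlying operation — parse, aggregate, threshold — is the same.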
Chaim Mazal, Chief Security Officer at Gigamon, says the challenge is that logging tools see things in hindsight; they do not detect threats in real time. Only when log data and network-derived intelligence are integrated can SecOps teams detect threats or performance issues in real time, before they harm or slow the business.
"Once integrated and SecOps teams gain the deep observability required, they can shift toward a proactive security posture and ensure cloud security across their infrastructure whether it's located on-premises, in private clouds, in containers, or in the public cloud," Mazal adds.
Performance metrics can also be used to identify security events in some cases.
"Deep performance signals such as identifying a workload's performance through metrics including CPU usage, system calls, memory usage, etc. allows security customers to determine aberrations from normal behavior," says Prashant Prahlad, VP of Cloud Security Products at Datadog.
For example, metrics can help identify a possible denial-of-service attack when an unexpected and dramatic spike in usage is seen, according to Kirsten Newcomer, Director, Cloud and DevSecOps Strategy at Red Hat.
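A minimal sketch of that idea: compare the current request rate against recent history and alert on a large deviation. The sample values and the three-standard-deviation threshold are arbitrary illustrative choices — a crude early-warning signal, not a DDoS verdict.

```python
from statistics import mean, stdev

# Requests per minute; the final sample spikes far above the baseline.
requests_per_min = [120, 118, 125, 119, 122, 121, 117, 950]

baseline, current = requests_per_min[:-1], requests_per_min[-1]
mu, sigma = mean(baseline), stdev(baseline)

# Flag the metric when it deviates more than 3 standard deviations
# from recent history.
if current > mu + 3 * sigma:
    print(f"ALERT: request rate {current}/min vs baseline ~{mu:.0f}/min")
```

Real anomaly detection accounts for seasonality and legitimate traffic bursts, but the principle — metrics as the canary that something is trending out of the norm — is the one the experts above describe.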
Yigal from Logz.io adds, "We see massive value in helping organizations quickly translate their huge volumes of logs into more immediately useful metrics from the traditional IT ops side, saving both time and money. But there's also the notion of introducing more security content, creating and tracking more security-relevant trends, so we do see some organizations moving in this direction."
Some experts say the key observability signal that makes a difference for security is traces. Newcomer from Red Hat says traces provide data about how information is flowing through a system and can be used to visualize unexpected errors and events.
"Security staff have always been dealing with logs. Metrics are also helpful. Traces are a new kind of information that observability brings into the picture," explains Mike Loukides, VP of Emerging Tech Content at O'Reilly Media. "They let you ask detailed questions about what's happening in the application — the sorts of questions that could help you to spot a compromise early on."
"To take an overly simple example: any system that's online will see failed login attempts all the time. These will be in the logs, and they don't tell you much," he continues. "When a failed login attempt is followed by a successful login from the same IP address, that might tell you something — or it might be that an authorized user mistyped his password. That's about as far as logging will take you. But when that now-authorized user starts interacting with parts of the system that they shouldn't have access to, you know you have a real problem. You can ask questions like: How did they get in? When did they get in? And what did they do while they were in our system? And that's the kind of information that you're going to get from traces."
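Loukides's scenario can be sketched over hypothetical span records. The field names, the span shape, and the per-user allowed-resource set below are illustrative assumptions, not a real tracing schema or access-control system.

```python
# Hypothetical span records for one session: failed login, successful
# login, then a read outside the user's normal access.
spans = [
    {"name": "login", "user": "alice", "outcome": "fail", "ip": "203.0.113.7"},
    {"name": "login", "user": "alice", "outcome": "ok", "ip": "203.0.113.7"},
    {"name": "read", "user": "alice", "resource": "/admin/exports"},
]

# Illustrative map of what each user is expected to touch.
ALLOWED = {"alice": {"/app/dashboard", "/app/reports"}}

def suspicious(spans):
    """Return resources a user touched outside their allowed set."""
    return [s["resource"] for s in spans
            if s["name"] == "read"
            and s["resource"] not in ALLOWED.get(s["user"], set())]

print(suspicious(spans))  # ['/admin/exports']
```

The logs alone show a failed then successful login; it is the trace of what the session did afterward that turns an ambiguous signal into evidence of a compromise.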
Prahlad from Datadog concludes, "The applications get instrumented with libraries for tracing and the exact same traces are used to detect attacks. In many cases SecOps detect these aberrations from the performance data and identify security issues much more quickly — all without additional instrumentation and performance overheads."