Exploring the Convergence of Observability and Security - Part 2: Logs, Metrics and Traces

Pete Goldin
APMdigest

With input from industry experts — both analysts and vendors — this 8-part blog series will explore what is driving the convergence of observability and security, the challenges and advantages, and how it may transform the IT landscape.

Start with: Exploring the Convergence of Observability and Security - Part 1

One reason why observability and security make a good pairing is that traditional telemetry signals — metrics, logs, and traces — are helpful to maintain both performance and security.

"The convergence of security and observability is happening throughout the observability landscape, and telemetry pipelines are enabling organizations to make that happen," explains Buddy Brewer, Chief Product Officer at Mezmo. "Security engineers, developers, and SREs use telemetry pipelines to access telemetry data effectively and efficiently. Many are also adopting standards like OpenTelemetry to ease their data ingestion woes and allow teams across the organization to use standardized data and break down silos."

Brewer cites a recent ESG report showing that metrics, logs, and traces account for 86% of application data by volume. He maintains that this data is essential for SecOps teams to understand what parts of an application are working properly, identify errors, and determine how to address those errors. The same report shows that 69% of SecOps teams regularly or continuously access data from these three sources.
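To make the idea of standardized telemetry concrete, here is a minimal sketch using the OpenTelemetry Python SDK, the standard Brewer mentions. The service name, the login handler, and the span attributes are illustrative assumptions rather than details from the article; the point is that a single instrumented span can carry fields useful both to SREs (timing, errors) and to SecOps (who logged in, from where, and whether it succeeded).

```python
# Minimal sketch: emitting standardized telemetry with the OpenTelemetry Python SDK.
# Service name, attribute keys, and values are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# One tracer provider per process; ops and security tooling can consume the same spans.
provider = TracerProvider(resource=Resource.create({"service.name": "auth-service"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.login")

def handle_login(username: str, source_ip: str, success: bool) -> None:
    # A single span carries data for both audiences: latency for SREs,
    # identity, source address, and outcome for SecOps.
    with tracer.start_as_current_span("user.login") as span:
        span.set_attribute("enduser.id", username)
        span.set_attribute("client.address", source_ip)
        span.set_attribute("login.success", success)

handle_login("alice", "203.0.113.7", success=False)
```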

"Traditional application performance signals help SecOps by serving as a proof point that you are watching for outlier issues, for example, you are able to see and flag when something doesn't look right in your system," says Jam Leomi, Lead Security Engineer at Honeycomb. "This outlier data is surfaced in real-time using observability tools and can serve as an early indicator that something malicious is going on."

"There are emerging use cases for issues such as Kubernetes security or CSPM, where there does seem to be a big advantage to adding security capabilities to the traditional three pillars of logs, metrics and traces for observability," says Asaf Yigal, CTO of Logz.io. "Whether you have ops-type teams that can act on that data themselves or use it as a better informed stream of data to channel to their dedicated security teams, the reality is that cloud apps and infrastructure are so complex and fast moving, security has to be part of the picture for everyone involved."

Leomi of Honeycomb adds that the convergence of tools can help distinguish between performance and security issues, saying, "While a lot of the data surfaced in observability tools can look like an average system bottleneck or performance issue, applying the security lens to it could bring to light potential indicators of a security event."

Colin Fallwell, Field CTO of Sumo Logic, agrees: "Many security incidents impact operations. For example, one can expect serious performance degradation to occur in a DDoS attack. Telemetry like tracing and logging data is naturally going to carry header information from web requests, IP information, and much, much more. Metrics are the canaries in the coal mine and serve as an early warning that something is wrong or trending out of the norm. All this data is valuable to security use cases as well. Deep application visibility, and deviations from the norm on authentication, access, processing, and DB access are table stakes for operations and highly valuable to SecOps. Consider how valuable this data is to security teams when trying to understand the impact and blast radius of security events."

Performance signals provide technologists with a detailed look into the health of their applications; if there are any bottlenecks, the signals can help locate where they are occurring and why, adds Joe Byrne, VP of Technology Strategy and CTO Adviser at Cisco AppDynamics. "For SecOps teams, detecting potential security threats before an attack is crucial, so having real-time insight into applications' performance would benefit them. SecOps teams can leverage observability tools to determine if any performance delays are due to vulnerabilities or security threats, allowing them to take immediate action to achieve resolution."

Let's look at each type of performance signal individually.

Logs

Log analytics tools have been serving cybersecurity teams for years, says Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA). "Logs are a record of what happened on a device or piece of software. Real time analysis will point to ongoing security incidents and forensic analysis will help security teams reconstruct an incident."

Listen to EMA-APMdigest Podcast Episode 2, in which Shamus McGillicuddy talks about network observability, the convergence of observability and security, and more.

Logs are time-stamped records of events that provide a detailed account of application behavior, and they can be used for troubleshooting issues, identifying performance bottlenecks, and detecting security threats, notes Roger Floren, Principal Product Manager at Red Hat.
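As a simple illustration of Floren's point, the sketch below scans time-stamped auth log lines for repeated login failures, the same raw records that McGillicuddy notes support both real-time flagging and later forensic reconstruction. The log format, field layout, and threshold are assumptions made for the example; real log analytics tools do this at far greater scale.

```python
# Minimal sketch: scanning time-stamped auth log lines for repeated failures.
# The log format, field positions, and threshold are assumptions for illustration.
from collections import Counter

SAMPLE_LOG = """\
2024-05-01T12:00:01Z login failure user=alice ip=203.0.113.7
2024-05-01T12:00:03Z login failure user=alice ip=203.0.113.7
2024-05-01T12:00:05Z login failure user=alice ip=203.0.113.7
2024-05-01T12:00:09Z login success user=alice ip=203.0.113.7
2024-05-01T12:01:12Z login success user=bob ip=198.51.100.20
"""

FAILURE_THRESHOLD = 3  # arbitrary threshold for the example

def failed_logins_by_ip(log_text: str) -> Counter:
    """Count failed login attempts per source IP from raw log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if "login failure" in line:
            ip = next(f.split("=", 1)[1] for f in line.split() if f.startswith("ip="))
            counts[ip] += 1
    return counts

for ip, n in failed_logins_by_ip(SAMPLE_LOG).items():
    if n >= FAILURE_THRESHOLD:
        print(f"possible brute-force activity from {ip}: {n} failed logins")
```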

"It's all about the logs to some extent — it always has been and always will be," says Yigal from Logz.io. "Consider that the SIEM — the virtual nervous system of the modern security ecosystem, for decades now — is a centralized repository for security data, and its primary job has always been to consume and provide analysis on top of mountains of log data. And this is telemetry running the full gamut from ITOps logs to security data coming in from other purpose-built security tooling. So, there's that: you have to maintain visibility and analysis into your log data, and it's a foundational element of security practices.

Ajit Sancheti, GM of Falcon LogScale at CrowdStrike, outlines the history: "DevOps, ITOps and SecOps teams need to be able to access different types of data for a variety of use cases, such as investigating threats, debugging network issues, maximizing application performance and much more. In the past, this meant that these individual teams would deploy siloed monitoring, SIEM and log management tools. Additionally, many of the log management tools on the market lacked the scale to centrally collect and store all logs and allow large numbers of users to simultaneously access and query this data."

"Today, organizations are finally able to log security and observability data in one place," Sancheti continues. "This is due to innovations like index-free logging architectures, which enable organizations to ingest a petabyte of data per day (or more)."

Chaim Mazal, Chief Security Officer at Gigamon, says the challenge is that logging tools see things in hindsight; they do not detect threats in real time. Only when log data and network-derived intelligence are integrated can SecOps teams detect threats or performance issues in real time, before they harm or slow the business.

"Once integrated and SecOps teams gain the deep observability required, they can shift toward a proactive security posture and ensure cloud security across their infrastructure whether it's located on-premises, in private clouds, in containers, or in the public cloud," Mazal adds.

Metrics

Performance metrics can also be used to identify security events in some cases.

"Deep performance signals such as identifying a workload's performance through metrics including CPU usage, system calls, memory usage, etc. allows security customers to determine aberrations from normal behavior," says Prashant Prahlad, VP of Cloud Security Products at Datadog.

For example, metrics can help identify a possible denial-of-service attack when an unexpected and dramatic spike in usage is seen, according to Kirsten Newcomer, Director, Cloud and DevSecOps Strategy at Red Hat.
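Here is a minimal sketch of the kind of spike detection Newcomer describes: it flags a point in a request-rate series that deviates sharply from the trailing window. The window size, the sigma threshold, and the sample series are arbitrary assumptions for illustration, not values from the article.

```python
# Minimal sketch: flagging an abrupt spike in a request-rate metric as a possible DoS signal.
# Window size, threshold, and the sample series are assumptions for illustration.
from statistics import mean, pstdev

def spike_alerts(requests_per_min: list[float], window: int = 10, sigmas: float = 4.0) -> list[int]:
    """Return indices where the value deviates sharply from the trailing window."""
    alerts = []
    for i in range(window, len(requests_per_min)):
        history = requests_per_min[i - window:i]
        mu, sd = mean(history), pstdev(history)
        if sd > 0 and (requests_per_min[i] - mu) / sd > sigmas:
            alerts.append(i)
    return alerts

# Steady baseline traffic followed by a sudden surge.
series = [120, 118, 125, 122, 119, 121, 124, 120, 123, 122, 121, 5400]
print(spike_alerts(series))  # -> [11]
```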

Yigal from Logz.io adds, "We see massive value in helping organizations quickly translate their huge volumes of logs into more immediately useful metrics from the traditional IT ops side, saving both time and money. But there's also the notion of introducing more security content, creating and tracking more security-relevant trends, so we do see some organizations moving in this direction."

Traces

Some experts say the key observability signal that makes a difference for security is traces. Newcomer from Red Hat says traces provide data about how information is flowing through a system and can be used to visualize unexpected errors and events.

"Security staff have always been dealing with logs. Metrics are also helpful. Traces are a new kind of information that observability brings into the picture," explains Mike Loukides, VP of Emerging Tech Content at O'Reilly Media. "They let you ask detailed questions about what's happening in the application — the sorts of questions that could help you to spot a compromise early on."

"To take an overly simple example: any system that's online will see failed login attempts all the time. These will be in the logs, and they don't tell you much," he continues. "When a failed login attempt is followed by a successful login from the same IP address, that might tell you something — or it might be that an authorized user mistyped his password. That's about as far as logging will take you. But when that now-authorized user starts interacting with parts of the system that they shouldn't have access to, you know you have a real problem. You can ask questions like: How did they get in? When did they get in? And what did they do while they were in our system? And that's the kind of information that you're going to get from traces."

Prahlad from Datadog concludes, "The applications get instrumented with libraries for tracing and the exact same traces are used to detect attacks. In many cases SecOps detect these aberrations from the performance data and identify security issues much more quickly — all without additional instrumentation and performance overheads."

Go to: Exploring the Convergence of Observability and Security - Part 3: Tools

Pete Goldin is Editor and Publisher of APMdigest
