
For application performance monitoring (APM), many in IT tend to focus a significant amount of their time on the tool that performs the analysis. Unfortunately for them, the battle is won or lost at the data access level. If you don’t have the right data, you can’t fix the problem correctly.
This viewpoint is backed up by an APMdigest post back in August where Jim Frey cited some critical survey research. The research showed that "26% reported that their biggest challenge with incident response is that data exists, but they can’t access or analyze it easily." Key point – you need access to the right data at the right time to solve your problems.
This raises the question: how do I get the right data access?
The best source of data is a network tap. A tap makes a complete copy of ALL the data passing through it. It is a passive device, so it does not alter any of the data and has a negligible effect on transmission time.
Taps are great because they are "set and forget." You simply plug the device into the network with a one-time disruption and you are done. No programming is required. Best of all, you can place taps anywhere in the network that you need data from — ingress, egress, remote offices, etc.
The one drawback to using taps is that if you install lots of them (which you will want to do), the number of data feeds can exceed the input ports available on your APM tools. However, this issue is easily resolved by installing a network packet broker (NPB) to aggregate the data from the taps, filter the data as necessary, and then send that data on to the APM tool. This eliminates the overcrowding of the data ports on your APM tool.
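To make the aggregate-and-filter idea concrete, here is a minimal sketch in Python. It is a conceptual model only, not how any particular NPB product works: the `Packet` fields, the `broker` function, and the sample feeds are all hypothetical names invented for illustration, and a real NPB does this in hardware at line rate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Packet:
    tap: str          # hypothetical label for the tap that copied the packet
    timestamp: float  # capture time in seconds
    dst_port: int     # destination TCP/UDP port
    payload: bytes


def broker(feeds: Iterable[Iterable[Packet]],
           keep: Callable[[Packet], bool]) -> List[Packet]:
    """Merge several tap feeds into one time-ordered stream,
    keeping only the packets that pass the filter."""
    merged = [p for feed in feeds for p in feed if keep(p)]
    return sorted(merged, key=lambda p: p.timestamp)


# Two made-up tap feeds: one at the ingress, one at a branch office.
ingress = [Packet("ingress", 0.10, 443, b"a"), Packet("ingress", 0.30, 53, b"b")]
branch = [Packet("branch", 0.20, 443, b"c"), Packet("branch", 0.40, 22, b"d")]

# Assumption: the APM tool only needs HTTPS traffic, so filter on port 443.
to_apm = broker([ingress, branch], keep=lambda p: p.dst_port == 443)

for p in to_apm:
    print(p.tap, p.timestamp, p.dst_port)
```

Two tap feeds go in and a single filtered stream comes out, so the APM tool needs one input port instead of one per tap.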
An alternative to a tap is to use a mirroring port (also referred to as a SPAN port) on your network switches. However, this is not recommended. One reason is that a SPAN port is part of an active device, i.e. it can materially change data packet characteristics as the packets flow through the switch. This matters most when you rely on data from these ports to diagnose problems.
In addition, bad packets (i.e. malformed packets) are dropped by the SPAN port. This ends up giving you a "digital view" of the situation, i.e. everything is fine and then, suddenly, there is a problem. The packets that could have shown degradation before the data loss, and so helped you reach a quicker diagnosis, are missing, along with any context as to what was happening before the problem began.
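The "digital view" effect can be illustrated with a small sketch. The per-second counts below are invented purely for illustration, and the model assumes, as described above, that the SPAN port drops every malformed packet while the tap passes everything through.

```python
# Hypothetical per-second counts on one link: (good_packets, malformed_packets).
# The final sample represents the outage, when no traffic flows at all.
link_traffic = [(100, 0), (98, 2), (90, 10), (70, 30), (0, 0)]


def error_rates(samples, drops_malformed):
    """Return the malformed-packet rate a monitoring tool would compute
    for each second, given what its data source lets through."""
    rates = []
    for good, bad in samples:
        seen_bad = 0 if drops_malformed else bad
        total = good + seen_bad
        # None marks a second in which the tool saw no traffic at all.
        rates.append(seen_bad / total if total else None)
    return rates


tap_view = error_rates(link_traffic, drops_malformed=False)
span_view = error_rates(link_traffic, drops_malformed=True)

print("tap :", tap_view)
print("SPAN:", span_view)
```

In this made-up scenario the tap view shows the error rate climbing second by second before the outage, while the SPAN view reports a clean link right up until traffic stops.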
In the end, optimum data capture can be achieved using a tap and NPB. This results in a faster mean time to repair (MTTR).