The Butterfly Effect
Picture an environment where the failure of an application component brings a service to its knees. No big deal, we might say; it's familiar territory when managing monolithic applications and the "fail one, fail all" issues symptomatic of a single logical executable.
But now imagine an environment where the failure of a component, one we didn't even know about and which might not even exist anymore, brings our systems down. Now we're dealing with both the complex and the chaotic.
Welcome to the world of container and microservice monitoring.
Passing the Complexity Monkey
Microservices and containers are great for developers because they remove the fragility, scaling and deployment issues associated with tightly coupled application architectures. By decomposing apps into smaller, independent services, supported by cloud infrastructure and continuous delivery, microservice architectures allow developers to crank out code much faster, never having to wait for lengthy system rebuilds, redeploys and integration tests, or sweat over whether that one-line code change might have introduced a memory leak and brought the system down.
So as a developer, wouldn't you want to work with something that takes all your pain away? Of course you would, but there's a catch: that pain doesn't disappear, it moves elsewhere. With the shift from monolithic to microservice applications, other groups now have to support the "complexity monkey" and the whole new set of challenges it brings. And the new tech zookeepers? Site reliability engineers, DevOps practitioners and IT operations.
New Levels of Complexity
From an application monitoring perspective, a microservice architecture, which structures an application as a collection of loosely coupled, distinct services, introduces a whole new level of complexity. First up, these architectures naturally increase the proliferation of software instances due to the decomposition of monolithic applications – but that's only the start.
Add containers to the mix, where individual services can be dynamically orchestrated to start, stop and re-emerge anywhere in the environment, and discovering and tracking the suckers becomes extremely problematic. As one of my colleagues so aptly put it: monitoring microservices is like trying to track a hummingbird in a massive flock of, well, hummingbirds. A tough gig, made even more difficult because the number of dependencies increases and any possible container identifiers, such as IP addresses, are obscured by the host systems they run on.
The Impact on Application Performance Management
The transient nature of containerized applications introduces many challenges for APM teams, who need to understand how containers are performing and contributing to overall application performance.
Furthermore, teams need to gain precise visibility, which is not easily forthcoming from traditional tools, where the density of microservice patterns renders topology maps unintelligible. Add to this an exponential increase in data, as microservice interactions can spawn hundreds of transactions, and the process of manually setting performance baselines becomes almost impossible – not least because containerized microservices often exhibit emergent behaviors, making it difficult to determine performance by analyzing individual components in isolation.
These issues dictate that a new model is needed for APM, requiring teams to assess the efficacy of current solutions in addressing microservice and container monitoring challenges. Priority should be given to the following areas:
■ Lite Deployments – traditional agent-based instrumentation can be unsuitable for certain architectural styles and may incur too much overhead. Then again, other situations will benefit from the enriched performance metrics agents deliver. Seek out solutions that can incorporate both styles.
■ All-points Transaction Tracing – with containerized microservices, complexity shifts to the communication across microservices, and it's here that many problems are likely to occur. Transactions are no longer bounded as in monolithic systems, where they pass through well-defined component parts (front end, middleware and database); instead, they traverse many services via APIs. Transaction tracing therefore becomes more important, especially if it supports end-to-end tracing of API performance related to specific microservice transactions.
■ Statistical Methods and Analytics – even with the highest data collection frequencies, it's challenging to paint a true performance picture when containerized services fluctuate dynamically. That's why modern application monitoring solutions incorporate proven statistical methods to establish variance intensities from patterns of performance behavior. In CA APM this is referred to as Differential Analysis, which, unlike traditional static baselining, is purpose-built for dynamic microservice environments.
■ Assisted Triage Workflows – when triaging issues in containerized environments, speed and clarity are essential: knowing the what, who (is impacted), when and why of events as they occur. CA APM's assisted triage supports this by identifying the most meaningful events occurring across microservice environments, together with contextualized information (stories). Rather than manually hunting for information across complex topology maps after the fact, assisted triage acts on events (such as anomalous conditions and variances detected by Differential Analysis), automatically gathering evidence and presenting it via an analysis notebook. This speeds root-cause analysis and reduces alert fatigue.
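To make the contrast with static baselining concrete, here is a minimal sketch of the rolling-baseline idea behind variance-based anomaly detection: instead of alerting on a fixed threshold, each new sample is compared against the statistics of recent behavior. This is an illustrative simplification, not CA APM's actual Differential Analysis algorithm; the class name, window size and deviation threshold are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev

class RollingBaseline:
    """Flags samples that deviate sharply from a rolling window of recent
    values. Illustrative sketch only - not CA APM's Differential Analysis."""

    def __init__(self, window=30, threshold=3.0):
        self.window = deque(maxlen=window)  # recent samples, e.g. latencies
        self.threshold = threshold          # allowed deviation in std-devs

    def observe(self, value):
        """Record a sample; return True if it is anomalous vs. the baseline."""
        anomaly = False
        if len(self.window) >= 10:  # need enough history to judge
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomaly = True
        self.window.append(value)   # baseline keeps adapting either way
        return anomaly
```

Because the baseline window keeps sliding, the detector adapts as a service's "normal" shifts (for example, after a container is rescheduled onto different hardware), which is exactly where a static threshold would either miss real anomalies or generate noise.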
Without doubt, microservices and containers up the ante on application monitoring. At CA Technologies, we recognize this and continue to invest in delivering the capabilities organizations need to accelerate the benefits of these dynamic architectures. While these capabilities manifest in our CA APM solution, we also recognize the importance of continued debate and discussion as these powerful platforms mature and evolve. To this end, we're delighted to sponsor The New Stack's Container Monitoring and Management eBook, which describes in great detail the aforementioned challenges, together with the new monitoring approaches and techniques needed to future-proof these technologies.