"Whatever doesn't kill you only makes you stronger. Except bears. Bears will kill you."
Old greeting card message
Like bears, there are many equivalents in technology ready to chomp, "The Revenant" style, on many a well-intentioned DevOps initiative. But paradoxically, the thing we crave to improve feedback loops and better inform decision making is also the thing that wreaks the most havoc – Data.
It's not that IT operations lacks data. Far from it. Teams consume masses of information to keep the business running. Application topology maps, metrics, runbooks, service desk incidents and knowledge bases, plus CMDBs, logs and change requests – it's a long list that just keeps getting longer. Now, thanks to mega-trends such as cloud-native everything, the Internet of Things, microservices and social media, the Big Data problem has gotten a whole lot bigger.
Data, Data, Everywhere, and Not a Drop to Drink
Data has always been a problem, but up until recently it was more consumable. Classic mainframe and traditional three-tier, on-premises client-server architectures meant operations had known quantities to manage. As such, it was common practice to organize and monitor by functional area, with separate teams using their own tools to capture and analyze data – be it application logic, middleware or database. True, this led to distinct and separate data silos, but that was an acceptable price to pay since most problems were narrow in scope and impact (think up or down). Furthermore, the internally facing mega-beasts under management changed infrequently, with updates usually limited to new business functions rather than to the rigid architectures that supported them.
But things get fuzzy in the age of cloud and microservice-style development. The shackles of infrastructure rigidity may have been broken by a plethora of cloud providers offering flexible PaaS-style services, but the new emphasis on developing small, independent and reusable application components introduces massive complexity. Not only do operations have to deal with exponential increases in data volume and velocity, they must also address the thorny challenge of establishing service levels and performance across an incomprehensible mesh of service elements. Traditional silo stewardship now falls woefully short, since any one seemingly innocuous condition within a swarm of API-communicating, cloud-centric services can have dire consequences. What's more, the fast pace of change means teams no longer have the luxury of sedately piecing together disconnected data silos when triaging problems.
How to Manage App Performance in a Haystack of Needles
The big data monitoring problem facing IT operations is much more difficult than trying to find a needle in a static haystack. It's more like trying to find a needle in a swirling mass of needles. This requires a new model in the way monitoring data is gathered, stored, shared and analyzed.
To manage modern application architectures, IT teams don't need more tools collecting more data; they need a new model for application performance management (APM). Modern APM solutions continuously observe and learn application behavior with minimal configuration. So as topologies change and developers introduce (or remove) components – be it code, container, microservice or API – the solution sees the change, activates monitoring and determines its impact on application performance. This is especially important in a world of continuous delivery, where operations has less time for change reviews.
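To make the idea concrete, here is a minimal sketch of that reconcile-on-change behavior: a monitor compares the components it currently tracks against what it observes in the live topology and enrolls or retires monitoring automatically. All names here (Monitor, the service names) are hypothetical illustrations, not any real APM product's API.

```python
# Hypothetical sketch: auto-enrolling monitoring as topology changes.
class Monitor:
    def __init__(self):
        self.enrolled = set()  # components currently under monitoring

    def reconcile(self, observed):
        """Diff the observed topology against the enrolled set and
        activate/retire monitoring without manual configuration."""
        added = observed - self.enrolled
        removed = self.enrolled - observed
        self.enrolled |= added       # activate monitoring for new components
        self.enrolled -= removed     # retire monitors for removed components
        return added, removed

mon = Monitor()
mon.reconcile({"checkout-api", "cart-svc"})
# A developer ships a new microservice; the monitor picks it up on its own:
added, removed = mon.reconcile({"checkout-api", "cart-svc", "payments-svc"})
print(added)    # {'payments-svc'}
print(removed)  # set()
```

The point of the sketch is the direction of control: operations never edits a monitoring config when a component appears; the diff against observed reality drives enrollment.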
Behavioral observation also supports another key requirement – noise elimination. By watching application behaviors, modern application performance management can determine what constitutes "normal" performance. Using proven statistical models and analytics, these solutions set and adjust performance thresholds dynamically. This means support teams only get alerted when there are business-impacting issues. The net-net: more focus, and less costly on-call burnout chasing nuisance alarms and false positives – benefits succinctly described by my colleague Kieran Taylor in his super article: Monitoring Microservices? Consider Noise Cancellation.
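A toy version of dynamic thresholding illustrates the principle: learn a rolling baseline of "normal" and alert only when a metric strays well outside it. The window size and the three-sigma band below are assumptions for illustration, not settings from any particular product.

```python
# Illustrative sketch: alert only when a metric deviates from its learned
# baseline by more than k standard deviations (assumed k=3 here).
from collections import deque
from statistics import mean, stdev

class Baseline:
    def __init__(self, window=60, k=3.0):
        self.samples = deque(maxlen=window)  # rolling window of observations
        self.k = k

    def observe(self, value):
        """Record a sample; return True if it looks anomalous."""
        anomalous = False
        if len(self.samples) >= 2:
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(value - mu) > self.k * sigma:
                anomalous = True
        self.samples.append(value)
        return anomalous

b = Baseline(window=30)
for latency_ms in [100, 102, 98, 101, 99, 100, 103, 97, 101, 100]:
    b.observe(latency_ms)   # learning what "normal" latency looks like
print(b.observe(180))       # sudden spike well outside the band -> True
print(b.observe(101))       # ordinary reading -> False
```

A fixed threshold of, say, 150 ms would either page constantly on a naturally noisy service or miss a real degradation on a quiet one; an adaptive band sidesteps both failure modes.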
New Model APM Needs a New Data Model
Even with added visibility and noise reduction, problems with siloed data management can still persist. It's why the very best application performance management solutions employ a unified data model, by which diverse and previously separate data sources (from application to infrastructure) can be captured and layered onto application topologies. Rather than painstakingly hunting down infrastructure metrics such as CPU, disk and network throughput, a unified data model automatically and contextually surfaces information to deliver richer performance insights.
Consider, for example, a case where a network alarm indicates an increase in traffic latency. Modern tools and analytics can identify when and where this problem occurs (maybe even predict it), but because the data is managed in silos, application support teams have no indication which services are impacted. By using a unified data model to layer in this condition, every team understands both the problem and its significance. Relationships and dependencies like these would be impossible to comprehend in dynamic application environments using traditional tools with separate data stores.
IT operations will continue to be swamped with data – that won't change. What's different now is how the data can be leveraged. Rather than focusing on siloed collection and using data to hold back change, modern teams, armed with modern application performance management and a unified data model, will exploit it to gain valuable business insights – like how many sales conversions result from improved API latency, or which application code and cloud architecture correlate with the best business outcomes.
Bears will kill you; data shouldn't.