Our digital businesses depend on the user experiences delivered by our applications. Application Performance Management (APM) helps us proactively measure the end user experience, understand an application's flow through our architecture, and monitor the infrastructure that delivers the application: services, software, networks, servers and databases. This has made us much better, yet we still face many challenges.
Challenges for Most APM Solutions
Whether you are a B2C or B2B business, users are more discerning and the marketplace more competitive. This is driving our product development organizations to focus on innovation: building systems of engagement that function in an omni-channel world rather than systems of record. Agile and DevOps methodologies are better suited to this pace and tempo, but they also mean we are changing the application more frequently to improve user experience and business outcomes. With the increased pace of change, formal testing cycles have shrunk or disappeared, and we need to depend on production monitoring as our real-time QA.
The increasingly real-time nature of business and operating environments, combined with additional complexity in our application delivery supply chain, brings challenges as well. Modern application architectures employ virtualization, cloud, CDNs, n-tier designs, third-party APIs and components, the public Internet and the multitude of different client devices we now support. Whew! This is creating a Dashboard Overload situation for our operations people. There are too many dials to watch. A common way to deal with this is to choose a few key metrics, yet that means we may not be paying attention to issues elsewhere in the application delivery stack, which might be the canary in the coal mine.
Remember when 5-minute or 15-minute monitoring was sufficient? Not anymore! With results depending on our applications, performance is now measured in seconds. It's no longer acceptable to have 15 seconds in a minute when something is slowing down service to users. Where we once collected metrics every 5 minutes, we now collect them every 30 seconds or even more frequently.
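To put rough numbers on that, here is a minimal Python sketch. The response times are made up for illustration, and it assumes one measurement per second that is then averaged per reporting interval; real collectors may aggregate differently.

```python
def bucket_averages(per_second_ms, bucket_seconds):
    """Average per-second response times over fixed-size reporting intervals."""
    return [
        sum(per_second_ms[i:i + bucket_seconds]) / len(per_second_ms[i:i + bucket_seconds])
        for i in range(0, len(per_second_ms), bucket_seconds)
    ]

# Hypothetical data: five minutes at 100 ms, with a 15-second slowdown to 2000 ms.
per_second_ms = [100] * 300
per_second_ms[120:135] = [2000] * 15

five_minute_view = bucket_averages(per_second_ms, 300)   # one data point
thirty_second_view = bucket_averages(per_second_ms, 30)  # ten data points

print(five_minute_view)         # the slowdown is diluted into a single average
print(max(thirty_second_view))  # the affected 30-second interval stands out
```

In this made-up case the 5-minute average comes out at 195 ms, which could easily pass for a slightly busy period, while the affected 30-second interval averages 1050 ms.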
With growing complexity and urgency, manually setting thresholds for a few key values is tedious and insufficient. Our environments are too dynamic and there are too many metrics to pay attention to. Even alerting based on historical baselines leaves us alternating between finding problems too late and creating a significant number of false alarms.
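The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names, the five-sample window, the three-standard-deviation rule and the sample data are all hypothetical, not taken from any particular APM product.

```python
from statistics import mean, stdev

def static_alerts(samples, threshold):
    """Indices where a value exceeds a fixed, hand-set threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]

def baseline_alerts(samples, window=5, k=3.0):
    """Indices where a value exceeds mean + k * stdev of the previous `window` samples."""
    alerts = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        limit = mean(history) + k * stdev(history)
        if samples[i] > limit:
            alerts.append(i)
    return alerts

# Hypothetical response times in milliseconds, one per collection interval.
response_ms = [100, 102, 98, 101, 99, 100, 180, 101, 99, 100]

print(static_alerts(response_ms, threshold=200))  # a threshold set too high sees nothing
print(baseline_alerts(response_ms))               # the rolling baseline flags the jump to 180
```

Note that the spike itself inflates the rolling baseline for the next few intervals, so a second slowdown arriving right behind the first could go unflagged. That is one way a simple baseline ends up finding problems too late.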
One of the key challenges for monitoring solutions has been reducing the time it takes to perform problem domain isolation. Most are good at showing symptoms, but finding the root cause can still be time-consuming. Repairing the problem can't begin until the root cause has been identified.
Even with mature APM solutions, development, test and operations teams are strained by operational complexity, accelerated release schedules and big data challenges as they try to quickly find the root cause of issues affecting the end user experience.
We need some power tools!
Analytics and Big Data-Powered Anomaly Detection Resolve Many of the Challenges
The power of big data and data science can help us make the most of the vast cache of APM data we collect and help our DevOps teams supercharge user experience. It’s time to take some of the load off of our humans and let technology make it easier to focus on meaningful changes in user, application and system behavior. Analytics is becoming a valuable component of APM solutions because it’s adding value in so many ways.
For one, it’s automatic. Analytics can sift through vast amounts of data from IT operations, application performance and user experience in near real time. It learns what normal behavior looks like at every level of the delivery stack, which makes hand-set thresholds unnecessary and somewhat archaic. Isn’t that what we really want from intelligent monitoring: for it to tell us when something that is not normal is developing?
Analytics provides more reliable alerting, going far beyond old-school thresholds and statistical baselines to multivariate analysis that learns the relationships in the data. It sees through the complexity to identify the root cause of developing issues using the power of data science.
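As one illustration of what multivariate analysis can add, the Python sketch below scores a pair of metrics together using the Mahalanobis distance, a standard statistical measure that accounts for how two values normally move together. The choice of this technique, the metric pair (requests per second and CPU percent) and all of the numbers are assumptions made for the example, not a description of how any particular product works.

```python
from statistics import mean

def mahalanobis_2d(history, point):
    """Distance of `point` from a history of (x, y) pairs, accounting for their correlation."""
    xs = [x for x, _ in history]
    ys = [y for _, y in history]
    mx, my = mean(xs), mean(ys)
    n = len(history)
    # Sample variances and covariance of the two metrics.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in history) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, expanded for the 2x2 case.
    d_squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d_squared ** 0.5

# Hypothetical history of (requests per second, CPU percent): CPU normally tracks load.
history = [(100, 20), (200, 39), (300, 61), (400, 80),
           (500, 99), (150, 31), (350, 69), (450, 91)]

typical = (300, 60)  # moderate load, moderate CPU
unusual = (150, 90)  # each value alone is within the range seen before,
                     # but high CPU at low load breaks the learned relationship

print(mahalanobis_2d(history, typical))
print(mahalanobis_2d(history, unusual))
```

A per-metric threshold would pass both readings, since 150 requests per second and 90 percent CPU have each been seen before. Scored together, the second reading lands far from anything in the history, which is the kind of relationship a single-metric baseline cannot express.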
Big data powered anomaly detection identifies the root cause of issues in real time. Fast root cause analysis significantly compresses the time to recover from system issues. These new capabilities will likely shift pressure to the fix, test, deploy part of the recovery process.
Today’s businesses are reinventing themselves around digital experiences as the forces of social, mobile, cloud and big data take the reins of customer engagement. This means User Experience has to take center stage for both the business and IT, and our tools need to empower our teams to focus on the things that matter.
Analytics-powered anomaly detection and root cause analysis is a key safety system for your business and the applications it runs on. It’s a safety system for your User Experience, just as traction control is a safety system on your car.
Ken Godskind is Chief Blogger and Analyst for APM Examiner.