How do we organize log data in a meaningful way that will not only make sense, but also be practical, usable, visible, and accessible quickly; in addition to being organized to support DevOps and APM insights?
Despite numerous log data analysis deployments, we still identify many challenges users face regarding IT log data visualization, analysis, and insights. How can we make sure anomaly detection is fast and easy so that log management does not become too time-consuming? Here are some guidelines for building meaningful operational views and dashboards for IT, leveraging log search, log analysis, machine learning, and advanced analytics.
First Ask Questions
Although stating the obvious, before investing expensive efforts and resources into analyzing data, it is crucial to define your expectations and requirements. While in the past, merely collecting all log data and making it available for search was good enough, this is no longer the case.
In order to ask the right questions, determine what the most important use cases your log data has shown you and what role you want your log data to play in your future ongoing work. To do this, you must monitor system availability, software quality, continuous deployment, application performance, and business insights, troubleshoot, analyze security incidents, compliance audit etc.
There are specific use cases for the application life cycle. Architect, developer, tester, DevOps, APM, operations, and production support all have specific uses cases and requirements. Giving the right answer to the right question makes a big impact and will drive smart actions.
Once the requirements and expectations are well defined, it is crucial to be able to visualize your findings for further analysis; the more detailed, the better. We recommend creating an App that contains a collection of dashboards. If possible, create a dashboard per topic or use case, and provide each one with a meaningful name (“performance”, “errors”, “user audit”).
Now create search queries, or use out of the box gadgets for analytics, to find example Apps that you will be able to use as examples of best use cases for log analysis data visualization.
How to Visualize
Once you’ve created search queries to analyze data and generate proper result sets, you will need to select the visualization gadget that best reads these result sets and visualizes it in the most effective way.
Here is a result set that aggregated and computed the avg. memory consumption and total memory usage of two application servers. Take a look at the figure below. On gadget 1 you can see the totals over 24 hr aggregated memory consumption at 1 hr intervals. This gadget tells the story of both servers. Gadgets 2 and 3 represent the same data but for each of the individual servers. Once we split the data for each server we discover that each of the servers had a very different memory consumption pattern.
An hourly aggregation for memory is far from being accurate; memory changes at a much faster rate. On the upper row of gadgets we see the totals for both servers (gadget 4), and two additional gadgets, 5 and 6, representing each server in 1 min intervals.
We were looking to monitor our application server memory consumption to avoid spikes that might crash one of our clusters. Choosing the right visualization tools, and in this case, intervals, makes a big difference.
Optimize your dashboards and visualization gadgets by verifying that they deliver the insights you’re after in the right resolution. In the example above, analyzing memory for the entire cluster did not provide a clear status image of the memory consumption, but grouping by server and later reducing the time interval resolution to minutes gave a clear understanding of which cluster spiked.
Once your Apps and Dashboards provide clear views and visualization, it becomes much easier to identify problems, trends, and insights on your IT and applications. Now you can monitor or view the dashboards live. Leverage the visibility and you will be able to take actions that will make your applications more agile, secure, and optimized for the business.
Ask More Questions
Go back to the first step. This is an ongoing process. Data changes every day. The content of logs and other data types is being updated by IT, developers, and vendors continuously. In order to stay ahead, keep asking questions and never stop looking for the answers.
Haim Koschitzky is CEO of XpoLog Ltd.
Achieving audit compliance within your IT ecosystem can be an iterative process, and it doesn't have to be compressed into the five days before the audit is due. Following is a four-step process I use to guide clients through the process of preparing for and successfully completing IT audits ...
Network performance issues come in all shapes and sizes, and can require vast amounts of time and resources to solve. Here are three examples of painful network performance issues you're likely to encounter this year, and how NPMD solutions can help you overcome them ...
"Scale up" versus "scale out" doesn't just apply to hardware investments, it also has an impact on product features. "Scale up" promotes buying the feature set you think you need now, then adding "feature modules" and licenses as you discover additional feature requirements are needed. Often as networks grow in size they also grow in complexity ...
Network Packet Brokers play a critical role in gaining visibility into new complex networks. They deliver the packet data and information IT and security teams need to identify problems, recognize security issues, and ensure overall network performance. However, not all Packet Brokers are created equal when it comes to scalability. Simply "scaling up" your network infrastructure at every growth point is a more complex and more expensive endeavor over time. Let's explore three ways the "scale up" approach to infrastructure growth impedes NetOps and security professionals (and the business as a whole) ...
Loyal users are the key to your service desk's success. Happy users want to use your services and they recommend your services in the organization. It takes time and effort to exceed user expectations, but doing so means keeping the promises we make to our users and being careful not to do too much without careful consideration for what's best for the organization and users ...
What's the difference between user satisfaction and user loyalty? How can you measure whether your users are satisfied and will keep buying from you? How much effort should you make to offer your users the ultimate experience? If you're a service provider, what matters in the end is whether users will keep coming back to you and will stay loyal ...
What if I said that a 95% reduction in the amount of IT noise, 99% reduction in ticket volume and 99% L1 resolution rate are not only possible, but that some of the largest, most complex enterprises in the world see these metrics in their environments every day, thanks to Artificial Intelligence (AI) and Machine Learning (ML)? Would you dismiss that as belonging to the realm of science fiction? ...
Of those surveyed, 96% of organizations have a digital transformation strategy, with 57% approaching it as an enterprise-wide priority, with a clear emphasis on speed of business, costs, risk, and customer satisfaction, according to IDC’s Aligning IT Strategies and Business Expectations for Digital Transformation Success, sponsored by EasyVista ...
One of my ongoing areas of focus is analytics, AIOps, and the intersection with AI and machine learning more broadly. Within this space, sad to say, semantic confusion surrounding just what these terms mean echoes the confusions surrounding ITSM ...