If you have deployed a Java application in production, you have probably encountered a situation where the application suddenly starts to consume a large amount of CPU. When this happens, the application becomes sluggish and users begin to complain about slow response times. Often the fix is to restart the application and, lo and behold, the problem goes away, only to reappear a few days later. The key question then is: how do you troubleshoot high CPU usage in a Java application?
Why Do Java Applications Take High CPU?
Java applications may take high CPU resources for many reasons:
■ Poorly designed application code: inefficient loops, deep or runaway recursive method calls, and inefficient use of collections (e.g., scanning excessively large ArrayLists where a HashMap lookup would do) can all burn CPU.
■ A shortage of memory in the Java Virtual Machine (JVM) can also show up as high CPU usage. Instead of spending time on processing, the JVM spends more of its time in garbage collection, which in turn takes up CPU cycles.
■ A JVM may max out on CPU usage because of the incoming workload. The server may not be sized to handle the rate of incoming requests; in that situation, the Java application is doing legitimate work while trying to keep up with the workload.
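To illustrate the first cause in the list above, here is a minimal sketch contrasting a linear scan of an ArrayList with a hash lookup in a HashMap. The class name `CollectionLookup`, the method names, and the data size are hypothetical, chosen only for illustration; actual timings depend on the machine and JVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectionLookup {

    /** Linear scan: each call may walk the whole list, so cost grows with its size. */
    static boolean inList(List<Integer> ids, int wanted) {
        return ids.contains(wanted);
    }

    /** Hash lookup: cost stays roughly constant as the map grows. */
    static boolean inMap(Map<Integer, String> byId, int wanted) {
        return byId.containsKey(wanted);
    }

    public static void main(String[] args) {
        int size = 20_000; // arbitrary size, assumed for illustration
        List<Integer> ids = new ArrayList<>();
        Map<Integer, String> byId = new HashMap<>();
        for (int i = 0; i < size; i++) {
            ids.add(i);
            byId.put(i, "item-" + i);
        }

        // Look up every id once in the list (repeated linear scans).
        long start = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < size; i++) {
            if (inList(ids, i)) hits++;
        }
        long listNanos = System.nanoTime() - start;

        // Look up every id once in the map (hash lookups).
        start = System.nanoTime();
        for (int i = 0; i < size; i++) {
            if (inMap(byId, i)) hits++;
        }
        long mapNanos = System.nanoTime() - start;

        System.out.printf("hits=%d, list=%d ms, map=%d ms%n",
                hits, listNanos / 1_000_000, mapNanos / 1_000_000);
    }
}
```

Both methods return the same answers; the difference is how much CPU each lookup costs as the collection grows.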
Restarting an application will not solve a CPU usage problem — it only mitigates the problem for a short while, until the problem reappears. It is, therefore, essential to identify the cause of the CPU spike: is it due to poorly designed application code, insufficient memory allocation, or an unexpectedly high workload?
JVM Monitoring Can Assist with Diagnosis of CPU Issues
Modern JVMs (Java 1.5 and higher) support the Java Management Extensions (JMX) APIs. According to Wikipedia, Java Management Extensions is a Java technology that supplies tools for managing and monitoring applications, system objects, devices and service-oriented networks. Those resources are represented by objects called MBeans (for Managed Bean). Managing and monitoring applications can be designed and developed using the Java Dynamic Management Kit.
Using JMX, Java monitoring tools can explore what threads are running in the JVM, the state of each thread, the CPU usage of each thread etc. By periodically collecting these statistics, monitoring tools can correlate thread level performance information with the CPU usage of the Java application and answer the question "Why is the Java application taking high CPU?"
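As a rough sketch of what such a tool does under the hood, the standard `java.lang.management` API exposes the JVM's thread MXBean, which reports each live thread's state and cumulative CPU time. The example below reads the local JVM only (a monitoring tool would typically connect to a remote JVM over JMX), and the class name `ThreadCpuSnapshot` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

class ThreadCpuSnapshot {

    /** Returns thread id -> cumulative CPU time in nanoseconds for live threads. */
    static Map<Long, Long> cpuNanosByThreadId() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Long, Long> result = new LinkedHashMap<>();
        if (!threads.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id);
            if (cpuNanos >= 0) { // -1 means the thread is no longer alive
                result.put(id, cpuNanos);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        cpuNanosByThreadId().forEach((id, cpuNanos) -> {
            ThreadInfo info = threads.getThreadInfo(id);
            if (info != null) { // null if the thread exited in the meantime
                System.out.printf("%-40s %-15s %d ms%n",
                        info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000);
            }
        });
    }
}
```

Taking such a snapshot periodically and comparing consecutive readings is what lets a tool attribute the application's CPU usage to individual threads.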
Figure 1 below depicts the monitoring of threads in a JVM. High and medium CPU threads are defined as threads that take up more than 50% CPU and 30-50% CPU respectively. The existence of any high or medium CPU thread is indicative of an application bottleneck, i.e., a piece of inefficient code that is executing frequently and taking up CPU. In this example, there is one high CPU thread.
Figure 1: Diagnosing high CPU threads in the JVM
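The high and medium categories described above could be derived along these lines. This is a sketch under assumptions: a thread's CPU percentage is taken as the CPU time it consumed between two samples divided by the elapsed wall-clock time, which expresses usage relative to a single CPU core. The class name `ThreadCpuLevel` and the "LOW" label are hypothetical; only the 50% and 30-50% thresholds come from the text.

```java
class ThreadCpuLevel {

    /** CPU usage of one thread between two samples, as a percentage of one core. */
    static double cpuPercent(long cpuNanosBefore, long cpuNanosAfter, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0; // no time elapsed, nothing to report
        }
        return 100.0 * (cpuNanosAfter - cpuNanosBefore) / elapsedNanos;
    }

    /** Applies the thresholds from the text: above 50% is high, 30-50% is medium. */
    static String classify(double percent) {
        if (percent > 50.0) {
            return "HIGH";
        }
        if (percent >= 30.0) {
            return "MEDIUM";
        }
        return "LOW"; // label assumed for everything below 30%
    }

    public static void main(String[] args) {
        // A thread that used 0.6 s of CPU during a 1 s sampling interval.
        double percent = cpuPercent(0L, 600_000_000L, 1_000_000_000L);
        System.out.println(percent + "% -> " + classify(percent));
    }
}
```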
Detailed diagnosis of this metric reveals the stack trace, i.e., which line of code the CPU-consuming thread is executing. If the thread is assigned a name in the application, the thread name is shown on the left-hand side of Figure 2 and the detailed stack trace is on the right-hand side. This information gives operations staff and developers exactly what they need to identify the cause of high CPU usage: the exact class, method and line of code. In this example, look at line 223 in the createLogic method of the com.zapstore.logic.LogicBuilder class.
Figure 2: Identifying the cause of high JVM CPU usage
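A stack trace like the one in Figure 2 can be obtained through the same thread MXBean. The sketch below is an assumption about how a tool might do it, not the method used by any particular product: given a thread id, it fetches the thread's current stack frames and formats each one as class, method and line number. The class name `HotThreadStack` and its methods are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class HotThreadStack {

    /** Returns up to maxDepth stack frames for the thread, or an empty array if it has exited. */
    static StackTraceElement[] stackOf(long threadId, int maxDepth) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo info = threads.getThreadInfo(threadId, maxDepth);
        return info == null ? new StackTraceElement[0] : info.getStackTrace();
    }

    /** Formats one frame as "class.method (line N)". */
    static String describe(StackTraceElement frame) {
        return frame.getClassName() + "." + frame.getMethodName()
                + " (line " + frame.getLineNumber() + ")";
    }

    public static void main(String[] args) {
        // For illustration, inspect the current thread; a tool would pass the id
        // of the thread it identified as consuming the most CPU.
        long id = Thread.currentThread().getId();
        for (StackTraceElement frame : stackOf(id, 10)) {
            System.out.println(describe(frame));
        }
    }
}
```

The top frame of the busiest thread is usually the place to start reading, since it is the code that was executing when the sample was taken.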
If the CPU usage is due to an unexpected workload increase, you should see the number of threads increase, and even if each thread consumes a small amount of CPU, the aggregate may be significant.
If no individual application thread is taking much CPU and the aggregate CPU usage of the application threads is low, yet the Java process as a whole is still consuming CPU, suspect garbage collection activity in the JVM. You may want to change the garbage collection algorithm or increase the heap and non-heap memory available to the JVM to alleviate the problem.
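One way to check the garbage collection hypothesis is to read the JVM's garbage collector MXBeans, which report the accumulated collection time per collector. The sketch below compares that total with JVM uptime. Treat the result as a rough indicator only: the reported time is approximate, and with concurrent collectors part of it overlaps with application work. The class name `GcOverhead` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {

    /** Sums the approximate accumulated collection time (ms) across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) { // -1 means the collector does not report a time
                total += millis;
            }
        }
        return total;
    }

    /** Share of JVM uptime attributed to garbage collection, in percent. */
    static double gcPercentOfUptime(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMillis = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.2f%%)%n",
                gcMillis, uptime, gcPercentOfUptime(gcMillis, uptime));
    }
}
```

A share that keeps climbing between samples while thread-level CPU stays low would be consistent with the memory-shortage scenario described above.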
Historical information captured about the JVM's CPU usage and individual threads' CPU usage can be used to determine the real cause of the Java application's high CPU usage. You will no longer need to restart the application and hope that the problem goes away. Historical insights (such as those shown in Figure 3) help you pinpoint the cause of CPU spikes and fix it, so you do not have to keep dealing with the same issues.
Figure 3: Historical JVM performance analytics and trends
Enabling JMX for a JVM has minimal impact on its performance. Hence, this technique of monitoring Java applications is applicable even for production environments.
Get 360° Visibility and Insights into Java Application Performance
The performance of Java applications depends on three critical factors: the JVM, the Java web container (WebLogic, JBoss, Tomcat, etc.), and the application transactions performed on the front end by the business user. The transactions are where the end user will experience slowness or failure. So, it is imperative to trace the transactions in real time to identify how they are being executed and where slowness occurs.
The JVM, as we have seen in this article, is a core piece of the Java stack. How CPU and memory are allocated, utilized and managed determines how efficient the application's processing will be.
Lastly, the Java web container, where the business logic of the application executes, is an important component of the application middleware.
All three components need to be monitored in the context of one another to get full-stack visibility into the Java application.