If you have deployed a Java application in production, you've probably encountered a situation where the application suddenly starts to take up a large amount of CPU. When this happens, application response becomes sluggish and users begin to complain about slow response. Often the solution to this problem is to restart the application and, lo and behold, the problem goes away — only to reappear a few days later. A key question then is: how to troubleshoot high CPU usage of a Java application?
Why Do Java Applications Take High CPU?
Java applications may take high CPU resources for many reasons:
■ Poorly designed application code: inefficient loops, deep recursive method calls, and wasteful use of collections (e.g., repeatedly scanning a large ArrayList where a HashMap lookup would do) can all burn CPU cycles.
■ Shortage of memory in the Java Virtual Machine (JVM) can also show up as high CPU usage. Instead of spending time processing requests, the JVM spends more time in garbage collection, which in turn takes up CPU cycles.
■ A JVM may max out on CPU usage simply because of the incoming workload. If the server is not sized to handle the rate of incoming requests, the Java application may be doing legitimate work, just struggling to keep up.
Restarting an application will not solve a CPU usage problem — it only mitigates the problem for a short while, until the problem reappears. It is, therefore, essential to identify the cause of the CPU spike: is it due to poorly designed application code, insufficient memory allocation, or an unexpectedly high workload?
JVM Monitoring Can Assist with Diagnosis of CPU Issues
Modern JVMs (Java 5 and higher) support the Java Management Extensions (JMX) APIs. According to Wikipedia, Java Management Extensions is a Java technology that supplies tools for managing and monitoring applications, system objects, devices and service-oriented networks. Those resources are represented by objects called MBeans (Managed Beans).
Using JMX, Java monitoring tools can explore what threads are running in the JVM, the state of each thread, the CPU usage of each thread etc. By periodically collecting these statistics, monitoring tools can correlate thread level performance information with the CPU usage of the Java application and answer the question "Why is the Java application taking high CPU?"
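As a minimal sketch of the kind of data such a monitoring tool collects, the snippet below uses the standard `ThreadMXBean` from `java.lang.management` to list every live thread's name, state, and cumulative CPU time. The class name `ThreadCpuSampler` is our own; a real tool would sample these values periodically and compute per-interval CPU percentages.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpuSampler {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU timing not supported on this JVM");
            return;
        }
        threads.setThreadCpuTimeEnabled(true);
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            if (info != null && cpuNanos >= 0) {
                // Name, state (RUNNABLE, WAITING, ...) and CPU time since thread start
                System.out.printf("%-35s state=%-13s cpu=%d ms%n",
                        info.getThreadName(), info.getThreadState(),
                        cpuNanos / 1_000_000);
            }
        }
    }
}
```

Sampling this output twice, a few seconds apart, and diffing the CPU times per thread is exactly how a monitor decides which threads are "high CPU".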
Figure 1 below depicts the monitoring of threads in a JVM. High and medium CPU threads are defined as threads that take up more than 50% CPU and 30-50% CPU respectively. The existence of any high or medium CPU thread is indicative of an application bottleneck, i.e., a piece of inefficient code that is executing frequently and taking up CPU. In this example, there is one high CPU thread.
Figure 1: Diagnosing high CPU threads in the JVM
Detailed diagnosis of this metric reveals the stack trace, i.e., the exact line of code the CPU-consuming thread is executing. If the thread is assigned a name in the application, the thread name is shown on the left-hand side of Figure 2 and the detailed stack trace is on the right-hand side. This information gives operations staff and developers exactly what they need to identify the cause of high CPU usage: the exact class, method and line of code. In this example, the culprit is the createLogic method of the com.zapstore.logic.LogicBuilder class, at line number 223.
Figure 2: Identifying the cause of high JVM CPU usage
If the CPU usage is due to an unexpected workload increase, you should see the number of threads increase, and even if each thread consumes a small amount of CPU, the aggregate may be significant.
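The thread-count trend is also exposed through `ThreadMXBean`; comparing the live count against the peak since JVM start is a quick sanity check for workload-driven growth:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // A live count climbing toward an ever-rising peak suggests the
        // workload (or a thread leak) is driving CPU usage upward.
        System.out.println("Live threads:    " + mx.getThreadCount());
        System.out.println("Peak threads:    " + mx.getPeakThreadCount());
        System.out.println("Started overall: " + mx.getTotalStartedThreadCount());
    }
}
```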
If no application thread is taking much CPU, so the aggregate CPU usage of the application threads is low, and yet the Java application is taking CPU, suspect garbage collection activity in the JVM. You may want to change the garbage collection algorithm or increase the heap and non-heap memory available to the JVM to alleviate the problem.
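Garbage collection activity can be checked through the same JMX plumbing. A sketch using the standard `GarbageCollectorMXBean` and `MemoryMXBean`: collection counts and times climbing rapidly between samples, with heap usage pinned near its maximum, point to GC as the CPU consumer.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class GcActivityCheck {
    public static void main(String[] args) {
        // Cumulative collection count and time per collector since JVM start
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        // getMax() can be -1 if the maximum is undefined on this JVM
        System.out.printf("Heap used: %d MB of %d MB max%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);
    }
}
```

Sampling these counters over time (rather than reading them once) distinguishes a one-off collection from a JVM stuck in back-to-back GC cycles.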
Historical information captured about the JVM's CPU usage and individual threads' CPU usage can be used to determine the real cause of the Java application's high CPU usage. You will no longer need to restart the application and hope that the problem goes away. The historical insights (as shown in Figure 3 below) will help you accurately determine the cause of CPU spikes and fix them, so the same issues do not recur.
Figure 3: Historical JVM performance analytics and trends
Enabling JMX for a JVM has minimal impact on its performance. Hence, this technique of monitoring Java applications is applicable even for production environments.
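For reference, remote JMX access is typically enabled with standard `com.sun.management.jmxremote` system properties at startup. A sketch (the port, and `myapp.jar`, are placeholders; authentication and SSL are disabled here for brevity and should be enabled on untrusted networks):

```shell
# Expose the JVM's MBeans to remote JMX clients on port 9010.
# WARNING: no auth/SSL in this example -- trusted networks only.
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar myapp.jar
```

Monitoring tools (and stock utilities such as JConsole) can then attach to the JVM and read the thread, memory, and GC MBeans discussed above.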
Get 360° Visibility and Insights into Java Application Performance
The performance of Java applications depends on three critical factors: the JVM, the Java web container (WebLogic, JBoss, Tomcat, etc.), and the application transactions performed on the front end by the business user. The transactions are where the end user will experience slowness or failure. So, it is imperative to trace the transactions in real time to identify how they are being executed and where slowness occurs.
The JVM, as we see in this article, is a core piece of the Java stack. How CPU and memory are allocated, utilized and managed determines how efficient the application processing will be.
Lastly, the Java web container, where the business logic for the execution of the application code resides, is an important component of the application middleware.
All these three components need to be monitored in the context of one another to get full stack visibility of the Java application.