CPU spikes are one of the most common performance challenges faced by Java applications. While traditional APM (Application Performance Management) tools provide high-level insights into overall CPU utilization, they often fall short of identifying the root cause of a spike because they usually can't pinpoint the exact code paths causing the issue. This is where non-intrusive, thread-level analysis proves to be far more effective. In this post, I'll share a few practical methods to help you diagnose and resolve CPU spikes without making changes in your production environment.
Intrusive vs Non-Intrusive Approach: What Is the Difference?
Intrusive Approach
Intrusive approaches involve making changes to the application's code or configuration, such as enabling detailed profiling, adding extra logging, or attaching performance monitoring agents. These methods can provide in-depth data, but they come with the risk of affecting the application's performance and may not be suitable for production environments because of the added overhead.
Non-Intrusive Approach
Non-intrusive approaches, on the other hand, require no modifications to the running application. They rely on gathering external data such as thread dumps, CPU utilization, and logs without interfering with the application's normal operation. These methods are safer for production environments because they avoid any potential performance degradation and allow you to troubleshoot live applications without disruption.
1. top -H + Thread Dump
High CPU consumption is always caused by threads that are continuously executing code. Applications tend to have hundreds (sometimes thousands) of threads. The first step in diagnosis is to identify the CPU-consuming threads among those hundreds of threads.
A simple and effective way to do this is by using the top command. The top command is a utility available on all flavors of Unix systems that provides a real-time view of system resource usage, including CPU consumption by each thread in a specific process. You can issue the following top command to identify which threads are consuming the most CPU:
top -H -p <PROCESS_ID>
This command lists the individual threads within a Java process and their respective CPU consumption, as shown in Figure 1 below:
Once you've identified the CPU-consuming threads, the next step is to figure out what lines of code those threads are executing. To do this, you need to capture a thread dump from the application, which will show the code execution path of those threads. However, there are a couple of things to keep in mind:
- You need to issue the top -H -p <PROCESS_ID> command and capture the thread dump simultaneously to know the exact lines of code causing the CPU spike. CPU spikes are transient, so capturing both at the same time ensures you can correlate the high CPU usage with the exact code being executed. Any delay between the two can result in missing the root cause.
- The top -H -p <PROCESS_ID> command prints thread IDs in decimal format, but in the thread dump, thread IDs are in hexadecimal format. You'll need to convert the decimal thread IDs to hexadecimal to look them up in the dump.
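The decimal-to-hexadecimal lookup described above can be sketched in a few lines of Java. This is a minimal illustration, not a definitive tool: it assumes the dump uses the HotSpot header format in which each thread line carries a nid=0x... field holding the native thread ID, and the class name NidLookup, its method names, and the sample dump text are all hypothetical.

```java
import java.util.Arrays;
import java.util.Optional;

final class NidLookup {

    /** Formats a decimal thread ID (as printed by top -H) as a hex value, e.g. 12345 becomes 0x3039. */
    static String toNid(long decimalThreadId) {
        return "0x" + Long.toHexString(decimalThreadId);
    }

    /**
     * Returns the first line of the dump that carries the matching nid, if any.
     * Assumption: the header line contains "nid=0x<hex> " (HotSpot-style format).
     */
    static Optional<String> findThreadHeader(String threadDump, long decimalThreadId) {
        // The trailing space prevents 0x3039 from also matching 0x30391.
        String marker = "nid=" + toNid(decimalThreadId) + " ";
        return Arrays.stream(threadDump.split("\\R"))
                .filter(line -> line.contains(marker))
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up sample dump fragment, for illustration only.
        String sampleDump =
                "\"worker-1\" #25 daemon prio=5 os_prio=0 tid=0x00007f10 nid=0x3039 runnable [0x00007f20]\n"
              + "   java.lang.Thread.State: RUNNABLE\n";
        System.out.println(toNid(12345));
        System.out.println(findThreadHeader(sampleDump, 12345).orElse("not found"));
    }
}
```

Once the header line is found, the stack trace printed beneath it in the dump shows the code path that thread was executing.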
This is the most effective and accurate method to troubleshoot CPU spikes. However, in certain environments, especially containerized environments, the top command may not be installed. In such cases, you might want to explore the alternative methods described below.
2. RUNNABLE State Threads Across Multiple Dumps
Java threads can be in several states: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, or TERMINATED. If you are interested, you may learn more about the different thread states. When a thread is actively executing code, it will be in the RUNNABLE state. CPU spikes are always caused by threads in the RUNNABLE state. To effectively diagnose these spikes:
- Capture 3-5 thread dumps at intervals of 10 seconds.
- Identify threads that remain consistently in the RUNNABLE state across all dumps.
- Analyze the stack traces of these threads to determine what part of the code is consuming the CPU.
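The comparison step above could be sketched roughly as follows. This is a simplified illustration, not how any particular tool works: it assumes HotSpot-style dumps in which each thread starts with a quoted name and is followed by a java.lang.Thread.State: line, it matches threads by name only, and the class name PersistentRunnable and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PersistentRunnable {

    // Thread header line, e.g. "worker-1" #25 daemon prio=5 ...
    private static final Pattern HEADER = Pattern.compile("\"(.+?)\"\\s.*");
    // State line, e.g. "   java.lang.Thread.State: TIMED_WAITING (sleeping)"
    private static final Pattern STATE =
            Pattern.compile("^\\s*java\\.lang\\.Thread\\.State: (\\w+)");

    /** Names of the threads reported as RUNNABLE in a single dump. */
    static Set<String> runnableThreads(String dump) {
        Set<String> names = new LinkedHashSet<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = header.group(1);
                continue;
            }
            Matcher state = STATE.matcher(line);
            if (current != null && state.find()) {
                if ("RUNNABLE".equals(state.group(1))) {
                    names.add(current);
                }
                current = null; // one state line per thread
            }
        }
        return names;
    }

    /** Names of the threads that are RUNNABLE in every one of the given dumps. */
    static Set<String> runnableInAll(List<String> dumps) {
        Set<String> result = null;
        for (String dump : dumps) {
            Set<String> runnable = runnableThreads(dump);
            if (result == null) {
                result = new LinkedHashSet<>(runnable);
            } else {
                result.retainAll(runnable); // keep only threads seen RUNNABLE again
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

Passing the text of the 3-5 captured dumps to runnableInAll yields the candidate threads whose stack traces are worth reading first. Matching on thread name is a simplification: in pools that reuse names the same name usually refers to the same thread, but matching on the thread ID in the header would be stricter.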
While this analysis can be done manually, thread dump analysis tools like fastThread automate the process. fastThread generates a "CPU Spike" section that highlights threads that were persistently in the RUNNABLE state across multiple dumps. However, this method won't indicate the exact percentage of CPU each thread is consuming.
Disadvantages
This method will show all threads in the RUNNABLE state, regardless of their actual CPU consumption. For example, threads consuming 80% of CPU and threads consuming only 5% will both appear. It won't provide the exact CPU consumption of individual threads, so you will have to infer the severity of the spike from thread behavior and execution patterns.
3. Analyzing RUNNABLE State Threads From a Single Dump
Sometimes, you may only have a single snapshot of a thread dump. In such cases, the approach of comparing multiple dumps can't be applied. However, you can still attempt to diagnose CPU spikes by focusing on the threads in the RUNNABLE state. One thing to note is that the JVM classifies all threads running native methods as RUNNABLE, but many native methods (like java.net.SocketInputStream.socketRead0()) don't execute code and instead just wait for I/O operations.
To avoid being misled by such threads, you'll need to filter out these false positives and focus on the threads that are actually executing code. This process can be tedious, but fastThread automates it by filtering out these misleading threads in its "CPU Consuming Threads" section, allowing you to focus on the real culprits behind the CPU spike.
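A manual version of this filtering might look like the sketch below. It is only an illustration and does not describe how fastThread works internally: the list of I/O-wait frames is a small, non-exhaustive assumption (only socketRead0 comes from the text above; the other two are JDK 8-era frames added for illustration), and the class NativeWaitFilter, its methods, and the sample thread names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NativeWaitFilter {

    // Assumption: illustrative, non-exhaustive list of native frames that are
    // reported as RUNNABLE but normally just block waiting for I/O.
    private static final List<String> IO_WAIT_FRAMES = Arrays.asList(
            "java.net.SocketInputStream.socketRead0",
            "java.net.PlainSocketImpl.socketAccept",
            "sun.nio.ch.EPollArrayWrapper.epollWait");

    /** topFrame is the first "at ..." line of a RUNNABLE thread's stack trace. */
    static boolean isLikelyIoWait(String topFrame) {
        String frame = topFrame.trim();
        if (frame.startsWith("at ")) {
            frame = frame.substring(3);
        }
        // Newer JDKs prefix frames with a module, e.g. java.base@11.0.2/java.net...
        int slash = frame.indexOf('/');
        int paren = frame.indexOf('(');
        if (slash >= 0 && (paren < 0 || slash < paren)) {
            frame = frame.substring(slash + 1);
        }
        for (String known : IO_WAIT_FRAMES) {
            if (frame.startsWith(known + "(")) {
                return true;
            }
        }
        return false;
    }

    /** Keeps the RUNNABLE threads whose top frame is not a known I/O wait. */
    static List<String> likelyCpuConsumers(Map<String, String> topFrameByThread) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : topFrameByThread.entrySet()) {
            if (!isLikelyIoWait(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> runnable = new LinkedHashMap<>();
        runnable.put("reader", "\tat java.net.SocketInputStream.socketRead0(Native Method)");
        runnable.put("cruncher", "\tat com.example.Pricing.recalculate(Pricing.java:42)");
        System.out.println(likelyCpuConsumers(runnable));
    }
}
```

The frame list would need to be extended for the JDK version and I/O libraries actually in use, so treat the output as a starting point for reading stack traces rather than a verdict.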
Disadvantages
This method has a couple of disadvantages:
- A thread might be temporarily in the RUNNABLE state but may quickly move to WAITING or TIMED_WAITING (i.e., non-CPU-consuming states). In such cases, relying on a single snapshot may lead to misleading conclusions about the thread's impact on CPU consumption.
- Similar to method #2, it will show all threads in the RUNNABLE state, regardless of their actual CPU consumption. For example, threads consuming 80% of CPU and threads consuming only 5% will both appear. It won't provide the exact CPU consumption of individual threads, so you will have to infer the severity of the spike from thread behavior and execution patterns.
Case Study: Diagnosing CPU Spikes in a Major Trading Application
In one case, a major trading application experienced severe CPU spikes, significantly affecting its performance during critical trading hours. By capturing thread dumps and applying method #1 discussed above, we identified that the root cause was the use of a non-thread-safe data structure. Multiple threads were concurrently accessing and modifying this data structure, leading to excessive CPU consumption. Once the issue was identified, the development team replaced the non-thread-safe data structure with a thread-safe alternative, which eliminated the contention and drastically reduced CPU usage.
Conclusion
Diagnosing CPU spikes in Java applications can be challenging, especially when traditional APM tools fall short. By using non-intrusive methods like analyzing thread dumps and focusing on RUNNABLE state threads, you can pinpoint the exact cause of the CPU spike.