Out-of-memory errors in containerized Java applications can be very frustrating, especially when they occur in a production environment. These errors can occur for various reasons. Understanding the Java memory pool model and the different types of OOM errors can significantly help in identifying and resolving them.
1. Java Memory Pool Model
Java Heap
Purpose
The Java heap is the region where the JVM allocates memory for storing objects and dynamic data at runtime. It is divided into specific areas for efficient memory management (Young Gen, Old Gen, etc.). Reclamation of memory is managed by the Java GC process.
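As a rough illustration of the heap areas described above, the standard java.lang.management API can report usage per heap pool. This is only a minimal sketch: the class name HeapPools is made up for this example, and the pool names it prints depend on the JVM and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for this article; not part of any library.
final class HeapPools {

    /** Returns used bytes per heap pool, keyed by the pool name the JVM reports. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                result.put(pool.getName(), usage.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
        usedBytesByPool().forEach((name, used) -> System.out.println(name + " used=" + used));
    }
}
```

Logging these values periodically can show whether the young or old generation is the one that keeps growing.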
Tuning Parameters
Class Loading
Purpose
This memory space stores class-related metadata after parsing the classes. The figure below shows the two sections in the class-related memory pool.
These two commands can provide the class-related stats from the JVM:
jcmd <PID> VM.classloader_stats
jcmd <PID> VM.class_stats
Tuning Parameters
-XX:MaxMetaspaceSize
-XX:CompressedClassSpaceSize
-XX:MetaspaceSize
Code Cache
Purpose
This is the memory region that stores compiled native code generated by the JIT compiler. It serves as a cache for frequently executed bytecode that is compiled into native machine code. Frequently executed bytecode is referred to as a hotspot. The cache exists to improve the performance of the Java application.
This area includes JIT-compiled code, runtime stubs, interpreter code, and profiling information.
Tuning Parameters
-XX:InitialCodeCacheSize
-XX:ReservedCodeCacheSize
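The class-related and code cache pools can also be observed from inside the application. The sketch below lists the non-heap pools that the JVM exposes through java.lang.management. The class name NonHeapPools is hypothetical, and the pool names are an assumption about HotSpot (for example "Metaspace", "Compressed Class Space", and one or more code heap pools); other JVMs may report different names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for this article; not part of any library.
final class NonHeapPools {

    /** Returns one line per non-heap pool with its used, committed, and max bytes. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                // getMax() returns -1 when no maximum is defined for the pool.
                lines.add(String.format("%s used=%d committed=%d max=%d",
                        pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax()));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```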
Threads
Purpose
Each thread has its own memory. The purpose of this memory area is to store method-specific data for each thread.
- Examples: method call frames, local variables, operand stack, return address, etc.
Tuning Parameters
Symbols
Purpose
Symbols are represented as shown in the figure below:
Tuning Parameters
-XX:StringTableSize
- Also, the following command will give symbol-related statistics:
jcmd <PID> VM.stringtable | VM.symboltable
Other Section
Purpose
This section bypasses the heap to allocate faster off-heap memory. It is mainly used for efficient low-level I/O operations, mostly in applications with frequent data transfers.
There are two ways you can access off-heap memory:
- Direct Byte Buffers
- Unsafe.allocateMemory
Direct ByteBuffers
Direct ByteBuffers can be allocated with ByteBuffer.allocateDirect. Reclamation of direct ByteBuffers happens through GC.
- Tuning parameter
-XX:MaxDirectMemorySize
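To make the direct buffer behavior concrete, here is a minimal sketch that allocates a direct ByteBuffer and reads the usage of the JVM's "direct" buffer pool through BufferPoolMXBean. The class name DirectBufferDemo is hypothetical, and the pool name "direct" is an assumption about how HotSpot labels this pool.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for this article; not part of any library.
final class DirectBufferDemo {

    /** Returns bytes used by the "direct" buffer pool, or -1 if no such pool is exposed. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // The 16 MB backing memory lives outside the Java heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directMemoryUsed();
        System.out.println("direct pool before=" + before + " after=" + after);
        // The native memory is released only after the buffer object becomes
        // unreachable and a GC cycle processes it.
        buffer.put(0, (byte) 1);
    }
}
```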
FileChannel.map
This is used to create a memory-mapped file that allows direct memory access to file contents by mapping a region of a file into the memory of the Java process.
- Tuning parameter:
- Memory is not limited and is not counted in NMT (Native Memory Tracking).
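The following sketch shows the basic shape of a FileChannel.map call: it maps a small temporary file, writes through the mapping, and reads the bytes back. The class name MappedFileDemo and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for this article; not part of any library.
final class MappedFileDemo {

    /** Writes text into a memory-mapped temp file and returns what is read back from the mapping. */
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("mapped-demo", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Map the first bytes.length bytes of the file; the file grows to that size.
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
                mapped.put(bytes);   // write through the mapping
                mapped.force();      // ask the OS to flush the changes to the file
                mapped.flip();       // rewind so the same region can be read
                byte[] readBack = new byte[mapped.remaining()];
                mapped.get(readBack);
                return new String(readBack, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("checkpoint-data"));
    }
}
```

The mapped region stays in the process address space until the buffer is garbage collected, which is one reason large mappings can push a container toward its memory limit.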
NMT is a tool for tracking the memory pools allocated by the JVM. Below is a sample output.
- Java Heap (reserved=2458MB, committed=2458MB)
(mmap: reserved=2458MB, committed=2458MB)
- Class (reserved=175MB, committed=65MB)
(classes #11401)
( instance classes #10564, array classes #837)
(malloc=1MB #27975)
(mmap: reserved=174MB, committed=63MB)
( Metadata: )
( reserved=56MB, committed=56MB)
( used=54MB)
( free=1MB)
( waste=0MB =0.00%)
( Class space:)
( reserved=118MB, committed=8MB)
( used=7MB)
( free=0MB)
( waste=0MB =0.00%)
- Thread (reserved=80MB, committed=7MB)
(thread #79)
(stack: reserved=79MB, committed=7MB)
- Code (reserved=244MB, committed=27MB)
(malloc=2MB #8014)
(mmap: reserved=242MB, committed=25MB)
- GC (reserved=142MB, committed=142MB)
(malloc=19MB #38030)
(mmap: reserved=124MB, committed=124MB)
- Internal (reserved=1MB, committed=1MB)
(malloc=1MB #4004)
- Other (reserved=32MB, committed=32MB)
(malloc=32MB #37)
- Symbol (reserved=14MB, committed=14MB)
(malloc=11MB #140169)
(arena=3MB #1)
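When comparing NMT output against a container limit, it can help to total the committed memory per category. The sketch below is a hypothetical parser, not an official tool: the class name NmtSummary and the regular expression are assumptions, and it only understands category lines of the form "- Name (reserved=<n>MB, committed=<n>MB)" with KB, MB, or GB units.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this article; not part of any library.
final class NmtSummary {

    // Matches top-level category lines only; nested lines such as "(mmap: ...)" are skipped.
    private static final Pattern CATEGORY = Pattern.compile(
            "^-\\s+(.+?)\\s+\\(reserved=(\\d+)(KB|MB|GB), committed=(\\d+)(KB|MB|GB)\\)\\s*$");

    /** Returns committed memory in KB per NMT category, in the order the categories appear. */
    static Map<String, Long> committedKbByCategory(String nmtText) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : nmtText.split("\\R")) {
            Matcher m = CATEGORY.matcher(line.trim());
            if (m.matches()) {
                result.put(m.group(1), toKb(Long.parseLong(m.group(4)), m.group(5)));
            }
        }
        return result;
    }

    private static long toKb(long value, String unit) {
        switch (unit) {
            case "GB": return value * 1024 * 1024;
            case "MB": return value * 1024;
            default:   return value; // already KB
        }
    }

    public static void main(String[] args) {
        String sample = "- Java Heap (reserved=2458MB, committed=2458MB)\n"
                + "(mmap: reserved=2458MB, committed=2458MB)\n"
                + "- Thread (reserved=80MB, committed=7MB)\n";
        long totalKb = 0;
        for (Map.Entry<String, Long> e : committedKbByCategory(sample).entrySet()) {
            System.out.println(e.getKey() + " committed=" + e.getValue() + "KB");
            totalKb += e.getValue();
        }
        System.out.println("total committed=" + totalKb + "KB");
    }
}
```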
2. Types of OOM Errors and Root Cause
In Java, OOM occurs when the JVM runs out of memory for object or data structure allocation. Below are the different types of OOM errors commonly seen in Java applications.
Java Heap Space OOM
This happens when the heap memory is exhausted.
Symptoms
java.lang.OutOfMemoryError: Java heap space
Possible Causes
- The application has genuine memory needs.
- Memory leaks because the application is not releasing objects.
- GC tuning issues
Tools
- jmap to collect the heap dump
- YourKit/VisualVM/JProfiler/JFR to analyze the heap dump for large objects and objects that are not being garbage collected
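Besides jmap, a heap dump can also be triggered from inside the application. The sketch below uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it assumes a HotSpot-based JVM. The class name HeapDumper and the temporary output location are made up for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for this article; assumes a HotSpot-based JVM.
final class HeapDumper {

    /** Writes a heap dump of live (reachable) objects to a new .hprof file and returns its path. */
    static Path dumpLiveObjects() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            // The target file must not exist yet; recent JDKs also expect the .hprof suffix.
            Path file = dir.resolve("heap-" + System.currentTimeMillis() + ".hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), true); // true = dump only live objects
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap dump written to " + dumpLiveObjects());
    }
}
```

The resulting file can be opened in the profilers listed above. Dumping pauses the application, so this is usually done sparingly in production.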
Metaspace OOM
This happens when the allocated Metaspace is not sufficient to store class-related metadata. For more information on Metaspace, refer to the Java Memory Pool Model section above. From Java 8 onwards, Metaspace is allocated in native memory and not on the heap.
Symptoms
java.lang.OutOfMemoryError: Metaspace
Possible Causes
- There are many dynamically loaded classes.
- Class loaders are not properly garbage collected, leading to memory leaks.
Tools
- Use profiling tools such as VisualVM/JProfiler/JFR to check for excessive class loading or unloading.
- Enable GC logs.
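For a quick check of class loading activity without attaching a profiler, the standard ClassLoadingMXBean exposes loaded and unloaded class counts. This is a minimal sketch; the class name ClassLoadingStats is hypothetical.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for this article; not part of any library.
final class ClassLoadingStats {

    /** Number of classes currently loaded in the JVM. */
    static int currentlyLoaded() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    /** One-line summary of class loading counters since JVM start. */
    static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),
                bean.getTotalLoadedClassCount(),
                bean.getUnloadedClassCount());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

A loaded count that keeps climbing while the unloaded count stays flat is consistent with the class loader leak described above, although it is not proof on its own.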
GC Overhead Limit Exceeded OOM
This error happens when the JVM spends too much time on GC but reclaims too little space. It occurs when the heap is almost full and the garbage collector cannot free much space.
Symptoms
java.lang.OutOfMemoryError: GC overhead limit exceeded
Possible Causes
- GC tuning issue or the wrong GC algorithm is chosen
- Genuine application requirement for more heap space
- Memory leaks: Objects in the heap are retained unnecessarily.
- Excessive logging or buffering
Tools
Native Memory OOM / Container Memory Limit Breach
This mostly happens when the application, JNI code, the JVM, or third-party libraries try to use native memory. This error involves native memory, which is managed by the operating system and is used by the JVM for allocations other than the heap.
Symptoms
java.lang.OutOfMemoryError: Direct buffer memory
java.lang.OutOfMemoryError: Unable to allocate native memory
Crashes with no apparent Java OOM error
java.lang.OutOfMemoryError: Map failed
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Possible Causes
- Excessive usage of native memory: Java applications can directly allocate native memory using ByteBuffer.allocateDirect(). Native memory is limited by either the container limit or the operating system limit. If the allocated memory is not released, you will get OOM errors.
- Depending on the operating system configuration, each thread consumes a certain amount of memory. An excessive number of threads in the application can result in an 'unable to allocate native memory' error. There are two reasons the application can get into this state: either there is not enough native memory available, or there is a limit on the number of threads per process at the operating system level and the application reaches that limit.
- There is a limit on the size of an array. This is platform-dependent. If the application request exceeds this limit, the JVM will raise the 'Requested array size exceeds VM limit' error.
Tools
- pmap: This is a default tool available in Linux-based operating systems that can be used to analyze memory. It lists the memory map of a process and provides a snapshot of the memory segments allocated to that process.
- NMT (Native Memory Tracking): This tool reports the memory allocations made by the JVM.
- jemalloc: This tool can be used to monitor the memory allocated from outside the JVM.
3. Case Studies
These are some of the OOM issues I faced at work, along with how I identified their root causes.
Container OOM Killed: Scenario A
Problem
We had a streaming data processing application in Apache Kafka/Apache Flink. The streaming application was deployed on containers managed by Kubernetes. The Java containers were periodically experiencing OOM-killed errors.
Analysis
We started by analyzing the heap dump, but it didn't reveal any clues because heap space was not growing. Next, we started the containers with NMT (Native Memory Tracking) enabled. NMT has a feature to show the difference between two snapshots. It clearly reported that a sudden spike in the "Other" section (please check the sample output given in the Java Memory Pool Model section) was resulting in the OOM kill. Further to this, we enabled 'Async-profiler' on this Java application. This helped us to root out the problematic area.
Root Cause
This was a streaming application, and we had enabled the 'checkpointing' feature of Flink. This feature periodically saves data to distributed storage. Data transfer is an I/O operation, and it requires byte buffers from native space. In this case, the native memory usage was legitimate. We reconfigured the application with the right combination of heap and native memory. Things started running fine thereafter.
Container OOM Killed: Scenario B
Problem
This is another streaming application, and the container was getting killed with an OOM error. Since the same service was running fine on another deployment, it was a little hard to identify the root cause. One main function of this service is to write data to the underlying storage.
Analysis
We enabled jemalloc and started the Java application in both environments. Both environments had the same ByteBuffer requirements. However, we noticed that in the environment where it was running fine, the ByteBuffers were getting cleaned up after GC. In the environment where it was throwing OOM, the data flow was lower, and the GC count was far lower than in the other environment.
Root Cause
There is a YouTube video explaining the exact same problem. Because direct ByteBuffers are reclaimed only when GC runs, infrequent GC left their native memory allocated for too long. We had two choices here: either enable explicit GC or reduce the heap size to force earlier GC. For this specific problem, we chose the second approach, and that resolved it.
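To compare GC activity between two environments, as in this scenario, the collection counts can be read from the standard GarbageCollectorMXBean. The sketch below is illustrative only; the class name GcCounts is hypothetical, and System.gc() is merely a request that the JVM may ignore (for example when -XX:+DisableExplicitGC is set).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for this article; not part of any library.
final class GcCounts {

    /** Sums the collection counts of all garbage collectors registered in this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // a hint only; the JVM may ignore it
        long after = totalCollections();
        System.out.println("collections before=" + before + " after=" + after);
    }
}
```

If the count barely moves while direct buffer usage keeps climbing, that matches the pattern seen in the failing environment.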
Container OOM Killed: Scenario C
Problem
Once again, a streaming application with checkpointing enabled: whenever checkpointing was happening, the application crashed with java.lang.OutOfMemoryError: Unable to allocate native memory.
Analysis
The issue was relatively simple to root cause. We took a thread dump, and it revealed that there were close to 1000 threads in the application.
Root Cause
There is a limit on the maximum number of threads per process in the operating system. This limit can be checked by using the below command.
We decided to rewrite the application to reduce the total number of threads.
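One common way to reduce the number of threads is to run work on a fixed-size pool instead of creating a thread per task. The sketch below is not the rewrite used in this case; it only illustrates the idea, and the class name BoundedThreads and the pool size are assumptions. It also prints the live thread count from ThreadMXBean, which is handy for spotting thread growth.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for this article; not part of any library.
final class BoundedThreads {

    /** Runs taskCount tasks on at most poolSize threads and returns how many distinct threads ran them. */
    static int runOnFixedPool(int taskCount, int poolSize) {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Runnable task = () -> workerNames.add(Thread.currentThread().getName());
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(task));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every task to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return workerNames.size();
    }

    public static void main(String[] args) {
        int used = runOnFixedPool(1000, 8);
        System.out.println("distinct worker threads used: " + used);
        System.out.println("live JVM threads: "
                + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```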
Heap OOM: Scenario D
Problem
We had a data processing service with a very high input rate. Occasionally, the application would run out of heap memory.
Analysis
To identify the issue, we decided to periodically collect heap dumps.
Root Cause
The heap dump revealed that the application logic to clear the Window (streaming pipeline Window) that collects the data was not getting triggered because of a thread contention issue.
The fix was to correct the thread contention issue. After that, the application started running smoothly.
4. Summary
Out-of-memory errors in Java applications are very hard to debug, especially if they happen in the native memory space of a container. Understanding the Java memory pool model will help to root-cause the problem to a certain extent.
- Monitor the memory usage of processes using tools such as pmap, ps, and top in a Linux environment.
- Identify whether the memory issues are in the heap, JVM non-heap memory areas, or in the areas outside the JVM.
- Forcing GC will help you identify whether it is a heap issue or a native memory leak.
- Use profiling tools such as JConsole or JVisualVM.
- Use tools like Prometheus, Grafana, and JVM Metrics Exporter to monitor memory usage.
- To identify a native memory leak from inside the JVM, use tools such as AsyncProfiler and NMT.
- To identify a native memory leak outside of the JVM, use jemalloc. NMT will also help here.