A record of using JProfiler to locate an OOM in a production environment

This article first appeared in: Walker AI

As the platform's business and its music-library data keep growing, the system occasionally runs out of memory. Compared with other exceptions, a memory overflow is usually more obscure: it tends to build up over time, so you cannot locate the problem simply from where the exception is thrown; you have to trace it back to its source. Knowing how to troubleshoot an out-of-memory condition is therefore essential. Taking a memory overflow in our production environment as an example, this article briefly explains how to use JProfiler to locate the problem.

1. Problem

Suddenly I received a message that the platform was malfunctioning. Opening the website's homepage, I found that none of the interfaces could be accessed and the page failed to load:

Checking the exception log first, I found it filled almost entirely with entries like the following:

There is no doubt that the system ran out of memory, but the specific cause cannot be determined from the log alone. For a stable system, memory usage generally stays within a certain range, so a memory overflow usually has one of the following causes:

  • A memory leak in the code: unreleased memory gradually accumulates until memory finally overflows;
  • The code attempts to allocate a very large block of memory, occupying most of the heap or directly triggering the overflow;
  • A sudden surge in traffic that the originally available memory cannot support.
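The first cause above, a leak introduced in code, very often takes the form of a cache or collection that is only ever written to. A minimal sketch (the class, method, and payload sizes are invented for illustration, not taken from the platform's code):

```java
import java.util.HashMap;
import java.util.Map;

public class LeakDemo {
    // A static map that entries are added to but never evicted from:
    // a classic leak pattern, since static fields are GC roots.
    static final Map<Integer, byte[]> CACHE = new HashMap<>();

    static void handleRequest(int id) {
        // Hypothetical per-request buffer cached by id and never released.
        CACHE.put(id, new byte[1024]);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            handleRequest(i);
        }
        // Entries accumulate forever; under sustained traffic this
        // eventually ends in an OutOfMemoryError.
        System.out.println("cached entries = " + CACHE.size()); // 10000
    }
}
```

In a heap snapshot, such a leak shows up as one collection holding an abnormally large retained size, which is exactly what the "Biggest Objects" view described later is good at surfacing.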

Fortunately, the platform's production environment is configured with JVM parameters that automatically dump a heap snapshot file when an OOM occurs. The next step is to locate the problem starting from that dump file.

2. Preparation

2.1 Tools

JProfiler is a powerful JVM monitoring tool that provides accurate monitoring and analysis of memory, GC, CPU usage, thread state, and more. For memory overflows, its heap snapshot analysis feature is what we usually need; its other features are beyond the scope of this article.

Installing JProfiler is simple: just follow the installation wizard and click through. The interface after installation looks like this:

Click "Start Center" in the upper left corner and choose from the functional modules in the pop-up window. Among them, "Quick Attach" lets you quickly attach to a running Java process on the local machine or another host for real-time monitoring.

Taking the IDEA process running on the local machine as an example, open the "Start Center" to see the running IDEA Java process:

Select "Start" to enter the real-time monitoring mode, and you can view various information about the process:

For troubleshooting memory-overflow problems, JProfiler's heap snapshot analysis feature is usually what we need, as discussed later.

2.2 jvm parameters

For a system running in production, using "Quick Attach" to monitor and analyze the Java process is obviously inconvenient. Fortunately, the JVM provides dedicated parameters that automatically dump a heap snapshot file when the process throws an OutOfMemoryError, so all we need to do is analyze that snapshot afterwards. To enable this, add the following startup parameters when launching the Java process:

 	java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/app.hprof -jar app.jar

where -XX:+HeapDumpOnOutOfMemoryError tells the JVM to automatically generate a dump file when an OOM occurs, and -XX:HeapDumpPath specifies the path of the generated dump file.
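Beyond the startup flags above, the same dump can be inspected and triggered from inside the process via the HotSpot-specific HotSpotDiagnosticMXBean API, which is handy for verifying that the flag is actually set and for taking on-demand snapshots that JProfiler can open. A sketch (the file name manual-dump.hprof is arbitrary):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

        // Check whether the OOM auto-dump flag is enabled on this JVM.
        System.out.println("HeapDumpOnOutOfMemoryError = "
            + bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue());

        // Trigger an on-demand heap dump in the same .hprof format.
        Path out = Paths.get("manual-dump.hprof");
        Files.deleteIfExists(out); // dumpHeap refuses to overwrite
        bean.dumpHeap(out.toString(), true); // live=true: reachable objects only
        System.out.println("dump written: " + Files.exists(out));
    }
}
```

Equivalently, from the command line, tools such as jmap or jcmd shipped with the JDK can dump the heap of a running process by PID.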

3. Analyzing the JVM heap dump file

With the above configuration in place, we can troubleshoot whenever the system hits an OOM. Here we take the dump file generated by one of the platform's OOMs as an example and briefly walk through the troubleshooting process.

Select "Open a Single Snapshot" in the "Open Snapshots" function in the "Start Center" of jprofiler:

After the import is successful, you can see the following interface:

The "Classes" view lists instances grouped by type, sortable by count or total size. Since char[] is the internal implementation of String, it usually occupies a lot of memory, so this list alone rarely pinpoints the problem. Instead, switch to the "Biggest Objects" view, which, as the name suggests, focuses on large objects. As shown below:

Using the "Biggest Objects" feature, we quickly find one String object of 110 MB, which is obviously unreasonable. Next, try to determine where this String was created from its content; if that is still not enough to pin down the problem, right-click the object and select "Use Selected Objects":

In the pop-up window, the "Reference" option lets you inspect object references: "Outgoing references" shows the references held by the currently selected object, while "Incoming references" shows the references holding it.

Since String is an immutable type implemented internally on top of char[], choose "Incoming references" here:

In this window you can see not only the content and size of the object, but also the thread it belongs to, which brings us a step closer to the source of the problem. Then select "show more":

At this point, the entire call stack that produced the object is visible at a glance. In this OOM, the final cause was a huge SQL statement generated in the business code. That code ran fine while the data volume was small, but a large amount of data had recently been added and the code did not handle it accordingly, hence the OOM. With this lesson learned, we should take care to avoid such situations when writing code in the future!
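The offending code is not shown in the article, but the pattern is common enough to sketch: concatenating every id into a single IN-clause grows without bound as the data grows, while chunking the list keeps each statement (and the String built for it) bounded. All names here (table track, batch size 1000) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SqlBatcher {
    // Naive version: one statement over every id. The resulting String
    // grows linearly with the data volume, which is how a 110 MB String ends up on the heap.
    static String naiveSql(List<Integer> ids) {
        return "SELECT * FROM track WHERE id IN ("
            + ids.stream().map(String::valueOf).collect(Collectors.joining(","))
            + ")";
    }

    // Fixed version: split the ids into fixed-size chunks so each
    // statement stays a bounded size regardless of total data volume.
    static List<String> batchedSql(List<Integer> ids, int batchSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            out.add(naiveSql(ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids =
            IntStream.range(0, 2500).boxed().collect(Collectors.toList());
        System.out.println("naive statement length = " + naiveSql(ids).length());
        System.out.println("batched statements = " + batchedSql(ids, 1000).size()); // 3
    }
}
```

In real code, bound parameters or an ORM's batching support would be preferable to string concatenation, but the memory-growth argument is the same.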

4. Conclusion

In general, memory overflow is a relatively difficult problem to deal with. The example in this article covers only the simplest analysis scenario, yet that is usually enough to find most causes; more complex OOM scenarios must be analyzed case by case. As for how to avoid such problems, it comes back to how we write code: after all, most problems of this kind are caused by improperly written code.

PS: For more technical content, follow [official account | xingzhe_ai] and join the discussion with Xingzhe!