Arthas, the Java diagnostic tool: troubleshooting the production environment gracefully


Arthas is Alibaba's open-source Java diagnostic tool. It supports online troubleshooting without restarting the application, dynamic tracing of Java code, and real-time monitoring of JVM state. For online incidents where every second counts, Arthas can help us diagnose the problem quickly.

Download and install

Download the Arthas launcher, arthas-boot.jar.


After downloading arthas, first get familiar with the help information. You can view it with the java -jar arthas-boot.jar -h command; the output includes usage examples and parameter descriptions:

[root@izwz94a0v1sz0gk4rezdcbz arthas]# java -jar arthas-boot.jar -h
[INFO] arthas-boot version: 3.1.4
Usage: arthas-boot [-h] [--target-ip <value>] [--telnet-port <value>]
       [--http-port <value>] [--session-timeout <value>] [--arthas-home <value>]
       [--use-version <value>] [--repo-mirror <value>] [--versions] [--use-http]
       [--attach-only] [-c <value>] [-f <value>] [--height <value>] [--width
       <value>] [-v] [--tunnel-server <value>] [--agent-id <value>] [--stat-url
       <value>] [pid]

Bootstrap Arthas

  java -jar arthas-boot.jar <pid>
  java -jar arthas-boot.jar --target-ip
  java -jar arthas-boot.jar --telnet-port 9999 --http-port -1
  java -jar arthas-boot.jar --tunnel-server 'ws://'
  java -jar arthas-boot.jar --tunnel-server 'ws://' --agent-id bvDOe8XbTM2pQWjF4cfw
  java -jar arthas-boot.jar --stat-url ''
  java -jar arthas-boot.jar -c 'sysprop; thread' <pid>
  java -jar arthas-boot.jar -f <pid>
  java -jar arthas-boot.jar --use-version 3.1.4
  java -jar arthas-boot.jar --versions
  java -jar arthas-boot.jar --session-timeout 3600
  java -jar arthas-boot.jar --attach-only
  java -jar arthas-boot.jar --repo-mirror aliyun --use-http

Options and Arguments:
 -h,--help                      Print usage
    --target-ip <value>         The target jvm listen ip, default
    --telnet-port <value>       The target jvm listen telnet port, default 3658
    --http-port <value>         The target jvm listen http port, default 8563
    --session-timeout <value>   The session timeout seconds, default 1800
    --arthas-home <value>       The arthas home
    --use-version <value>       Use special version arthas
    --repo-mirror <value>       Use special maven repository mirror, value is
                                center/aliyun or http repo url.
    --versions                  List local and remote arthas versions
    --use-http                  Enforce use http to download, default use https
    --attach-only               Attach target process only, do not connect
 -c,--command <value>           Command to execute, multiple commands separated
                                by ;
 -f,--batch-file <value>        The batch file to execute
    --height <value>            arthas-client terminal height
    --width <value>             arthas-client terminal width
 -v,--verbose                   Verbose, print debug info.
    --tunnel-server <value>     The tunnel server url
    --agent-id <value>          The agent id register to tunnel server
    --stat-url <value>          The report stat url
 <pid>                          Target pid

Startup

Before starting arthas, first start a Spring Boot application. Here the demo application is used:

java -jar ytao-springboot-demo.jar

Then start arthas-boot.jar with the command:

java -jar arthas-boot.jar

Note that the demo and arthas must be started by the same user; otherwise arthas cannot obtain the process information through the attach mechanism (I overlooked this when I first used it and ran into exactly this problem). For example, if the demo is started by the root user and arthas is started by the user u1, arthas prints: Can not find java process. Try to pass <pid> in command line.

Looking at the source code: after the process list is obtained, if the result is empty a log message is printed and -1 is returned; the caller exits directly when the result is less than 0.

The startup class, Bootstrap#main:

The process utility, ProcessUtils#select:

From the analysis above, we must start the target process before starting arthas; otherwise arthas may fail to start.
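The control flow described above can be sketched roughly as follows. This is a simplified stand-in, not the real Arthas source: the class ProcessSelectSketch, the method signature, and the menu-choice parameter are all assumptions made for illustration. Only the "empty list returns -1, caller exits when the result is below 0" behaviour comes from the text.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the selection step; not the real Arthas source.
class ProcessSelectSketch {

    /**
     * Returns the chosen pid, or -1 when no Java process is visible
     * (for example, because the target runs under a different user).
     */
    static long select(List<Long> visiblePids, int choice) {
        if (visiblePids.isEmpty()) {
            System.out.println("Can not find java process. Try to pass <pid> in command line.");
            return -1;
        }
        if (choice < 1 || choice > visiblePids.size()) {
            return -1; // an invalid menu choice is treated like "nothing selected" (assumption)
        }
        return visiblePids.get(choice - 1);
    }

    /** Mimics the caller: a negative result means "stop before attaching". */
    static String start(List<Long> visiblePids, int choice) {
        long pid = select(visiblePids, choice);
        if (pid < 0) {
            return "exit";
        }
        return "attach " + pid;
    }

    public static void main(String[] args) {
        // Started as another user: nothing is visible, so the launcher gives up.
        System.out.println(start(Collections.<Long>emptyList(), 1));
        // Started as the same user: the process is listed and can be chosen.
        System.out.println(start(Arrays.asList(22005L), 1));
    }
}
```

If the list is empty, passing the pid explicitly on the command line, as the message suggests, is the other way out.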

The screen after a successful start as the root user:

Select the Java process; here ytao-springboot-demo is number 1. After the selection, the connection information is printed:

[INFO] arthas home:/root/.arthas/lib/3.1.4/arthas
[INFO] Try to attach process 22005
[INFO] Attach process 22005 success.
[INFO] arthas-client connect 3658
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'

version   3.1.4                                                                 
pid       17339  
time      2019-10-17 02:29:06

dashboard data panel

Use the dashboard command to view thread, memory, GC, and runtime information.
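To get a feel for the kind of data the panel shows, the same four categories can be read from inside a JVM through the standard java.lang.management beans. The sketch below is only an illustration of those categories; it does not claim to be how Arthas collects them, and the class name JvmSnapshotSketch and the map keys are made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: reads thread, memory, GC, and runtime data of the current JVM.
class JvmSnapshotSketch {

    static Map<String, String> snapshot() {
        Map<String, String> info = new LinkedHashMap<>();
        // Thread: number of live threads.
        info.put("threads", String.valueOf(ManagementFactory.getThreadMXBean().getThreadCount()));
        // Memory: heap bytes currently in use.
        info.put("heapUsedBytes",
                String.valueOf(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed()));
        // GC: total collections across all collectors (-1 means "undefined" and is skipped).
        long collections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            collections += Math.max(0, gc.getCollectionCount());
        }
        info.put("gcCollections", String.valueOf(collections));
        // Runtime: name of the running VM.
        info.put("vmName", ManagementFactory.getRuntimeMXBean().getVmName());
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : snapshot().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

The difference in practice is that dashboard shows this for a process you attach to from outside, refreshed continuously, without adding any code to the application.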

jad decompilation

Sometimes the result of code running online is not what we expect, and we suspect that the deployed code is not the version we intended. Normally, checking this means downloading the artifact and decompiling it. With arthas, jad can decompile the running class in real time, so we can confirm whether the code matches our version.

jad com.ytao.service.UserServiceImpl

watch function execution information

Use the watch command to view the execution information of a function. The watch parameters (from the official website):

Parameter            Description
class-pattern        Class name expression matching
method-pattern       Method name expression matching
express              Observation expression
condition-express    Conditional expression
[b]                  Observe before the method call
[e]                  Observe after the method throws an exception
[s]                  Observe after the method returns
[f]                  Observe after the method ends (normal return and exceptional return)
[E]                  Enable regular expression matching; the default is wildcard matching
[x:]                 Attribute traversal depth of the output result; the default is 1

When we run into a data bug online, we usually reproduce the online data in the development environment, look for clues in the production logs, or debug remotely. All of these approaches are fairly troublesome. Here the Arthas watch command can show us the code's execution in real time. An observation expression can inspect a function's input parameters, return value, and exception information. Observation expressions are OGNL expressions, so you can write any OGNL expression and have it evaluated.

Variables of the observation expression

Variable     Description
params       Input parameters of the function
returnObj    The return value of the function
throwExp     Exception information
target       Current object

View the input parameters and return value of a function

watch com.ytao.service.UserServiceImpl getUser "{params,returnObj}"

In the printed information, isEmpty=false;size=1 shows that the parameter list is not empty and contains one parameter. View the specific input parameter:

watch com.ytao.service.UserServiceImpl getUser "{params[0],returnObj}"

View exception information

watch com.ytao.service.UserServiceImpl getUser "throwExp"

When we pass the parameter -1, the illegal-parameter exception that we defined is printed.
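The demo's service class is not shown in the article, so the following is a guessed reconstruction for readers who want to follow along. The class name UserServiceSketch, the int parameter, the return value, and the exception type and message are all assumptions; only the behaviour "one parameter, and -1 triggers a self-defined illegal-parameter exception" comes from the text.

```java
// Hypothetical reconstruction of the demo service; the real class is not shown in the article.
class UserServiceSketch {

    /** Returns a display name for the id and rejects negative ids, the way the demo rejects -1. */
    static String getUser(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("illegal parameter: " + id);
        }
        return "user-" + id;
    }

    public static void main(String[] args) {
        // Normal call: watch would show the argument in params and the result in returnObj.
        System.out.println(getUser(1));
        try {
            getUser(-1);
        } catch (IllegalArgumentException e) {
            // Failing call: this exception is what watch would expose as throwExp.
            System.out.println(e.getMessage());
        }
    }
}
```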

In addition to observation expressions, watch also supports conditional expressions and observation event points. Note that when an observation event point is used, some variables of the observation expression may not exist; for example, with -b, the return value and exception information are both empty.
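Why some variables are empty at some event points becomes clearer with a small model. The sketch below imitates the four points (b, s, e, f) around an ordinary method call and records which of params, returnObj, and throwExp have a value at each one. It is a conceptual model only; WatchPointsSketch and its output format are invented here and say nothing about how Arthas instruments the method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Conceptual model of the watch event points; not how Arthas is implemented.
class WatchPointsSketch {

    static <A, R> List<String> observe(Function<A, R> method, A param) {
        List<String> events = new ArrayList<>();
        // b: before the call only the parameters exist.
        events.add(line("b", param, null, null));
        R returnObj = null;
        RuntimeException throwExp = null;
        try {
            returnObj = method.apply(param);
            // s: after a normal return the return value exists.
            events.add(line("s", param, returnObj, null));
        } catch (RuntimeException e) {
            throwExp = e;
            // e: after an exception the exception exists, the return value does not.
            events.add(line("e", param, null, e));
        }
        // f: after the method ends, whichever way it ended.
        events.add(line("f", param, returnObj, throwExp));
        return events;
    }

    private static String line(String point, Object params, Object returnObj, Throwable throwExp) {
        String exp = throwExp == null ? "null" : throwExp.getClass().getSimpleName();
        return point + " params=" + params + " returnObj=" + returnObj + " throwExp=" + exp;
    }

    public static void main(String[] args) {
        Function<Integer, Integer> plusOne = x -> x + 1;
        Function<Integer, Integer> failing = x -> {
            throw new IllegalArgumentException("illegal parameter: " + x);
        };
        for (String event : observe(plusOne, 1)) {
            System.out.println(event);
        }
        for (String event : observe(failing, -1)) {
            System.out.println(event);
        }
    }
}
```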

Sometimes when we troubleshoot a function, we cannot capture its information right away. In that case arthas can run the command in the background and record its output to a log file for us. The usage is similar to Linux.

watch com.ytao.service.UserServiceImpl getUser "{params,returnObj}" >/log/w.log &

View asynchronously saved logs

tt locate abnormal call

The watch command described above can inspect calls, but it is better suited to cases where we already know which situation we are looking for. If a function is called n times and only a few of those executions throw exceptions, finding the abnormal calls with watch is not very convenient. The tt command makes it easier to view abnormal calls and their details. Use -t on com.ytao.service.UserServiceImpl#getUser to record every call of the function:

tt -t com.ytao.service.UserServiceImpl getUser

Recorded information

View all records

tt -l

View the specified function record

tt -s 'method.name=="getUser"'

Description of the output fields

Field        Description
INDEX        Time-fragment record number. Each number represents one call, and many later tt commands operate on a record specified by this number, so it is very important.
TIMESTAMP    The local time at which the method was executed, that is, the local time of this time fragment
COST(ms)     Method execution time
IS-RET       Whether the method ended with a normal return
IS-EXP       Whether the method ended by throwing an exception
OBJECT       The hashCode() of the executing object. Note that this has been mistaken for the object's memory address in the JVM, which it is not; it simply helps you identify the instance that executed the method.
CLASS        The name of the executed class
METHOD       The name of the executed method
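To make the IS-RET and IS-EXP columns concrete, here is a small sketch that models a few recorded calls and picks out the ones that ended with an exception, which is the manual equivalent of scanning the tt -l output. The class TimeTunnelSketch, its Call type, and the sample rows are invented for illustration; only the field meanings and the index 1003 come from the article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of tt records; the sample data is made up for illustration.
class TimeTunnelSketch {

    static final class Call {
        final int index;      // INDEX column
        final boolean isRet;  // IS-RET column
        final boolean isExp;  // IS-EXP column

        Call(int index, boolean isRet, boolean isExp) {
            this.index = index;
            this.isRet = isRet;
            this.isExp = isExp;
        }
    }

    /** Returns the INDEX of every call that ended by throwing an exception. */
    static List<Integer> abnormalIndexes(List<Call> calls) {
        List<Integer> indexes = new ArrayList<>();
        for (Call call : calls) {
            if (call.isExp) {
                indexes.add(call.index);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<Call> calls = Arrays.asList(
                new Call(1000, true, false),
                new Call(1001, true, false),
                new Call(1002, true, false),
                new Call(1003, false, true));
        // Each index printed here is a candidate for "tt -i <index>".
        System.out.println(abnormalIndexes(calls));
    }
}
```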

From the records above we can see that call 1003 ended by throwing an exception. Because tt records the information of every call, we can view the details of 1003:

tt -i 1003

trace view call link

We often find that the response time (rt) of an API call is too long, and we have to find the one or several functions in the call chain that need optimizing. Usually we pick several candidate anchor points and print the rt between them, or find the timestamps in the log and compute the differences. Either way is cumbersome. With the arthas trace command, this becomes easy. The trace parameters:

Parameter            Description
class-pattern        Class name expression matching
method-pattern       Method name expression matching
condition-express    Conditional expression
[E]                  Enable regular expression matching; the default is wildcard matching
[n:]                 Number of times the command is executed
#cost                Method execution time

Use trace to output the call information of com.ytao.controller.UserController#getUser

trace com.ytao.service.UserServiceImpl getUser

Output result

In actual troubleshooting, to reduce useless output, we generally filter by execution time with #cost; the JDK's own functions can also be ignored to reduce the output further. For example, to filter out calls that take less than 1ms:

trace com.ytao.service.UserServiceImpl getUser '#cost > 1'
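For comparison, this is roughly what the manual approach looks like: time a call by hand with System.nanoTime() and keep only the measurements above a threshold, which is what the '#cost > 1' condition does for us. The class CostFilterSketch, its method names, and the sample costs are hypothetical; the timed lambda is just a stand-in for the real getUser call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-rolled timing and filtering; a stand-in for what trace with '#cost > 1' gives for free.
class CostFilterSketch {

    /** Measures one call and returns its cost in milliseconds. */
    static double timeMs(Runnable call) {
        long start = System.nanoTime();
        call.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    /** Keeps only the costs strictly greater than the threshold, like '#cost > threshold'. */
    static List<Double> keepSlowCalls(List<Double> costsMs, double thresholdMs) {
        List<Double> slow = new ArrayList<>();
        for (double cost : costsMs) {
            if (cost > thresholdMs) {
                slow.add(cost);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the demo this would be the getUser call.
        double measured = timeMs(() -> Math.sqrt(42.0));
        System.out.println("measured cost (ms): " + measured);
        // Made-up costs: only the calls slower than 1 ms survive the filter.
        System.out.println(keepSlowCalls(Arrays.asList(0.2, 1.5, 0.9, 12.0), 1.0));
    }
}
```

Doing this by hand means editing, rebuilding, and redeploying the application for every anchor point, which is exactly the overhead trace removes.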

redefine implements hot deployment

When we find a bug and want to get the fix online quickly to save the day, Arthas provides the redefine command for hot updates. Although the one-stop jad/mc/redefine hot-update workflow is now promoted, for online code it is still recommended to compile locally and then replace the class, to avoid slips of the hand. First, add a line of code in UserServiceImpl.

Use the sc command to get the class information, including the classLoaderHash:

sc -d *UserServiceImpl

Run redefine to apply the modified class:

redefine -c 1d56ce6a /usr/local/jar/UserServiceImpl.class

Verify from the printed information that the UserServiceImpl class has been updated.

Besides the commands above, Arthas has other diagnostic functions; the ones covered here are only those I personally use. A tool like this works best as a combination of techniques: choose the troubleshooting method that fits the problem you encounter, rather than applying commands blindly.

Personal blog:

My official account ytao