如何分析java程序cpu使用高

此文章简要介绍一下如何分析java程序使用cpu较高的问题。

当发现服务器负载比较高,这时我们一般会通过top来观察大概是什么程序什么原因导致的:

1
top

假定使用cpu较高的是java程序,这时我们可以先做两件事。下面通过例子来说明分析过程(例子中程序cpu使用并不高)。

通过top命令获取cpu使用高的线程id

使用top命令,按’P’对cpu使用率进行排序,找到cpu使用高的java程序pid:

1
2
# 按P对cpu使用进行排序
top

比如下面的输出,java的pid为30503:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
top - 16:50:47 up 1058 days, 21:37,  2 users,  load average: 1.67, 1.46, 1.46
Tasks: 1540 total, 1 running, 1535 sleeping, 3 stopped, 1 zombie
Cpu(s): 0.5%us, 0.3%sy, 0.0%ni, 98.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 164839304k total, 162163208k used, 2676096k free, 3797816k buffers
Swap: 8388604k total, 744k used, 8387860k free, 118834616k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30503 root 20 0 61.7g 1.7g 17m S 20.7 1.1 20481:50 java
28109 mysql 20 0 28.8g 879m 7208 S 3.6 0.5 78475:47 xxxx1
6282 apache 20 0 403m 18m 5048 S 2.6 0.0 2:47.34 xxxx2
22554 root 20 0 1461m 9300 3632 S 2.0 0.0 102:04.07 xxxx3
22262 root 20 0 16104 2436 960 R 1.6 0.0 0:00.15 xxxx4
17130 root 20 0 5220m 2.0g 6504 S 1.3 1.3 1906:33 xxxx5
12754 root 20 0 0 0 0 D 1.0 0.0 14285:07 xxxx6

找出使用cpu较高的线程,最后一列即为线程id:

1
ps -p 30503 -L -o pcpu,cpu,nice,state,cputime,pid,tid | sort

输出如下,这里线程23491 cpu使用比较高:

1
2
3
4
5
6
7
 0.5   -   0 S 06:42:19 30503 31086
0.8 - 0 S 05:35:25 30503 23492
1.1 - 0 S 15:16:35 30503 31258
1.6 - 0 S 20:47:36 30503 31495
1.7 - 0 S 21:53:08 30503 31087
5.9 - 0 S 1-00:36:33 30503 23491
%CPU CPU NI S TIME PID TID

jstack打印jvm线程堆栈

jstack可以详细的把jvm的线程堆栈信息打印出来,这些线程堆栈信息可供后面分析:

1
jstack -l <process id of java process>

前文已经查出进程30503 cpu使用比较高,接下来使用jstack把线程堆栈打印出来:

1
jstack -l 30503

输出内容类似如下(只截取部分内容):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
"Curator-TreeCache-14" #7685780 daemon prio=5 os_prio=0 tid=0x00007f766a784800 nid=0x5bc6 waiting on condition [0x00007f7bcdee1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000082c02180> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- None

"KafkaTopicOffsetGetter" #7685779 prio=5 os_prio=0 tid=0x00007f766a526800 nid=0x5bc4 waiting on condition [0x00007f7bce5e8000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at kafka.manager.actor.cluster.KafkaStateActor$KafkaTopicOffsetGetter.run(KafkaStateActor.scala:1524)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- None

"KafkaManagedOffsetCache" #7685778 prio=5 os_prio=0 tid=0x00007f766942f000 nid=0x5bc3 runnable [0x00007f7bce1e4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0000000082bf0770> (a sun.nio.ch.Util$2)
- locked <0x0000000082bf0760> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000082bf0780> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:684)
at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1171)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at kafka.manager.actor.cluster.KafkaManagedOffsetCache$$anonfun$run$4.apply(KafkaStateActor.scala:344)
at kafka.manager.actor.cluster.KafkaManagedOffsetCache$$anonfun$run$4.apply(KafkaStateActor.scala:326)
at scala.util.Success.foreach(Try.scala:236)
at kafka.manager.actor.cluster.KafkaManagedOffsetCache.run(KafkaStateActor.scala:326)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x0000000082be7a08> (a java.util.concurrent.locks.ReentrantLock$FairSync)

"Curator-PathChildrenCache-50" #7685777 daemon prio=5 os_prio=0 tid=0x00007f766a456800 nid=0x5bc2 waiting on condition [0x00007f779b8f7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000082b918e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)

线程堆栈信息里有一个nid(native id),这个id跟前文ps打印出来的线程id是对应的,不同在于 jstack 打印出来的是十六进制的,ps打印出来的是十进制的。可以通过printf命令把十进制转换成十六进制:

1
printf "0x%x\n" tid

我们把前文找出来的线程id转换成十六进制:

1
printf "0x%x\n" 23491

输出如下:

1
0x5bc3

再通过jstack的输出,查找nid为0x5bc3的线程,就可以找出cpu使用高的线程了:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
"KafkaManagedOffsetCache" #7685778 prio=5 os_prio=0 tid=0x00007f766942f000 nid=0x5bc3 runnable [0x00007f7bce1e4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0000000082bf0770> (a sun.nio.ch.Util$2)
- locked <0x0000000082bf0760> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000082bf0780> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.kafka.common.network.Selector.select(Selector.java:684)
at org.apache.kafka.common.network.Selector.poll(Selector.java:408)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1171)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115)
at kafka.manager.actor.cluster.KafkaManagedOffsetCache$$anonfun$run$4.apply(KafkaStateActor.scala:344)
at kafka.manager.actor.cluster.KafkaManagedOffsetCache$$anonfun$run$4.apply(KafkaStateActor.scala:326)
at scala.util.Success.foreach(Try.scala:236)
at kafka.manager.actor.cluster.KafkaManagedOffsetCache.run(KafkaStateActor.scala:326)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x0000000082be7a08> (a java.util.concurrent.locks.ReentrantLock$FairSync)

这个线程为什么cpu使用高,还要根据具体的业务再去分析。

线程中的几种状态可以参考:BLOCKED,WAITING和TIMED_WAITING的区别