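Inspecting I/O
If CPU utilization is not high but the system's throughput and latency have stopped improving, the program is not busy computing; it is busy with something else, such as I/O. (CPU utilization should also be broken down into kernel mode and user mode: once kernel-mode usage climbs, overall system performance tends to drop. On multi-core machines CPU 0 deserves particular attention, because a heavily loaded CPU 0 can drag down the other cores: on many systems a large share of interrupt handling and other cross-core coordination work lands on CPU 0 by default.)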
Network
Refer to https://swsmile.info/post/performance-network-diagnose/.
iostat
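iostat is short for I/O statistics. It is mainly used to monitor the I/O load of system devices. The first report iostat prints covers the time since the system was booted; each subsequent report covers the time since the previous report. You can pass an interval and a count to get exactly the statistics you need.
Install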
$ sudo apt-get install sysstat -y
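Command options
- -c: display the CPU utilization report
- -d: display only the device (disk) utilization report
- -k: display statistics in KB
- -m: display statistics in MB
- -N: display registered device-mapper names (useful for LVM volumes)
- -n: display NFS statistics (older sysstat versions only)
- -p [device]: display statistics for the device and its partitions
- -t: print the time of each report
- -x: display extended statistics
- -V: display version information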
Usage
$ iostat -d -m 1 100
Linux 5.4.0-89-generic (swubuntu2) 11/07/2021 _x86_64_ (8 CPU)
Device tps MB_read/s MB_wrtn/s MB_dscd/s MB_read MB_wrtn MB_dscd
dm-0 17.92 0.02 0.31 0.08 20962 319830 87635
loop0 0.00 0.00 0.00 0.00 3 0 0
loop1 0.00 0.00 0.00 0.00 1 0 0
loop2 0.00 0.00 0.00 0.00 5 0 0
loop3 0.00 0.00 0.00 0.00 5 0 0
loop4 0.00 0.00 0.00 0.00 3 0 0
loop5 0.00 0.00 0.00 0.00 1 0 0
loop6 0.00 0.00 0.00 0.00 1 0 0
loop7 0.06 0.00 0.00 0.00 64 0 0
loop8 0.00 0.00 0.00 0.00 1 0 0
sda 8.46 0.02 0.31 0.09 20987 318412 88907
Meaning of the avg-cpu columns (printed when the CPU report is enabled, for example when -d is not given):
- %user: percentage of CPU utilization that occurred while executing at the user level (application).
- %nice: percentage of CPU utilization that occurred while executing at the user level with nice priority.
- %system: percentage of CPU utilization that occurred while executing at the system level (kernel).
- %iowait: percentage of time that the CPU or CPUs were idle while the system had an outstanding disk I/O request.
- %steal: percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
- %idle: percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
Notes:
- A high %iowait suggests a disk I/O bottleneck; a high %idle means the CPU is mostly idle.
- If %idle is high but the system still responds slowly, the CPU may be waiting for memory to be allocated.
- If %idle stays below 10, the system is short on CPU capacity, and CPU is the resource that most needs attention.
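The rules of thumb in the notes above can be written down as a small classifier. This is only a sketch: the function name `diagnose_cpu` and the 20% threshold for %iowait are assumptions made for illustration (the notes give no number for "too high"); the threshold of 10 for %idle is the one stated in the notes.

```python
def diagnose_cpu(iowait: float, idle: float,
                 iowait_high: float = 20.0, idle_low: float = 10.0) -> str:
    """Classify one avg-cpu sample using the rules of thumb above.

    iowait_high is an assumed threshold; idle_low (10) comes from the notes.
    """
    if iowait >= iowait_high:
        return "possible disk I/O bottleneck"
    if idle < idle_low:
        return "CPU is likely the bottleneck"
    return "no obvious CPU or I/O pressure"


print(diagnose_cpu(iowait=35.0, idle=50.0))
print(diagnose_cpu(iowait=1.0, idle=5.0))
```

A single sample proves little. The notes talk about %idle staying low over time, so in practice the check would be applied to several consecutive reports.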
-d
- Show disk usage
$ iostat -d -k 1 10
Linux xx-generic (xx.com) 11/22/20 _x86_64_ (48 CPU)
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.01 0.01 3684 2576
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
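- -d: display device (disk) utilization
- -k: display in KB (columns that would otherwise be reported in blocks are forced to kilobytes)
- 1: refresh the output every 1 second
- 10: print 10 reports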
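Because the first report covers the time since boot, scripts that care about current load usually skip it. The sketch below splits captured output into reports and drops the first one. `SAMPLE` reuses the output shown above, and the helper names `split_reports` and `interval_reports` are hypothetical.

```python
from typing import List

SAMPLE = """\
Linux xx-generic (xx.com) 11/22/20 _x86_64_ (48 CPU)

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.01 0.01 3684 2576

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
"""


def split_reports(text: str) -> List[List[str]]:
    """Group output lines into reports; each report starts at a 'Device' header."""
    reports: List[List[str]] = []
    for line in text.splitlines():
        if line.startswith("Device"):
            reports.append([line])
        elif line.strip() and reports:
            # Non-empty line after a header: a device row of the current report.
            reports[-1].append(line)
    return reports


def interval_reports(text: str) -> List[List[str]]:
    """Drop the first report, which covers the time since boot."""
    return split_reports(text)[1:]


for report in interval_reports(SAMPLE):
    print(report)
```

Recent sysstat versions also accept a -y option that omits the since-boot report when an interval is given, which may make this filtering unnecessary.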
Meaning:
- tps: transfers per second, the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request, so a transfer is of indeterminate size.
- kB_read/s: amount of data read from the device per second, in kilobytes.
- kB_wrtn/s: amount of data written to the device per second, in kilobytes.
- kB_read: total amount of data read, in kilobytes.
- kB_wrtn: total amount of data written, in kilobytes.
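To use these columns in a script, one report can be turned into a dictionary keyed by device and column name. This is a minimal sketch, assuming the header layout shown above and a locale that prints a decimal point; `parse_device_report` is a hypothetical helper, not part of sysstat.

```python
from typing import Dict, List

REPORT = [
    "Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn",
    "sda 0.00 0.01 0.01 3684 2576",
]


def parse_device_report(lines: List[str]) -> Dict[str, Dict[str, float]]:
    """Map device name -> {column name -> value} for one iostat -d report."""
    header = lines[0].split()[1:]  # drop the leading "Device:" label
    stats: Dict[str, Dict[str, float]] = {}
    for row in lines[1:]:
        name, *values = row.split()
        stats[name] = dict(zip(header, map(float, values)))
    return stats


stats = parse_device_report(REPORT)
print(stats["sda"]["kB_read"])
```

Taking the keys from the header line rather than hard-coding them keeps the parser usable when the column set changes, as it does between -k and -m.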
-x
- Show extended statistics
$ iostat -d -x -k 1 10
Linux xx-generic (xx.com) 11/22/20 _x86_64_ (48 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.01 0.01 12.78 0.00 7.63 19.12 0.17 4.35 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
- rrqm/s: the number of read requests merged per second that were queued to the device. (When several queued requests target adjacent blocks, the kernel can merge them into a single request before it reaches the device.)
- wrqm/s: the number of write requests merged per second that were queued to the device.
- r/s: the number of read requests completed by the device per second (rio/s).
- w/s: the number of write requests completed by the device per second (wio/s).
- rkB/s: the number of kilobytes read from the device per second.
- wkB/s: the number of kilobytes written to the device per second.
- avgrq-sz: the average size (in sectors) of the requests that were issued to the device.
- avgqu-sz: the average queue length of the requests that were issued to the device. The shorter the queue, the better.
- await: the average time (in milliseconds) for I/O requests issued to the device to be served; it can be read as the I/O response time. As a rule of thumb it should stay below 5 ms, and anything above 10 ms is high.
  - This time includes both the time spent in the queue and the time spent servicing the request, so await is normally larger than svctm. The smaller the gap, the shorter the queue time; a large gap means long queueing and points to a problem.
- svctm: the average service time (in milliseconds) for I/O requests that were issued to the device. Warning from the man page: do not trust this field any more; it will be removed in a future sysstat version.
  - If svctm is close to await, there is almost no I/O wait and the disk is performing well.
  - If await is much higher than svctm, requests wait too long in the I/O queue and I/O response is very slow.
- %util: the percentage of elapsed time during which I/O requests were issued to the device, that is, the time spent handling I/O divided by the length of the sampling interval (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
  - For example, with a 1-second interval, if the device spends 0.8 seconds handling I/O and 0.2 seconds idle, then %util = 0.8/1 = 80%, so this value indicates how busy the device is.
  - In general, 100% means the device is close to full load. For devices that serve requests in parallel, such as multi-disk arrays, 100% does not necessarily mean the device has become the bottleneck.
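The arithmetic behind %util and await can be sketched in a few lines. The function and parameter names below are hypothetical; the inputs are deltas of cumulative per-device counters over one sampling interval (on Linux, counters of this kind are exposed in /proc/diskstats).

```python
def util_percent(busy_ms: float, interval_ms: float) -> float:
    """%util: share of the interval during which the device was handling I/O."""
    return 100.0 * busy_ms / interval_ms


def avg_await_ms(total_wait_ms: float, completed_ios: int) -> float:
    """await: mean time per request, including queueing and service time."""
    if completed_ios == 0:
        return 0.0  # no requests completed in this interval
    return total_wait_ms / completed_ios


# The example from the text: busy for 0.8 s out of a 1 s interval.
print(util_percent(busy_ms=800, interval_ms=1000))  # 80.0
# Made-up numbers: 50 requests that spent 400 ms in total in queue + service.
print(avg_await_ms(total_wait_ms=400, completed_ios=50))  # 8.0
```

The zero-request guard matters in practice: an idle device completes no I/O in an interval, and iostat prints 0.00 for await in that case, as in the second report above.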
Reference
- man iostat
- https://linux.die.net/man/1/iostat
- https://www.cnblogs.com/peida/archive/2012/12/28/2837345.html
- https://linuxtools-rst.readthedocs.io/zh_CN/latest/tool/iostat.html
- https://www.cnblogs.com/ggjucheng/archive/2013/01/13/2858810.html