uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
perf
Load averages: the number of processes wanting to run. Includes processes blocked in uninterruptible IO.
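A minimal sketch for putting the load average in context by dividing it by the CPU count. The helper name `load_per_cpu` is hypothetical, and treating a result above 1.0 as "more work than CPUs" is only a rule of thumb, since on Linux the load also counts tasks in uninterruptible IO.

```shell
# Hypothetical helper: load average divided by CPU count.
# Usage: load_per_cpu <load> <ncpu>
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example with fixed numbers: a load of 8.00 on 4 CPUs.
load_per_cpu 8.00 4   # prints 2.00
```

On a live system the inputs could come from `cut -d' ' -f1 /proc/loadavg` and `nproc`.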
Look for errors that can cause performance problems. "Out of memory", "TCP: ... dropping request", etc.
dmesg | grep oom-killer
- r: Number of processes running or waiting. Doesn't include IO.
IF r > num_cpu THEN saturation
- b: Number of processes blocked by IO.
- free: Free memory in kB.
- buffers: buffer cache, used for block device I/O.
- cached: page cache, used by file systems.
- si, so: Swap-ins and swap-outs.
IF si,so != 0 THEN out_of_memory
- us, sy, id, wa, st: CPU time breakdown, averaged across all CPUs.
- us: user time.
- sy: system time.
IF sy > 20% THEN kernel processing IO inefficiently
- id: idle.
- wa: IO wait.
- st: stolen time, time spent by the hypervisor on other VMs.
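The vmstat rules of thumb above can be sketched as an awk filter. This is an illustration under assumptions: the helper name `check_vmstat` is hypothetical, and the field positions assume the classic column order `r b swpd free buff cache si so bi bo in cs us sy id wa st`.

```shell
# Hypothetical helper: apply the vmstat rules of thumb to `vmstat 1` output.
# Assumed columns: r=$1, si=$7, so=$8, sy=$14 (classic vmstat layout).
# Usage: vmstat 1 5 | check_vmstat <num_cpu>
check_vmstat() {
    awk -v ncpu="$1" '
        $1 !~ /^[0-9]+$/   { next }   # skip the two header lines
        $1 > ncpu          { print "saturation: r=" $1 " > " ncpu " CPUs" }
        $7 != 0 || $8 != 0 { print "swapping: si=" $7 " so=" $8 }
        $14 > 20           { print "high system time: sy=" $14 "%" }
    '
}
```

Keep in mind that the first sample line printed by vmstat shows averages since boot, so it may trip these checks even when the system is currently quiet.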
Look out for a single hot CPU.
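One way to spot a hot CPU automatically is to filter the mpstat output on the %idle column. The sketch below is an assumption-laden illustration: the helper name `hot_cpus` and the default 90% busy threshold are made up, and it assumes a locale that prints decimals with a dot.

```shell
# Hypothetical helper: report CPUs busier than a threshold (default 90%).
# Usage: mpstat -P ALL 1 5 | hot_cpus [busy_percent]
hot_cpus() {
    awk -v limit="${1:-90}" '
        /%idle/ {                      # header row: locate the columns by name
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu  = i
                if ($i == "%idle") idle = i
            }
            next
        }
        idle && NF >= idle && $cpu ~ /^[0-9]+$/ {   # per-CPU rows only, skips "all"
            busy = 100 - $idle
            if (busy > limit) printf "CPU %s is %.1f%% busy\n", $cpu, busy
        }
    '
}
```

Locating the columns from the header row avoids hard-coding positions, which shift depending on whether the timestamp includes an AM/PM field.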
Like top, but prints a rolling summary instead of clearing the screen.
- r/s: delivered reads per second
- rkB/s: kB read per second
- w/s: delivered writes per second
- wkB/s: kB written per second
- await: the average time for the IO in milliseconds. Includes both time queued and time being serviced.
IF high THEN device_saturation | device_problems
- avgqu-sz: the average number of requests issued to the device.
IF avgqu-sz > 1 THEN could_be saturation
Still, multiple back-end disk devices can operate on requests in parallel.
- %util: device utilization (busy %).
IF %util > 60% THEN poor_performance (double check with await).
IF %util ~= 100% THEN saturation
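The %util thresholds above can be sketched as an awk filter over the iostat output. The helper name `check_util` is hypothetical, and treating 99% and above as "about 100%" is an assumption chosen for illustration.

```shell
# Hypothetical helper: flag devices by %util using the rules of thumb above.
# Usage: iostat -xz 1 5 | check_util
check_util() {
    awk '
        /^Device/ {                    # header row: find the %util column by name
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col {             # device rows; shorter avg-cpu rows are skipped
            if ($col >= 99)     print $1 ": saturated (" $col "% util)"
            else if ($col > 60) print $1 ": likely poor performance (" $col "% util), check await"
        }
    '
}
```

As noted above, a device backed by several disks can serve requests in parallel, so a high %util on such a device does not necessarily mean it is saturated.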
- buffers: buffer cache, used for block device I/O.
- cached: page cache, used by file systems.
- buff/cache: sum of buffers and cached.
- available: estimate of memory that can be given to applications without swapping; includes cache that can be reclaimed quickly.
IF buffers or cached ~= 0 THEN higher disk IO
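To check how much memory is currently serving as cache, the `free -m` output can be reduced to a single percentage. This is a sketch with assumptions: the helper name `cache_share` is hypothetical, and it expects the newer layout with a combined buff/cache column rather than separate buffers and cached columns.

```shell
# Hypothetical helper: share of total memory used as buff/cache.
# Usage: free -m | cache_share
cache_share() {
    awk '
        # Header row has no label column, so data rows are shifted by one field.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "buff/cache") col = i + 1 }
        $1 == "Mem:" && col { printf "buff/cache is %.1f%% of total memory\n", 100 * $col / $2 }
    '
}
```

A value close to zero matches the rule above: with little cache left, more reads and writes go to disk.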
- rxpck/s: number of packets received per second.
- txpck/s: number of packets transmitted per second.
- rxkB/s: number of kilobytes received per second.
- txkB/s: number of kilobytes transmitted per second.
- %ifutil: utilization percentage of the network interface. Could be unreliable.
- active/s: number of locally-initiated TCP connections per second (e.g., via connect()). (~ outbound)
- passive/s: number of remotely-initiated TCP connections per second (e.g., via accept()). (~ inbound)
- retrans/s: number of TCP retransmits per second. Sign of network or server issue.
You'll need a call graph:
--call-graph lbr
- aka Last Branch Record. Uses special hardware registers to store a limited call graph of the most recent branches (expect around 32 entries). Very fast, but requires modern hardware (Intel Haswell or newer, ARMv9.2-A or newer).
--call-graph fp
- Uses the frame pointer to determine the call graph. Use it if your binary is built with frame pointers (-fno-omit-frame-pointer).
--call-graph dwarf
- Saves 8k of call stack per sample to be analyzed later together with debug info. Produces large perf.data records, which are extremely slow to process with perf report. Impractical at high sampling rates, so limit the sampling rate to 99 Hz with -F99.
Example commands:
Attach to a running process and sample it for 10 seconds at a 1000 Hz sample rate with an LBR call graph. Creates a perf.data record.
perf record -p <pid> --call-graph lbr -F1000 -- sleep 10
Sample the whole system for 10 seconds with DWARF debug info, limiting the sampling rate to 99 Hz:
perf record -a --call-graph dwarf -F99 -- sleep 10
If run on a remote system, pack all the information needed to analyze the profile later on a host system. This creates a .tar archive. Copy it together with perf.data to the host system.
perf archive
On the host system, unpack the .tar archive. This extracts it to ~/.debug
perf archive --unpack
You can then generate a report from the perf.data recorded on the remote system:
perf report
perf-archive is missing from all the Ubuntu perf packages. Get one from the Linux source:
mkdir -p /usr/libexec/perf-core/
wget -O /usr/libexec/perf-core/perf-archive https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/tools/perf/perf-archive.sh
chmod +x /usr/libexec/perf-core/perf-archive