Unraveling performance bottlenecks in a Linux environment can feel like detective work, often starting with limited snapshots from tools like top. But for true mastery of your system’s health, you need more than a fleeting glance. Enter Sysstat – a powerful suite of Linux performance monitoring utilities, including mpstat, pidstat, and sar. This comprehensive guide will equip tech-savvy system administrators and developers with advanced system administration tools to diagnose and troubleshoot even the most elusive system issues, from CPU contention to memory pressure and I/O bottlenecks. Dive in to unlock real-time insights and historical perspectives that top simply can’t provide.
Beyond top: Understanding Core Linux Performance Bottlenecks
Most sysadmins instinctively turn to the top command when symptoms like high CPU usage, system lag, or load spikes appear. While top provides an immediate, high-level snapshot, its fundamental limitation is that it only shows the current state. It doesn’t offer the historical context or detailed breakdown necessary to understand what is causing the problem over time.
In the complex world of Linux server troubleshooting, CPU is rarely the sole factor. Performance bottlenecks typically arise from a confluence of issues across multiple system areas:
CPU Scheduling Pressure
This includes scenarios like run queue contention, where processes are ready to execute but waiting for an available CPU core, or uneven core usage, where load is pinned to a few cores while others remain idle.
I/O Wait
Processes become blocked, waiting for disk or network operations to complete. High I/O wait indicates your system is spending significant time idle, awaiting data.
Memory Pressure
This encompasses swapping (moving data between RAM and disk), reclaim activity (the kernel freeing memory for new allocations), or cache eviction (flushing cached data).
Interrupt Handling
Uneven distribution of hardware interrupts across CPUs can lead to specific cores becoming saturated, impacting overall system responsiveness.
This is precisely where the Sysstat suite shines. It provides structured, historical visibility into your system’s behavior, moving beyond a simplistic point-in-time snapshot to give you the data needed for deep analysis.
Introducing the Sysstat Suite: Your Linux Performance Toolkit
Sysstat is a collection of utilities that together offer a real-time and historical view of virtually every aspect of your system’s operations. Its key tools include:
mpstat
Reports CPU usage per core, invaluable for identifying load imbalance, high softirq loads (often network-related), or saturation on specific CPUs.
pidstat
Tracks per-process CPU, memory, and I/O usage over time. This is your go-to for pinpointing individual processes responsible for performance degradation.
sar
The System Activity Reporter collects system-wide historical metrics. sar allows you to correlate CPU, memory, load, and I/O behavior across time intervals, eliminating guesswork from incident analysis.
These powerful tools offer consistent functionality across modern Linux distributions, including Ubuntu and RHEL, provided Sysstat version 12.x or later is installed. For advanced disk-level analysis, such as throughput, latency, and queue depth, iostat (itself part of the Sysstat package) is typically used alongside vmstat as part of a deeper performance troubleshooting workflow.
Getting Started with Sysstat: Installation & Verification
Sysstat doesn’t ship by default on most Linux distributions. Before leveraging its capabilities, you’ll need to install it and verify the version. Certain valuable flags, like %wait in pidstat and %ifutil in sar, were introduced in later releases and won’t be available on older packages.
To install Sysstat, use the appropriate command for your specific Linux distribution:
sudo apt install sysstat [On Debian, Ubuntu and Mint]
sudo dnf install sysstat [On RHEL/CentOS/Fedora and Rocky/AlmaLinux]
sudo apk add sysstat [On Alpine Linux]
sudo pacman -S sysstat [On Arch Linux]
sudo zypper install sysstat [On OpenSUSE]
Verify the installation and check the version with:
mpstat -V
A successful output will resemble:
sysstat version 12.6.1
(C) Sebastien Godard (sysstat &lt;at&gt; orange.fr)
If you encounter "command not found," confirm the installation completed successfully and that /usr/bin is in your PATH. Use which mpstat to verify the binary’s existence. On Ubuntu, an additional step is required: enable data collection by setting ENABLED="true" in /etc/default/sysstat, as it’s off by default.
mpstat: Monitoring Per-CPU Usage in Linux
mpstat reports CPU usage statistics across all processors or for individual cores. On any multi-core system, it provides a far more granular view than the single summary line in top, immediately revealing whether load is evenly spread or concentrated on one CPU.
1. Display Global CPU Statistics
Running mpstat without options provides a single snapshot of average CPU usage across all processors since boot, establishing a baseline before delving into per-core or per-process numbers.
mpstat
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:22:10 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
14:22:10 all 18.43 0.02 3.11 1.24 0.00 0.09 0.00 0.00 0.00 77.11
The %iowait column is crucial here; a value consistently above 10-15% indicates the system is blocked waiting on disk reads or writes, a problem more CPU won’t solve. %soft tracks time spent in software interrupt handlers; a spike there often signals a saturated network interface rather than a pure compute issue.
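As a quick triage aid, an awk filter can flag only the samples that cross your chosen %iowait threshold. This is a minimal sketch: the 15% cutoff is an arbitrary working value taken from the guideline above, and a captured sample is inlined so the filter can be demonstrated; live, you would pipe mpstat 2 into the same awk program.

```shell
# Print only mpstat samples whose %iowait (6th field in this layout)
# exceeds a threshold. The 15% cutoff is an illustrative choice.
sample='14:22:10 all 18.43 0.02 3.11 17.24 0.00 0.09 0.00 0.00 0.00 61.11
14:22:12 all 12.10 0.00 2.40 3.10 0.00 0.05 0.00 0.00 0.00 82.35'

echo "$sample" | awk '$6 > 15 { print "High iowait at", $1 ":", $6 "%" }'
```

Against live output, something like mpstat 2 | awk '$2 == "all" && $6 > 15' gives a running alert stream you can leave in a terminal during an incident.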
2. Show Statistics for Every Individual CPU
The -P ALL flag expands the "all" summary row into a separate line per CPU, immediately revealing if one core is shouldering the majority of the workload while others remain idle.
mpstat -P ALL
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:23:05 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
14:23:05 all 18.51 0.02 3.14 1.26 0.00 0.10 0.00 0.00 0.00 76.97
14:23:05 0 72.14 0.00 5.22 0.00 0.00 0.00 0.00 0.00 0.00 22.64
14:23:05 1 14.32 0.03 2.88 0.52 0.00 0.08 0.00 0.00 0.00 82.17
14:23:05 2 10.44 0.02 2.71 1.98 0.00 0.12 0.00 0.00 0.00 84.73
14:23:05 3 8.11 0.01 1.75 3.52 0.00 0.09 0.00 0.00 0.00 86.52
Here, CPU 0’s 72% user time while others are mostly idle is a classic sign of a single-threaded process pegging one core. pidstat (covered next) will then help you identify that exact process. You can also target a specific CPU (e.g., mpstat -P 0) on systems with many cores.
3. Monitoring Live CPU Activity Over Time
Adding an interval in seconds and a count provides continuous live samples, far more effective for spotting transient load spikes than a single averaged snapshot.
mpstat -P ALL 2 5
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:25:10 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
14:25:12 all 21.34 0.00 2.88 0.00 0.00 0.00 0.00 0.00 0.00 75.78
14:25:12 0 74.50 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 23.50
14:25:12 1 15.22 0.00 3.11 0.00 0.00 0.00 0.00 0.00 0.00 81.67
14:25:12 2 11.04 0.00 2.88 0.00 0.00 0.00 0.00 0.00 0.00 86.08
14:25:12 3 9.12 0.00 1.44 0.00 0.00 0.00 0.00 0.00 0.00 89.44
...
Average: all 19.87 0.01 3.02 0.88 0.00 0.08 0.00 0.00 0.00 76.14
The Average: row at the end summarizes all 5 samples, allowing you to quickly determine if CPU 0 remained hot throughout the full window or merely experienced a brief spike. The 2 5 syntax means "collect 5 readings, one every 2 seconds"; dropping the count will make mpstat run indefinitely until Ctrl+C.
4. Monitoring Interrupt Statistics Per Processor
The -I flag reports interrupt statistics and requires a keyword: SUM for total interrupts per second per CPU, CPU for counts of each individual IRQ line, SCPU for software interrupts, or ALL for everything. This is the ideal tool when debugging a saturated network card or storage controller that’s disproportionately pinning interrupts to one core.
mpstat -I SUM,SCPU
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:27:19 CPU intr/s
14:27:19 all 742.18
14:27:19 CPU NET_TX/s NET_RX/s BLOCK/s SCHED/s RCU/s
14:27:19 0 0.04 1.44 9.22 12.44 41.22
14:27:19 1 0.04 2.01 8.88 11.97 44.11
14:27:19 2 0.05 1.88 9.04 12.11 42.88
14:27:19 3 0.03 1.92 8.76 11.88 43.04
High NET_RX/s concentrated on a single CPU, combined with high %soft (from example 1), suggests your NIC’s interrupts are affined to one core. Solutions involve running irqbalance or manually setting IRQ affinity via /proc/irq/<irq_number>/smp_affinity to distribute the interrupt load.
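Manually pinning an IRQ means writing a hex CPU bitmask to smp_affinity. The sketch below shows how that mask is computed; the IRQ number 24 is a placeholder, and the actual write needs root, so it is left commented out.

```shell
# smp_affinity takes a hex bitmask where bit N set means CPU N is allowed,
# so CPU 2 corresponds to mask 1 << 2 = 0x4.
cpu=2
mask=$(printf '%x' $((1 << cpu)))
echo "affinity mask for CPU $cpu: $mask"

# Applying it needs root and a real IRQ number (24 is a placeholder):
#   echo "$mask" | sudo tee /proc/irq/24/smp_affinity
```

Masks can be OR-ed together to allow several CPUs, e.g. CPUs 0 and 1 give mask 3.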
5. Display All CPU and Interrupt Statistics Together
The -A flag is a powerful shortcut, equivalent to combining -u, -P ALL, and -I ALL (recent releases also fold in the NUMA node and network reports). It provides a complete, one-shot dump of CPU utilization and interrupt counts for every processor in a single command.
mpstat -A
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:30:01 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
14:30:01 all 18.43 0.02 3.11 1.24 0.00 0.09 0.00 0.00 0.00 77.11
14:30:01 0 72.14 0.00 5.22 0.00 0.00 0.00 0.00 0.00 0.00 22.64
14:30:01 1 14.32 0.03 2.88 0.52 0.00 0.08 0.00 0.00 0.00 82.17
...
14:30:01 CPU intr/s
14:30:01 all 742.18
14:30:01 0 188.14
14:30:01 1 192.44
To redirect this comprehensive output to a timestamped log file for future reference, use:
mpstat -A >> /var/log/mpstat-$(date +%F).log
This creates a lightweight, manual performance record, useful on systems where you cannot install or configure sar’s full cron setup but still need to capture a snapshot before a maintenance window.
pidstat: Monitor Per-Process Resource Usage in Linux
pidstat is the essential next step after mpstat identifies a hot CPU. It precisely reveals which process or thread is consuming resources. Unlike top, pidstat provides historical context, a detailed I/O breakdown, and thread-level visibility in a clean, parseable format.
6. Listing All Active Processes and CPU Usage
Running pidstat without arguments displays current CPU usage for every active process, averaged since boot. It’s the quickest way to confirm which processes are actively consuming CPU.
pidstat
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:32:44 UID PID %usr %system %guest %wait %CPU CPU Command
14:32:44 0 1 0.01 0.08 0.00 0.00 0.09 1 systemd
14:32:44 0 512 0.00 0.04 0.00 0.00 0.04 0 kworker/0:1H
14:32:44 1000 2841 18.44 0.88 0.00 0.00 19.32 0 python3
14:32:44 1000 3102 0.12 0.04 0.00 0.00 0.16 2 nginx
The %wait column (introduced in Sysstat 11.5) indicates the time a process spent waiting to run on a CPU despite being runnable. Consistent non-zero values here signify CPU saturation—you have more work queued than available cores. The CPU column tells you which core each process last ran on, directly correlating with mpstat’s hot core identification.
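To surface only the saturated processes, the %wait column (7th field in the layout shown above) can be filtered with awk. A captured sample is inlined for demonstration; live, you would pipe pidstat 2 into the same filter.

```shell
# List processes that spent time runnable but waiting for a CPU (%wait > 0),
# using the column positions from the pidstat output above.
sample='14:32:44 1000 2841 18.44 0.88 0.00 4.20 19.32 0 python3
14:32:44 1000 3102 0.12 0.04 0.00 0.00 0.16 2 nginx'

echo "$sample" | awk '$7 > 0 { printf "%s (pid %s) waited %s%% for CPU\n", $10, $3, $7 }'
```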
7. Viewing All Processes Including Idle Ones
The default pidstat view omits processes with zero CPU activity since boot. Adding -p ALL includes every process, even sleeping ones, which is useful for obtaining a full PID list for a specific application or confirming a service is indeed running.
pidstat -p ALL
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:33:11 UID PID %usr %system %guest %wait %CPU CPU Command
14:33:11 0 1 0.01 0.08 0.00 0.00 0.09 1 systemd
14:33:11 0 2 0.00 0.00 0.00 0.00 0.00 0 kthreadd
14:33:11 0 3 0.00 0.00 0.00 0.00 0.00 0 rcu_gp
14:33:11 0 4 0.00 0.00 0.00 0.00 0.00 0 rcu_par_gp
14:33:11 1000 2841 18.44 0.88 0.00 0.00 19.32 0 python3
kworker and rcu_* entries are kernel worker threads, not user processes. Non-zero CPU usage by these can indicate underlying driver issues or filesystem problems warranting further investigation. If an expected service is missing from this output, it’s not running.
8. Monitor Per-Process Disk I/O
The -d flag switches from CPU to disk I/O metrics, displaying kilobytes read and written per second for each process. This is the fastest way to pinpoint which application is thrashing your storage during a high-iowait incident.
pidstat -d 2
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:35:22 PID kB_rd/s kB_wr/s kB_ccwr/s iodelay Command
14:35:24 1221 0.00 148.00 2.00 0 rsyslogd
14:35:24 2841 412.00 88.00 4.00 12 python3
14:35:24 3210 0.00 22.00 0.00 0 postgres
The kB_ccwr/s column shows cancelled write bytes—pages queued for disk but invalidated before flushing. High values suggest a process inefficiently writing and immediately overwriting the same data. The iodelay column counts clock ticks a process spent blocked waiting for I/O. A non-zero iodelay alongside mpstat’s high %iowait confirms that specific process as the I/O culprit.
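Ranking the heaviest I/O consumers is a one-liner over this layout (kB_rd/s and kB_wr/s in fields 3-4, command in field 7). Sample rows are inlined here in place of live pidstat -d output.

```shell
# Rank processes by combined read+write throughput, busiest first.
sample='14:35:24 1221 0.00 148.00 2.00 0 rsyslogd
14:35:24 2841 412.00 88.00 4.00 12 python3
14:35:24 3210 0.00 22.00 0.00 0 postgres'

echo "$sample" | awk '{ printf "%.0f kB/s %s\n", $3 + $4, $7 }' | sort -rn
```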
9. Show Per-Thread CPU Usage
The -t flag expands a process into its individual threads, which is crucial for multi-threaded applications where only one thread might be causing elevated CPU usage while others are idle.
pidstat -t -p 2841 2 3
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:37:08 UID TGID TID %usr %system %guest %wait %CPU CPU Command
14:37:10 1000 2841 - 18.00 0.88 0.00 0.00 18.88 0 python3
14:37:10 1000 - 2841 17.50 0.50 0.00 0.00 18.00 0 |__python3
14:37:10 1000 - 2844 0.50 0.38 0.00 0.00 0.88 1 |__ThreadPool-1
14:37:10 1000 - 2845 0.00 0.00 0.00 0.00 0.00 2 |__GC-thread
Replace 2841 with the PID of the process you’re inspecting (use pgrep <process_name> if needed). The pipe-and-underscore hierarchy immediately shows that the main thread is doing most of the work, while ThreadPool-1 and GC-thread are barely active.
10. Monitor Memory Utilization Per Process
The -r flag reports virtual size, resident set size, and page fault rates per process. The -h flag prints in a compact single-line format, omitting repetitive headers, which is easier for continuous monitoring.
pidstat -rh 2 3
Output:
# Time UID PID minflt/s majflt/s VSZ RSS %MEM Command
1746353842 1000 2841 3412.00 0.00 812448 312200 7.68 python3
1746353842 1000 3102 128.22 0.00 142312 44820 1.10 nginx
1746353842 0 1201 644.00 0.00 506728 316788 7.80 Xorg
The majflt/s column is critical: a major page fault means the kernel had to go to disk to bring a page in, whether from swap or a file-backed mapping. Sustained non-zero values here indicate active swapping or heavy demand paging, and identify the process causing it. Minor faults (minflt/s) are less concerning, as they merely represent the kernel mapping pages already in RAM; major faults are expensive and severely degrade application response times.
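You can cross-check pidstat’s fault rates against the kernel’s own cumulative counters in /proc: per proc(5), field 12 of /proc/&lt;pid&gt;/stat is the process’s lifetime major-fault total. In this sketch the current shell ($$) stands in for the PID under investigation.

```shell
# Read the cumulative major page fault count straight from /proc;
# field 12 of /proc/<pid>/stat is majflt per proc(5). Note: awk's
# whitespace splitting shifts fields if the command name contains spaces.
pid=$$   # stand-in: inspect the current shell itself
majflt=$(awk '{ print $12 }' "/proc/$pid/stat")
echo "PID $pid has taken $majflt major faults since it started"
```

Sampling this counter twice and subtracting gives the same per-interval rate pidstat reports.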
11. Filter Processes by Name
The -G flag filters output to only processes whose command name matches a specified string, eliminating the need to scroll through a full process list to find your target application.
pidstat -G nginx
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:40:18 UID PID %usr %system %guest %wait %CPU CPU Command
14:40:18 1000 3102 0.12 0.04 0.00 0.00 0.16 2 nginx
14:40:18 1000 3103 0.08 0.02 0.00 0.00 0.10 3 nginx
Combine with -t to expand into threads (e.g., pidstat -t -G nginx), or add an interval for continuous monitoring (e.g., pidstat -G nginx 2). If -G returns nothing, the process name might not match exactly; verify the actual command string with ps aux | grep nginx.
12. Show Real-Time Scheduling Priority and Policy
The -R flag reports the scheduling policy and real-time priority for each process. This is particularly useful when a latency-sensitive application isn’t getting adequate CPU time despite low overall system utilization.
pidstat -R
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:41:33 UID PID prio policy Command
14:41:33 0 3 99 FIFO migration/0
14:41:33 0 5 99 FIFO migration/1
14:41:33 0 14 99 FIFO watchdog/0
14:41:33 1000 4821 20 NORMAL java
Kernel migration threads are expected to run at priority 99 with FIFO scheduling. If you observe a user process with FIFO or RR policy and a high priority, it will preempt everything else on that CPU until it voluntarily yields, explaining why other processes might be starved.
sar: System Activity Reporter for Historical Linux Performance
While mpstat and pidstat offer point-in-time views, sar is your historical Linux performance monitoring companion. It continuously collects system activity data via cron, allowing you to replay any time window from the past. This means you can investigate a performance problem that occurred at 3 AM without needing to be awake at 3 AM—a capability critical for effective Linux server troubleshooting.
13. Collect CPU Statistics and Save to a Binary Log
The -u flag reports CPU utilization, while -o saves raw data to a binary file for later replay. The 2 5 arguments collect 5 samples at 2-second intervals, providing a concise capture window useful during an incident.
sar -u -o /tmp/sarfile 2 5
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:44:10 CPU %user %nice %system %iowait %steal %idle
14:44:12 all 22.14 0.00 3.44 0.00 0.00 74.42
14:44:14 all 19.88 0.00 2.98 0.52 0.00 76.62
14:44:16 all 24.32 0.00 3.11 0.00 0.00 72.57
14:44:18 all 21.44 0.00 3.22 0.00 0.00 75.34
14:44:20 all 20.88 0.00 3.08 0.00 0.00 76.04
Average: all 21.73 0.00 3.17 0.10 0.00 75.00
To replay the saved file, run sar -u -f /tmp/sarfile. To convert it to CSV for graphing (e.g., in Grafana for advanced observability), use sadf -d /tmp/sarfile. Remember, the binary format is not human-readable directly; always process it via sar -f or sadf.
14. Set Up Automatic Collection with Cron
Instead of running sar interactively, configure it for continuous background data collection. This provides historical data to replay whenever an incident occurs, even if it already happened.
Add these two entries to root’s crontab using sudo crontab -e:
# Collect system activity every 10 minutes
*/10 * * * * /usr/lib/sysstat/sa1 1 1
# Generate daily human-readable report at 23:53
53 23 * * * /usr/lib/sysstat/sa2 -A
Verify data is being collected:
ls -lh /var/log/sysstat/
Output:
total 2.1M
-rw-r--r-- 1 root root 412K May 3 23:53 sa03
-rw-r--r-- 1 root root 388K May 4 14:45 sa04
-rw-r--r-- 1 root root 44K May 4 14:45 sar04
Note: On Ubuntu/Debian, the sa1 path is /usr/lib/sysstat/sa1. On RHEL/Rocky Linux, it might be /usr/lib64/sa/sa1. Since sa1 lives outside the standard PATH, which sa1 usually finds nothing; confirm the correct path with ls /usr/lib/sysstat/sa1 or ls /usr/lib64/sa/sa1 before adding the cron entry. If cron runs but /var/log/sysstat/ remains empty on Ubuntu, ensure ENABLED="true" is set in /etc/default/sysstat, as collection is disabled by default.
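A small loop can resolve the correct collector path on the current host before you edit the crontab. The candidate paths below cover the common Debian- and RHEL-family locations and may need extending for other distributions.

```shell
# Probe the usual sa1 locations; extend the list for other packaging layouts.
sa1_path=""
for p in /usr/lib/sysstat/sa1 /usr/lib64/sa/sa1 /usr/lib/sa/sa1; do
    if [ -x "$p" ]; then sa1_path=$p; break; fi
done
echo "sa1 collector: ${sa1_path:-not found, is sysstat installed?}"
```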
15. Check Run Queue and Load Average History
The -q flag reports run queue length, total process count, load averages, and blocked process count. Together, these metrics indicate whether your system has more runnable work than it can process, and whether the issue is CPU saturation or I/O blocking.
sar -q 2 5
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:50:11 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
14:50:13 3 512 2.44 1.88 1.42 0
14:50:15 4 514 2.51 1.90 1.43 1
14:50:17 5 516 2.60 1.91 1.43 2
14:50:19 2 514 2.48 1.90 1.43 0
14:50:21 1 512 2.41 1.89 1.43 0
Average: 3 514 2.49 1.90 1.43 1
On a 4-CPU system, a sustained runq-sz above 4 signifies processes actively queuing for CPU time rather than executing, pointing to CPU saturation. The blocked column counts processes blocked on disk or network I/O; a non-zero value here, coupled with high %iowait from sar -u, confirms a storage bottleneck rather than a CPU problem.
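That rule of thumb (runq-sz persistently above the core count) is easy to script against the current host. The sampled value of 5 below is hardcoded for illustration; live, you would parse it from sar -q output instead.

```shell
# Compare a run-queue sample against the number of online CPUs.
# runq=5 is an illustrative hardcoded sample, not live data.
cores=$(nproc)
runq=5
if [ "$runq" -gt "$cores" ]; then
    echo "saturated: runq-sz $runq exceeds $cores cores"
else
    echo "ok: runq-sz $runq within $cores cores"
fi
```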
16. Check Filesystem Usage Over Time
The -F flag reports free and used space, plus inode usage for every mounted filesystem. Integrated into the sar historical log, you can use it to track when a filesystem began filling up, rather than merely knowing it’s full now.
sar -F 2 4
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:52:01 MBfsfree MBfsused %fsused %ufsused Ifree Iused %Iused FILESYSTEM
14:52:03 18240 9760 34.84 36.68 12188422 811578 6.24 /dev/sda1
14:52:03 2048 952 31.74 33.42 1048320 176480 14.40 /dev/sdb1
The %ufsused column shows usage as seen by an unprivileged user, which treats the blocks reserved for root (typically 5% on ext filesystems) as already consumed, so it runs slightly higher than %fsused. If %ufsused hits 100% while %fsused is still lower, regular users can no longer write, but root still can. A filesystem silently filling overnight is a common cause of application crashes that often look like software bugs until disk space is checked.
17. Monitor Network Interface Throughput
The -n DEV flag reports packets and bytes received and transmitted per second for each interface. Piping through grep -v lo removes the loopback interface, focusing the output on actual network traffic.
sar -n DEV 1 3 | grep -v lo
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:54:08 IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
14:54:09 eth0 412.00 388.00 42.14 38.22 0.00 0.00 0.00 0.40
14:54:10 eth0 844.22 812.44 88.22 82.14 0.00 0.00 0.00 0.84
14:54:11 eth0 388.12 366.00 40.44 38.88 0.00 0.00 0.00 0.40
The %ifutil column (Sysstat 11.5+) shows the percentage of interface bandwidth in use. Values above 80% on a 1Gbps interface suggest you’re nearing the wire limit, prompting investigation into traffic shaping, bonding, or a faster NIC. Combine this with sar -n EDEV to simultaneously check for packet errors and drops.
18. Report Block Device I/O Latency
The -d flag displays throughput and latency per block device. The combination of await and svctm is the most direct indicator of whether you have a slow disk or an overloaded I/O queue.
sar -d 1 3
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:56:14 DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
14:56:15 dev8-0 88.00 412.00 1084.00 17.00 0.88 10.02 4.12 36.22
14:56:16 dev8-0 112.44 488.22 1244.00 15.44 1.12 12.44 4.88 44.12
14:56:17 dev8-0 76.22 344.88 922.00 16.88 0.72 8.88 3.88 29.66
The await value represents the total time in milliseconds from when an I/O request was queued until it completed. svctm is the time the device actually spent serving the request. A significant gap between them indicates requests are spending most of their time waiting in the queue, rather than being limited by disk speed itself. The aqu-sz column confirms this by showing the average queue depth; consistently above 1 on a single spinning disk signals I/O saturation. Be aware that svctm is deprecated and has been dropped from the newest Sysstat releases, so treat it as a rough estimate where it still appears.
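The queue-versus-device split described above is just a subtraction: await minus svctm approximates the time a request spent waiting in the queue. Using the 14:56:16 sample row:

```shell
# Split total I/O latency (await) into device service time (svctm)
# and queueing time; values taken from the sample output above.
await=12.44
svctm=4.88
queue_ms=$(awk -v a="$await" -v s="$svctm" 'BEGIN { printf "%.2f", a - s }')
echo "of ${await} ms total, ${queue_ms} ms was spent queued"
```

Here well over half of the latency is queueing, which points at an overloaded device rather than a slow one.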
19. Report Memory Usage and Commit Statistics
The -r flag provides a comprehensive memory overview, including free, used, cached, and committed memory. The %commit column is particularly insightful, revealing whether the kernel has promised more memory than physically exists.
sar -r 1 3
Output:
Linux 6.8.0-45-generic (web01.tecmint.com) 05/04/2026 _x86_64_ (4 CPU)
14:58:22 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
14:58:23 2142440 5907560 73.40 288012 2844120 6112448 37.82 3188400 2241644 1224
14:58:24 2138800 5911200 73.45 288016 2844888 6114220 37.83 3190144 2241788 988
14:58:25 2136320 5913680 73.48 288020 2845200 6118400 37.86 3191220 2241900 812
Average: 2139187 5910813 73.44 288016 2844736 6114989 37.84 3189921 2241777 1008
When %commit exceeds 100%, the kernel has overcommitted virtual memory to processes, promising more than it can physically back with RAM and swap combined. This means some allocations will fail with out-of-memory errors if every process simultaneously tries to use its full committed allocation. The kbdirty column shows memory pages written but not yet flushed to disk; a persistently high value indicates a lot of data in memory that would be lost during a sudden power failure.
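A similar commit picture can be pulled straight from /proc/meminfo: Committed_AS is what the kernel has promised to processes, and CommitLimit is what it is willing to back under the current overcommit policy. Note this is not identical to sar’s %commit, which divides by RAM plus swap, so the two percentages differ slightly.

```shell
# Compute a commit ratio from the kernel's own counters in /proc/meminfo.
awk '/^Committed_AS:/ { c = $2 }
     /^CommitLimit:/  { l = $2 }
     END { printf "Committed_AS %d kB of CommitLimit %d kB (%.1f%%)\n", c, l, c / l * 100 }' /proc/meminfo
```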
20. Export Historical sar Data as CSV
The sadf -d command reads a binary sar data file and outputs semicolon-delimited CSV. This allows you to easily open the data in a spreadsheet or pipe it into awk for custom analysis. It’s the best method for building detailed performance graphs from the data sar collects automatically, and can be integrated into modern observability dashboards like Grafana.
sadf -d /var/log/sysstat/sa04 -- -n DEV | grep -v lo
Output:
hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
web01;600;2026-05-04 08:00:02 UTC;eth0;288.44;241.22;29.14;24.08;0.00;0.00;0.00;0.28
web01;600;2026-05-04 08:10:02 UTC;eth0;312.88;288.44;31.82;29.04;0.00;0.00;0.00;0.32
web01;600;2026-05-04 09:00:02 UTC;eth0;1841.22;1724.88;188.44;174.12;0.00;0.00;0.00;1.84
The 09:00 row, showing 1841 packets per second versus 288 at 08:00, exemplifies the kind of traffic spike you’d entirely miss with only live sar output. This highlights why setting up the cron job (example 14) is invaluable before an incident occurs.
To save this to a file, simply redirect the output:
sadf -d /var/log/sysstat/sa04 -- -n DEV > /tmp/network-$(date +%F).csv
Now you have data ready to chart or share with your team for collaborative Linux performance monitoring and analysis.
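Once exported, the CSV also lends itself to quick command-line analysis, such as finding the interval with the peak receive rate (rxpck/s is field 5 in this -n DEV layout). Sample rows are inlined here in place of a real export; live, you would pipe the sadf -d output in directly.

```shell
# Find the busiest interval by receive packet rate in a sadf -d export
# (semicolon-delimited; timestamp is field 3, rxpck/s is field 5).
csv='web01;600;2026-05-04 08:00:02 UTC;eth0;288.44;241.22
web01;600;2026-05-04 08:10:02 UTC;eth0;312.88;288.44
web01;600;2026-05-04 09:00:02 UTC;eth0;1841.22;1724.88'

echo "$csv" | awk -F';' '$5 > max { max = $5; at = $3 } END { print "peak:", max, "rxpck/s at", at }'
```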
Conclusion
The Sysstat suite excels as a diagnostic chain. A typical workflow for a sluggish system begins with sar -u to review recent CPU history. It then progresses to mpstat -P ALL to identify hot cores, moves to pidstat -p ALL 2 to name the responsible process, and concludes with pidstat -d or sar -d if I/O is involved.
The single most impactful step you can take right now to enhance your system administration tools is to set up the cron job from example 14. Allow it to collect a full week of baseline data before anything goes wrong. Diagnosing an incident with historical data often takes minutes, while doing so from scratch can take hours.
After the first day, run the following command to confirm data is flowing and the file is growing as expected:
sar -u -f /var/log/sysstat/sa$(date +%d)
What’s the most useful sar flag or pidstat option you leverage in production that didn’t make this list? Share your insights in the comments!
FAQ
Question 1: Why is Sysstat superior to top for diagnosing Linux performance issues?
Answer 1: While top offers an immediate snapshot of system activity, it lacks the historical context and granular detail crucial for complex troubleshooting. Sysstat’s mpstat, pidstat, and sar provide per-core CPU usage, per-process resource consumption over time (including I/O and memory), and invaluable historical data logging. This allows you to identify transient spikes, correlate different system metrics, and pinpoint root causes that a single point-in-time view simply cannot reveal. It’s an indispensable system administration tool for deep dives.
Question 2: How can I ensure Sysstat automatically collects historical performance data?
Answer 2: The most critical step is configuring the sa1 and sa2 cron jobs. By adding entries to your root crontab (e.g., /usr/lib/sysstat/sa1 1 1 every 10 minutes and /usr/lib/sysstat/sa2 -A daily), Sysstat will continuously log system activity to binary files in /var/log/sysstat/. This proactive setup is a cornerstone of effective Linux server troubleshooting, enabling you to analyze incidents that occurred hours or days ago without active real-time monitoring. Remember to also enable data collection if you’re on Ubuntu/Debian by setting ENABLED="true" in /etc/default/sysstat.
Question 3: What’s a common Sysstat workflow for diagnosing high I/O wait or disk bottlenecks?
Answer 3: A typical workflow begins with sar -u to confirm general CPU usage and identify high %iowait values. If iowait is significant, proceed to pidstat -d to pinpoint which specific processes are generating heavy disk I/O (look at kB_rd/s, kB_wr/s, and iodelay). For deeper block device analysis, use sar -d to examine await, svctm, and aqu-sz for individual disks, determining if the bottleneck is due to slow disk performance or an overloaded I/O queue. This methodical approach is vital for effective Linux performance monitoring and resolving storage-related issues.

