Diagnosing Disk I/O issues: swapping, high IO wait, congestion

On one small LEMP VPS I manage, I noticed Munin graphs showing anywhere between 5-50 MB/second of disk I/O. Since the VM has an SSD instead of a traditional spinning hard drive, performance wasn't too bad, but all that disk I/O definitely slowed things down.

I wanted to figure out the source of all that disk I/O, so I used the following techniques to narrow down the culprit (spoiler: it was MySQL, which was dipping into swap because it was tuned to use a little too much memory).

iotop

First up was iotop, a handy top-like utility for monitoring disk I/O in real time. Install it via yum or apt, then run it with sudo iotop -ao (-a shows accumulated totals instead of current bandwidth, -o shows only processes actually doing I/O) to see an aggregated summary of disk I/O over the course of the utility's run. I let it sit for a few minutes, then checked back in to find:

Total DISK READ: 3.94 K/s | Total DISK WRITE: 23.66 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  755 be/4 mysql         0.00 B      0.00 B  0.00 %  4.10 % mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us~r/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
  756 be/4 mysql         0.00 B      0.00 B  0.00 %  4.08 % mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us~r/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
  757 be/4 mysql         0.00 B      0.00 B  0.00 %  3.81 % mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us~r/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
  758 be/4 mysql         0.00 B      0.00 B  0.00 %  2.81 % mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us~r/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
   29 be/4 root          0.00 B      0.00 B  0.00 %  0.33 % [kswapd0]
  763 be/4 mysql       448.00 K     21.50 M  0.00 %  0.12 % mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us~r/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
  250 be/3 root          0.00 B    248.00 K  0.00 %  0.08 % [jbd2/vda-8]
  601 be/4 root         16.00 K     66.03 M  0.00 %  0.04 % [flush-252:0]
19924 be/4 root          2.42 M      0.00 B  0.00 %  0.01 % python /usr/sbin/iotop -ao
19369 be/4 nginx       740.00 K      0.00 B  0.00 %  0.01 % php-fpm: pool www
...
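
The mysqld threads at the top were each spending up to ~4% of their time waiting on I/O, and the [kswapd0] kernel swap daemon appearing right alongside them was a strong hint that swapping was involved. To capture this kind of data over a longer window, iotop can also log to a file in batch mode, and sysstat's pidstat offers a rough per-process alternative; a quick sketch (the intervals and log path here are arbitrary):

# Log accumulated per-process I/O every 5 seconds with timestamps
# (-b runs iotop non-interactively, -t adds a timestamp to each line,
# -qq trims the repeated column headers).
sudo iotop -aobt -d 5 -qq >> /tmp/iotop.log

# Rough alternative if iotop isn't available: per-process disk I/O
# from sysstat's pidstat, sampled every 5 seconds, 3 reports.
pidstat -d 5 3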

top swap

Next I wanted to see how much swap space was being consumed by which processes, so I launched top, pressed O (capital O), then p to sort all the processes by swap usage:

top - 13:10:15 up 10 days,  2:43,  1 user,  load average: 0.18, 0.17, 0.17
Tasks:  80 total,   1 running,  79 sleeping,   0 stopped,   0 zombie
Cpu(s): 62.0%us,  5.3%sy,  0.0%ni, 31.0%id,  0.3%wa,  0.0%hi,  0.3%si,  1.0%st
Mem:    502260k total,   398908k used,   103352k free,     2492k buffers
Swap:   524280k total,   130928k used,   393352k free,    49184k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM      TIME+  SWAP COMMAND
  747 mysql     20   0  962m 153m 3148 S 53.8 31.3  992:52.91   90m mysqld
28448 root      20   0  139m 1868  764 S  0.0  0.4    0:12.22  5444 munin-node
25287 nginx     20   0 49516 1456  696 S  0.0  0.3    0:50.94  4532 nginx
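
One caveat: this older version of top estimates the SWAP column as VIRT minus RES, which can overstate what's actually swapped out, and newer procps-ng releases of top use different keystrokes (f to add and sort fields). A version-independent alternative, assuming a kernel new enough to expose VmSwap in /proc/<pid>/status, is to read the values directly; a quick sketch:

# Rank processes by actual swap usage, reading VmSwap from /proc.
# Values are in kB; run as root to see every process on the system.
for status in /proc/[0-9]*/status; do
  awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, "kB", name}' "$status"
done 2>/dev/null | sort -rn | head -10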

Using iotop and top, it was easy to quickly pin down the source of all the I/O, and after tweaking the MySQL configuration to use about half as much memory, all the processes now sit happily in RAM, avoiding the swapping overhead entirely! The difference wasn't dramatic, but it did result in 10-15% faster page loads (on average) for authenticated users on a Drupal site running on the VPS.
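
For reference, the fix amounted to shrinking MySQL's biggest memory consumers in my.cnf. The variable names below are standard MySQL options, but the sizes are illustrative assumptions rather than the exact values from this server; size them to your own RAM and workload:

# /etc/my.cnf — illustrative sizes, not the actual config from this VPS
[mysqld]
innodb_buffer_pool_size = 128M   # usually the largest single consumer
key_buffer_size         = 16M    # MyISAM index cache
tmp_table_size          = 16M    # cap on in-memory temporary tables
max_heap_table_size     = 16M    # effective temp-table limit is the smaller of these two

After restarting mysqld, watch swap with free -m (or the VmSwap trick above) to confirm the working set actually fits in RAM.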