How to monitor server load on GNU/Linux

This article introduces six methods and 12 tools for monitoring system load, performance and related information on GNU/Linux and similar systems. I’ve seen many articles that mention one or two of these tools, but none that discusses and compares all the ones I find useful.

Gkrellm

Gkrellm is the choice of the “g33k” types. It’s a graphical program that monitors all sorts of statistics and displays them as numbers and charts. You can see examples of it in use on nearly every GNU/Linux screenshot website. It is very flexible and capable, and can monitor useful as well as ridiculous things via plugins. It can monitor the status of a remote system, since it’s a client/server system.

The downsides, in my opinion, are

  1. the impact on the monitored system’s performance (sometimes significant)
  2. the flashiness and eye candy make it seem more meaningful than it might be
  3. it’s graphical, needs to run as a daemon, and isn’t installed by default, so it’s not optimal for monitoring a server

“Task Manager” clones

gnome-system-monitor is a graphical program installed as part of the base Gnome system. It is somewhat similar to the Task Manager in Microsoft Windows. It isn’t very full-featured, with only three tabs (Processes, Resources, Devices). The Devices tab just shows devices, Resources shows the history of CPU, memory, swap and network usage, and the Processes tab shows the processes. The Processes tab is the only one that really lets the user “do” anything, such as killing or re-nicing processes, or showing their memory maps.

Of course, this tool is only available on systems with Gnome installed, and requires an X server to be running. This makes it impractical for use on a server.

I know there’s a similar tool on KDE systems, but I don’t have one handy to examine at the moment.

vmstat and related tools

vmstat is part of the base installation on most GNU/Linux systems. By default, it displays information about virtual memory, CPU usage, I/O, processes, and swap, and can print information about disks and more. It runs in a console. I find the command vmstat -n 5 very helpful for printing a running status display in a tabular format.

It’s great for figuring out how heavily loaded a system truly is, and what the problem (if any) is. For example, when I see a high number in the wa column (percent of CPU time spent waiting for I/O; it is the rightmost column in older versions of vmstat, while newer versions print further columns after it) on a database server, I know the system is I/O-bound.
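Here is a minimal sketch of picking out the I/O-wait figure by column name (wa) instead of by position, since the column layout differs between vmstat versions. The sample output and the threshold of 20 are made up for illustration, and parse_vmstat and io_bound_samples are hypothetical helper names, not part of any package.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of {column_name: int} rows."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "wa" in fields and "us" in fields:
            # This is the column-name line, e.g. "r b swpd ... us sy id wa".
            header = fields
        elif (header and len(fields) == len(header)
              and all(f.lstrip("-").isdigit() for f in fields)):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def io_bound_samples(rows, threshold=20):
    """Return samples whose I/O-wait percentage is at or above threshold.

    The default threshold is an arbitrary assumption, not a rule.
    """
    return [row for row in rows if row["wa"] >= threshold]


# Invented sample in the style of older vmstat output.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 812340  10240 402112    0    0    12    40  150  300  5  2 91  2
 0  3      0 801200  10240 410000    0    0  9000  4000  900 1200  6  4 30 60
"""

rows = parse_vmstat(SAMPLE)
for row in io_bound_samples(rows):
    print("I/O wait:", row["wa"], "blocked processes:", row["b"])
```

Looking columns up by name means the same code keeps working when a vmstat version prints extra columns after wa.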

iostat is part of the sysstat package on Gentoo, as are mpstat and sar. iostat prints statistics similar to vmstat’s, but breaks them down by device and is geared toward understanding I/O usage in more detail. mpstat is a similar tool that prints processor statistics, and is multi-processor aware. sar collects, reports, and saves system activity information (for example, for later analysis).

All of these tools are very flexible and customizable. The user can choose what information to see and what format to see it in. These tools are not usually installed by default, except for vmstat.

top

top is the classic tool for monitoring any UNIX-like system. It runs in a terminal and refreshes at intervals, displaying a list of processes in a tabular format. The columns show things like virtual memory size, processor usage, and so forth. It is highly customizable and has some interactive features, such as re-nicing or killing processes. Since it’s the most widely known of the tools in this article, I won’t go into much detail, other than to say there’s a lot to know about it, so read the man page.

top is one of the programs in the procps package, along with ps, vmstat, w, kill, free, slabtop, and skill. All these tools are in a default installation on most distributions.

htop is similar to top, except it is mouse-aware, has a color display, and displays little charts to help see statistics at a glance. It also has some features top doesn’t have.

On a somewhat related note, mytop is a handy monitor for MySQL servers. Take a look at Jeremy Zawodny’s website while you’re there. He is a smart cookie.

tload

tload runs in a terminal and displays a text-only “graph” of current system load averages, garnered from /proc/loadavg. It is part of the base installation on most GNU/Linux systems. I find it extremely useful for watching a system’s performance over SSH, often within a GNU Screen session.
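For anyone who wants the raw numbers behind the graph, here is a small sketch that reads them directly. It assumes the usual five-field layout of /proc/loadavg (three load averages, a runnable/total count, and the most recently assigned PID). parse_loadavg and read_loadavg are hypothetical helper names, and the sample line is invented.

```python
def parse_loadavg(line):
    """Split one line of /proc/loadavg into named values."""
    one, five, fifteen, procs, last_pid = line.split()
    runnable, total = procs.split("/")
    return {
        "load1": float(one),        # 1-minute load average
        "load5": float(five),       # 5-minute load average
        "load15": float(fifteen),   # 15-minute load average
        "runnable": int(runnable),  # currently runnable scheduling entities
        "total": int(total),        # total scheduling entities
        "last_pid": int(last_pid),  # most recently assigned PID
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


# Invented sample line, so the example also runs on non-Linux systems.
sample = parse_loadavg("0.42 0.36 0.30 2/180 12345\n")
print(sample["load1"], sample["load5"], sample["load15"])
```

On a Linux box, read_loadavg() returns the live values in the same shape.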

My favorite technique is to start an xterm, connect over SSH, resize the terminal to 150×80 or so, then start tload and shrink the window by CTRL-right-clicking and selecting “Unreadable” as the font size. The result looks like the following:

[Image: server load diagram]

I then set the terminal window as always-on-top and move it to a corner of my screen, where it prints a pretty little graph as time goes by.

The only trouble is, it’s not really obvious what the graph means. The man page isn’t terribly helpful; it just says tload gets its numbers from the /proc/loadavg file, and there’s no man page for that file. I looked in the kernel source for the answer.

Documentation/filesystems/proc.txt says loadavg is “Load average of last 1, 5 & 15 minutes,” but not how it’s calculated. Poking around in source/fs/proc/proc_misc.c and kernel/timer.c reveals the origin of the numbers: the number of running and uninterruptible processes (see http://lxr.linux.no/source/kernel/timer.c#L832).
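Here is a rough sketch of how a load average can be built from that count. My understanding, which I am labeling as an assumption and not as a reading of any particular kernel version, is that the count is sampled about every 5 seconds and folded into three exponentially damped averages with time constants of 1, 5 and 15 minutes. The kernel does this in fixed-point arithmetic; the sketch uses floating point, and the function names are my own.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)


def update_load(load, active, period):
    """Fold one sample of the active-process count into a damped average.

    `period` is the averaging window in seconds (60, 300 or 900).
    """
    decay = math.exp(-SAMPLE_INTERVAL / period)
    return load * decay + active * (1.0 - decay)


def simulate(active_counts, period=60.0, start=0.0):
    """Apply update_load once per sample and return the final average."""
    load = start
    for active in active_counts:
        load = update_load(load, active, period)
    return load


# One process kept busy for 60 seconds (12 samples) on an idle machine:
# the 1-minute figure climbs toward 1.0 but has not reached it yet.
after_one_minute = simulate([1] * 12, period=60.0)
print(round(after_one_minute, 2))
```

This is why the numbers lag behind reality: a burst of activity takes a while to show up in full, and takes just as long to fade from the 5- and 15-minute figures.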

watch

watch isn’t really a load-monitoring tool, but it’s beastly handy because it takes any command as input and monitors the result of running that command. For example, if I wanted to monitor when the “foozle” program is executing, I could run

watch --interval=5 "ps aux | grep foozle | grep -v xaprb"
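To make the pipeline above concrete, here is a small sketch of the same idea. matching_lines imitates the two grep stages on captured text, and watch_command is a bare-bones stand-in for watch that reruns a command at an interval (it does not clear the screen the way the real tool does). Both names and the sample ps output are invented for illustration. My reading of the grep -v xaprb stage is that it hides processes owned by the user running the pipeline, including the grep itself.

```python
import subprocess
import time


def matching_lines(ps_output, needle, exclude):
    """Mimic `grep needle | grep -v exclude` on captured ps output."""
    return [
        line for line in ps_output.splitlines()
        if needle in line and exclude not in line
    ]


def watch_command(command, interval=5, rounds=3):
    """Run a shell command every `interval` seconds and print its output."""
    for _ in range(rounds):
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True)
        print(result.stdout, end="")
        time.sleep(interval)


# Invented `ps aux`-style sample: one foozle process owned by root, plus
# the monitoring commands themselves, owned by user xaprb.
SAMPLE_PS = """\
root      1201  0.0  0.1  10000  2000 ?        S    10:00   0:01 /usr/bin/foozle --daemon
xaprb     1299  0.0  0.0   6000   900 pts/0    S+   10:05   0:00 watch --interval=5 ps aux | grep foozle | grep -v xaprb
xaprb     1300  0.0  0.0   5000   800 pts/0    S+   10:05   0:00 grep foozle
"""

hits = matching_lines(SAMPLE_PS, "foozle", "xaprb")
print(len(hits))
```

Calling watch_command("ps aux | grep foozle | grep -v xaprb") would repeat the original pipeline three times, five seconds apart.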

Summary

I’ve given an overview of lots of tools above. Each has its use. I’m not a big fan of graphical tools, and they’re not very practical for monitoring servers anyway. Therefore, I lean towards running tload over SSH to monitor systems, and use vmstat, iostat and friends to troubleshoot specific problems.

Do you have any favorite programs for monitoring and troubleshooting GNU/Linux systems that should be on this list? Leave a response!

Written by Xaprb

June 8th, 2006 at 8:27 pm

Posted in GNU/Linux

21 Responses to 'How to monitor server load on GNU/Linux'

  1. Another tool I forgot to mention is lsof, which lists open files. Don’t be fooled by how simple that sounds! It’s tremendously powerful. Do some Google searches and you can find pages that give examples of how to figure out things you’d never think are possible to know just by looking at open files. Indeed, some of these things I can’t even think how else to do.

    Xaprb

    27 Jul 06 at 6:14 pm

  2. Excellent article – thanks a lot!

    Any ideas on how best to get a picture of network load from the terminal? It would be neat to have a top-like terminal application that shows how much bandwidth my servers are using at any moment.

    Thanks again,

    -A.

  3. “Any ideas on how to best get an idea of network load from the terminal?”

    You can use a program called ‘iftop’, which displays stats for a network interface (the ‘if’ in iftop) and shows which hosts are using the most bandwidth to and from your host.

    Jon

    8 Dec 06 at 11:08 pm

  4. Hi. I can’t get tload to look like your image. I tried scaling, but I couldn’t get it to work. I’m on FC4. Also, CTRL-right-click does nothing to the window. What am I missing?

    Thanks

    Tom

    2 Mar 07 at 11:10 am

  5. You’re probably not using xterm; you’re probably using gnome-terminal or similar. Try explicitly running xterm and see if that behaves as you want.

    Xaprb

    2 Mar 07 at 11:17 am

  6. Thanks a lot!!!!!!!

    Rashmi

    20 Mar 07 at 7:03 am

  7. My personal preference when monitoring remotely is the Dstat tool. It shows a continuous status, updated each second.

    Gobo

    1 Apr 07 at 2:44 pm

  8. [...] some of the sites from which I took inspiration. [...]

  9. [...] some of the sites from which I took [...]

  10. [...] View Post here [...]

  11. Some other handy tools for me are:
    ethstats – shows throughput of each ethernet card on console line (deb pkg: ethstats)

    iptraf – a more in-depth throughput monitor; shows packet/data flow to each currently connected host

    netstat – another handy tool to see what’s connected (-p shows the PID of each connection, -t shows TCP only, -n gives numeric output); see the man page for the rest.

    Chris

    28 Oct 07 at 8:44 pm

  12. Please give me a simple, understandable sample calculation of the load average for a Unix machine.

    Balan

    14 May 08 at 10:34 pm

  13. Hi,

    There’s a super super old tool called xmeter that is great for this kind of thing. The output is a little like tload, but it monitors remote hosts using rstat (which could be a show stopper, depending on your environment). You can set thresholds in it, which I like, so the graph will change from, say, green to yellow to red as the load climbs through the thresholds. It *was* open sourced a long time ago, but it’s hard to find, so you can get it from my linuxlaboratory.org site under ‘Downloads’.

    Brian K. Jones

    27 Nov 08 at 9:56 pm

  14. If you use vmstat, you can build charts from the vmstat log files using:
    http://www.michenux.net/vmstax-build-vmstat-charts-1.html

    You upload the vmstat log files and can then export the charts as images.

    Michenux

    17 Apr 09 at 6:31 am

  15. [...] How to monitor server load on GNU/Linux – File under “stuff I should know more about.” [...]

  16. Thanks for this list. But if you have more than 10 servers, you can’t monitor them only via the command line, because it is too time-consuming.

    For multiple servers, there are many web scripts that can do this. You can find most of them here:
    http://www.sysadmin.md/web-based-tools-to-monitor-your-servers.html

    Michael

    6 Jun 09 at 12:17 am

  17. Hello, is this compatible with Plesk? I have a server running Plesk; can it be installed on it? Thanks.

    Martin

    20 Jun 09 at 12:00 pm

  18. You didn’t need to read the source for tload. It displays the same info as ‘uptime’. The man page for uptime says:

    System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.

    Tim

    23 Jun 09 at 10:11 am

  19. [...] Monitoring server load on Linux [...]

  20. Excellent Article. Keep up the good work. Just wanted to say: Thank You! ;)

    Yavor

    18 Jul 09 at 1:06 pm

  21. [...] If anyone prefers a visual demonstration of how, for example, the values of the three parameters mentioned change on the fly, the tload command can help, since it can also draw an ASCII graph (it is described, along with other performance-monitoring tools, on this page, for example). [...]
