I recently met with an international telephone company that uses an open source monitoring tool called Nagios to keep tabs on its international VoIP network. Having reviewed several small-network monitoring options, including Servers Alive and Jumpnode’s Pulse product, I decided to dive into Nagios to see how feasible it is to deploy this full-featured network monitoring tool in a small network environment.
Before you read any further, let me state that the goal of this article is to introduce Nagios and cover some of its basics. The Nagios team has done a pretty good job with built-in documentation, so here I’ll provide observations and highlights.
To start, Nagios is software that runs on Linux. Instead of setting up a separate physical Linux PC, I used my Windows XP Pro PC with VMware’s free Server software. Using VMware’s virtualization technology, I created a Linux virtual machine (VM) running the Ubuntu distribution. Check out this article where I covered how to install Ubuntu Linux as a VM.
Although Nagios’ documentation is pretty thorough, a bit of Linux familiarity comes in handy to get this product running. You’ll need to be comfortable working at the command line and using a text editor such as vi.
I followed Nagios’ quick start guide for the Ubuntu distribution to the letter. (There are also guides for the Fedora and SUSE Linux distros.) Nagios installation has multiple steps, including installing the Apache web server and several other tools (all free) on your Linux machine, and the guide walks you through each of them. I simply copied the commands from Nagios’ website and pasted them into the command line on my Linux machine.
I took my time and scribbled notes as I was going, so it probably took me longer than necessary. Nagios claims you can get the basic software installed and running in 15 minutes, which seems to be a reasonable claim based on my experience.
Completing the quick start guide successfully gets the Nagios software running, and allows you to log into your Nagios server with a browser. My Linux VM has an IP of 192.168.3.162, so I could validate Nagios was running by browsing from my laptop to http://192.168.3.162/nagios, shown in Figure 1.
Figure 1: Nagios Home screen
Notice the arrow I added to the above picture in the top left. A full set of Nagios documentation is available once you’ve completed the basic installation. Clicking the documentation link presents a table of contents, partially listed in Figure 2, rich with useful guides on how to configure various Nagios options. I used many of these guides to add devices and enable monitoring services, and found them relatively easy to follow.
Figure 2: Nagios Documentation index
With the Nagios software installed, each host on your network has to be defined in a Nagios configuration file by specifying its name, description (alias), and IP address, along with its group membership and other options such as network parent-child relationships. For example, here are the configuration lines I added to the switch.cfg file to define one of my network routers.
define host{
    use         generic-switch      ; Inherit default values from a template
    host_name   FVS336G             ; Name
    alias       NETGEAR ROUTER      ; Description
    address     192.168.3.1         ; IP address
    hostgroups  routers             ; Host groups
    }
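As a rough sketch of the group membership and parent-child options mentioned above, the lines below define the routers host group that the FVS336G definition refers to, plus a device sitting behind that router. The OFFICE-SWITCH name and its 192.168.3.2 address are made up for illustration and are not taken from the configuration described in this article.

```cfg
define hostgroup{
    hostgroup_name  routers             ; Group name referenced by "hostgroups routers"
    alias           Network Routers     ; Description
    }

define host{
    use         generic-switch      ; Inherit default values from a template
    host_name   OFFICE-SWITCH       ; Hypothetical device behind the router
    alias       Office Switch       ; Description
    address     192.168.3.2         ; Hypothetical IP address
    parents     FVS336G             ; Nagios reaches this host through the router
    }
```

With a parents entry in place, Nagios can report the child device as unreachable rather than down when the router itself fails, which cuts down on misleading alarms.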
There are multiple Nagios configuration files in the /usr/local/nagios/etc/objects directory where host definitions can be stored, such as a file for switches (which is also used to define routers), Windows servers, and network printers. Additional configuration files can be added as long as that file has been identified to Nagios in the /usr/local/nagios/etc/nagios.cfg file. Note that a host can be created in any file, although it makes the most sense to use the appropriately named file for each type of device, such as using the windows.cfg file for defining Windows machine monitoring.
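To illustrate how a file is identified to Nagios, here is a sketch of the relevant lines in nagios.cfg. The first three paths match the stock object files in the directory named above; firewalls.cfg is a hypothetical extra file used only as an example. The sample nagios.cfg typically ships with the switch, Windows, and printer lines commented out with a leading #, so check your own copy.

```cfg
# Object files Nagios should read at startup (lines from nagios.cfg)
cfg_file=/usr/local/nagios/etc/objects/switch.cfg
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
cfg_file=/usr/local/nagios/etc/objects/printer.cfg

# Hypothetical extra file for devices that don't fit the stock files
cfg_file=/usr/local/nagios/etc/objects/firewalls.cfg
```

After editing, it is worth running /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg to verify the configuration before restarting Nagios.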
Once a device is defined, additional statements need to be added to the Nagios configuration files to run monitoring services on that host, such as ping, SNMP, port status, or performance monitoring like CPU, memory and disk utilization. Monitoring a Windows or Linux server also requires installing client software on that server to interact with Nagios, which is also well documented.
I spent a couple of hours playing with the various configurations and entering devices on my network, with the end result of 11 network devices monitored using 30 monitoring services, as you can see in Figure 3. Each device and service is displayed as a hyperlink that takes you to more details on its status and what is being monitored.
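As an example of such statements, the sketch below attaches a ping check and an SNMP uptime check to the FVS336G router defined earlier. It is modeled on the service definitions in the Nagios documentation. The thresholds and the "public" SNMP community string are assumptions, so substitute the values your own devices use.

```cfg
define service{
    use                     generic-service     ; Inherit values from a template
    host_name               FVS336G             ; Host the service is attached to
    service_description     PING                ; Name shown in the web interface
    check_command           check_ping!200.0,20%!600.0,60%  ; Warn at 200 ms or 20% loss, critical at 600 ms or 60% loss
    }

define service{
    use                     generic-service     ; Inherit values from a template
    host_name               FVS336G
    service_description     Uptime
    check_command           check_snmp!-C public -o sysUpTime.0  ; Assumes the community string is "public"
    }
```

The text after each exclamation mark is passed to the plugin as an argument, which is how one command definition can serve many hosts with different thresholds.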
Figure 3: Nagios Service Overview
Notice my Windows machine has a critical alarm shown in the bottom left. I configured Nagios to monitor both my C: and D: hard drives. The default parameters generate an alarm if a disk drive is over 90% full, which is exactly the case on my D: drive.
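A disk check like this one is typically expressed as a service definition along the following lines, modeled on the sample windows.cfg. The host name winserver is a placeholder, and the 80 and 90 values are the warning and critical percentages as I understand the sample defaults; adjust them if you want earlier or later alarms.

```cfg
define service{
    use                     generic-service     ; Inherit values from a template
    host_name               winserver           ; Placeholder name for the Windows host
    service_description     D:\ Drive Space
    check_command           check_nt!USEDDISKSPACE!-l d -w 80 -c 90  ; Warning at 80% used, critical at 90% used
    }
```

Copying the block and changing the -l argument to c gives the equivalent check for the C: drive.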
Once I installed Nagios, the documentation links installed locally on my Linux machine provided all the guidance I needed to get up and running, with two exceptions.
I was getting Nagios’ 127 error code with SNMP enabled for several of my devices and found I needed to install the Linux SNMP packages. So I installed the SNMP tools via the Linux GUI, clicking on System->Administration->Synaptic Package Manager, searching for "snmp" and selecting both snmp and snmpd as shown in Figure 4. With these packages installed on my Linux VM, I was able to use SNMP for more detailed monitoring than a basic ping.
Figure 4: Installing SNMP
The other "gotcha" I ran into was enabling Windows machine monitoring. Nagios’ client software (NSClient++) has to be installed on each Windows machine that you want to monitor. It is available both as a compressed zip file and as an installer (.msi) version; I’d recommend the installer version because it’s easier to install.
The other Windows issue is the need to open a port on the Windows firewall so the machine can be monitored by Nagios. The Nagios documentation does mention that port 12489 is used for communication between the client on the Windows server and the Nagios monitoring server, but it took me a while to realize I had to open that port on the Windows firewall (duh!).
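To show where port 12489 comes into play on the Nagios side, here is a sketch of the check_nt command definition as it appears, to the best of my knowledge, in the sample commands.cfg, followed by a simple service that queries the NSClient++ version. The host name winserver is a placeholder. If this service times out while the client is running, a blocked firewall port is a likely suspect.

```cfg
define command{
    command_name    check_nt
    command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$  ; -p is the NSClient++ port
    }

define service{
    use                     generic-service     ; Inherit values from a template
    host_name               winserver           ; Placeholder name for the Windows host
    service_description     NSClient++ Version
    check_command           check_nt!CLIENTVERSION  ; Succeeds only if port 12489 is reachable
    }
```

If NSClient++ is set to listen on a different port, the -p value here has to change to match.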
There are various status map views available in the default configuration, including the Tree view shown in Figure 5. Notice the information window at the top left, showing details about my D-Link NetDefend router, including its IP address and other status details. Hovering over any of the circled "?" icons in this status map displays that device’s information.
Figure 5: Tree view
This was a fun project, and the end result is a full-blown network monitoring system running on my small network. Servers Alive and Jumpnode’s product are good network monitoring tools, but Servers Alive is free for only 10 devices and Jumpnode doesn’t have a free version.
I’ve just scratched the surface on this tool. Nagios, being open source, can be modified to link to various network systems, as well as provide a wide array of notifications of network conditions. The international VoIP organization I mentioned in the opening is running multiple clustered Nagios servers to provide redundant coverage of their global network.
With fewer than a dozen devices to monitor on my small network, I found Nagios to be a useful solution that will grow with my network and was worth the effort to install. More importantly, it is a highly flexible solution with a very attractive price!