<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<title>Monitoring Routers and Switches</title>
<STYLE type="text/css">
.Default { font-family: verdana,arial,serif; font-size: 8pt; }
.PageTitle { font-family: verdana,arial,serif; font-size: 16pt; font-weight: bold; }
</STYLE>
<body bgcolor="#FFFFFF" text="black" class="Default">
<img src="images/nagios.jpg" border="0" alt="Nagios" title="Nagios">
<h1 class="PageTitle">Monitoring Routers and Switches</h1>
<p>
<img src="images/upto.gif" border="0" align="middle" alt="Up To" title="Up To">Up To: <a href="toc.html">Contents</a><br>
<img src="images/seealso.gif" border="0" align="middle" alt="See Also" title="See Also"> See Also: <a href="monitoring-publicservices.html">Monitoring Publicly Available Services</a>
</p>
<p><strong><u>Introduction</u></strong></p>
<img src="images/switch.png" border="0" style="float: right" alt="Switch">
<p>This document describes how you can monitor the status of network switches and routers. Some cheaper "unmanaged" switches and hubs don't have IP addresses and are essentially invisible on your network, so there's no way to monitor them. More expensive "managed" switches and routers have IP addresses assigned to them and can be monitored by pinging them or by using SNMP to query status information.</p>
<p>I'll describe how you can monitor the following things on managed switches, hubs, and routers:</p>
<ul>
<li>Packet loss and round trip average (RTA)</li>
<li>SNMP status information</li>
<li>Bandwidth / traffic rate</li>
</ul>
<p><img src="images/note.gif" border="0" align="bottom" alt="Note" title="Note"> Note: These instructions assume that you've installed Nagios according to the <a href="quickstart.html">quickstart guide</a>. The sample configuration entries below reference objects that are defined in the sample config files (<i>commands.cfg</i>, <i>templates.cfg</i>, etc.) that are installed when you follow the quickstart.</p>
<p><strong><u>Overview</u></strong></p>
<img src="images/monitoring-routers.png" border="0" alt="Monitoring a Router or Switch" title="Monitoring a Router or Switch" style="float: right;">
<p>Monitoring switches and routers can be easy or more involved, depending on what equipment you have and what you want to monitor. Because they are critical infrastructure components, you'll almost certainly want to monitor them in at least some basic manner.</p>
<p>Switches and routers can be monitored easily by "pinging" them to determine packet loss, RTA, etc. If your switch supports SNMP, you can also monitor port status and other information with the <i>check_snmp</i> plugin. If you're using MRTG, you can monitor bandwidth with the <i>check_mrtgtraf</i> plugin.</p>
<p>The <i>check_snmp</i> plugin is only compiled and installed if the net-snmp and net-snmp-utils packages are installed on your system when you build the plugins. Make sure the plugin exists in <i>/usr/local/nagios/libexec</i> before you continue. If it doesn't, install net-snmp and net-snmp-utils and then recompile and reinstall the Nagios plugins.</p>
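<p>If you'd rather script this check, here's a minimal Python sketch (a hypothetical helper, not part of Nagios) that reports whether <i>check_snmp</i> and the other plugins used in this document exist and are executable. It assumes the quickstart install location <i>/usr/local/nagios/libexec</i>; pass a different directory if your plugins live elsewhere.</p>

```python
import os

# Assumed plugin directory from the quickstart guide; adjust if yours differs.
LIBEXEC_DIR = "/usr/local/nagios/libexec"


def plugin_available(name, libexec_dir=LIBEXEC_DIR):
    """Return True if the named plugin exists as a file and is executable."""
    path = os.path.join(libexec_dir, name)
    # os.access with X_OK checks execute permission for the current user.
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    for plugin in ("check_snmp", "check_mrtgtraf", "check_ping"):
        status = "found" if plugin_available(plugin) else "MISSING"
        print(f"{plugin}: {status}")
```

<p>If <i>check_snmp</i> is reported as missing, install net-snmp and net-snmp-utils and rebuild the plugins as described above.</p>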
<p><strong><u>Steps</u></strong></p>
<p>There are several steps you'll need to follow in order to monitor a new router or switch. They are:</p>
<ol>
<li>Perform first-time prerequisites</li>
<li>Create new host and service definitions for monitoring the device</li>
<li>Restart the Nagios daemon</li>
</ol>
<p><strong><u>What's Already Done For You</u></strong></p>
<p>To make your life a bit easier, a few configuration tasks have already been done for you:</p>
<ul>
<li>Two command definitions (<i>check_snmp</i> and <i>check_local_mrtgtraf</i>) have been added to the <i>commands.cfg</i> file. These allow you to use the <i>check_snmp</i> and <i>check_mrtgtraf</i> plugins to monitor network routers and switches.</li>
<li>A switch host template (called <i>generic-switch</i>) has already been created in the <i>templates.cfg</i> file. This allows you to add new router/switch host definitions in a simple manner.</li>
</ul>
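<p>In the service definitions that follow, each <i>check_command</i> value is a command name followed by arguments separated by exclamation marks; Nagios substitutes those arguments into the <i>$ARG1$</i>, <i>$ARG2$</i>, ... macros of the matching command definition. The Python sketch below models that substitution so you can see roughly what command line a check will run. It assumes the sample <i>check_snmp</i> command line is <i>$USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$</i> and that <i>$USER1$</i> is <i>/usr/local/nagios/libexec</i>; compare with your own <i>commands.cfg</i> and <i>resource.cfg</i>. It does not handle escaped exclamation marks or the many other macros Nagios supports.</p>

```python
import re

# Matches a Nagios-style macro token such as $ARG1$ or $HOSTADDRESS$.
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_check_command(check_command, command_line, macros):
    """Substitute $ARGn$ and other macros into a command line.

    check_command: value of a service's check_command directive,
                   e.g. "check_snmp!-C public -o sysUpTime.0".
    command_line:  command_line of the matching command definition.
    macros:        values for other macros (USER1, HOSTADDRESS, ...).
    Simplification: escaped exclamation marks are not handled.
    Returns (command_name, expanded_command_line).
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg

    def replace(match):
        # Unknown macros are left untouched in this sketch.
        return values.get(match.group(1), match.group(0))

    return name, MACRO_RE.sub(replace, command_line)


if __name__ == "__main__":
    name, cmd = expand_check_command(
        "check_snmp!-C public -o sysUpTime.0",
        "$USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$",  # assumed sample definition
        {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.168.1.253"},
    )
    print(name, "->", cmd)
```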
<p>The above-mentioned config files can be found in the <i>/usr/local/nagios/etc/objects/</i> directory. You can modify the definitions in these and other files to better suit your needs. However, I'd recommend waiting until you're more familiar with configuring Nagios before doing so. For the time being, just follow the directions outlined below and you'll be monitoring your network routers/switches in no time.</p>
<p><strong><u>Prerequisites</u></strong></p>
<p>The first time you configure Nagios to monitor a network switch, you'll need to do a bit of extra work. Remember, you only need to do this for the <i>first</i> switch you monitor.</p>
<p>Edit the main Nagios config file.</p>
<pre>
vi /usr/local/nagios/etc/nagios.cfg
</pre>
<p>Remove the leading pound (#) sign from the following line in the main configuration file:</p>
<pre>
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg
</pre>
<p>Save the file and exit.</p>
<p>What did you just do? You told Nagios to look in the <i>/usr/local/nagios/etc/objects/switch.cfg</i> file to find additional object definitions. That's where you'll be adding host and service definitions for routers and switches. That configuration file already contains some sample host, hostgroup, and service definitions. For the <i>first</i> router/switch you monitor, you can simply modify the sample host and service definitions in that file, rather than creating new ones.</p>
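<p>If you provision Nagios servers with scripts rather than by hand, the edit to <i>nagios.cfg</i> described above can be automated. The following Python sketch is a hypothetical helper (not part of Nagios) that removes the leading pound sign from the <i>cfg_file</i> line for <i>switch.cfg</i> and leaves every other line untouched. It works on the file's text, so you can review the result before writing it back.</p>

```python
# Hypothetical helper: enable a commented-out cfg_file directive in nagios.cfg text.
SWITCH_CFG = "/usr/local/nagios/etc/objects/switch.cfg"


def enable_cfg_file(config_text, cfg_path=SWITCH_CFG):
    """Return config_text with '#cfg_file=<cfg_path>' uncommented."""
    target = f"cfg_file={cfg_path}"
    out = []
    for line in config_text.splitlines(keepends=True):
        stripped = line.strip()
        # Only touch the exact directive; comments that merely mention it stay as-is.
        if stripped.startswith("#") and stripped.lstrip("#").strip() == target:
            # Preserve the original line ending, if any.
            ending = line[len(line.rstrip("\r\n")):]
            out.append(target + ending)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "#cfg_file=/usr/local/nagios/etc/objects/windows.cfg\n"
        "#cfg_file=/usr/local/nagios/etc/objects/switch.cfg\n"
    )
    print(enable_cfg_file(sample), end="")
```

<p>Running it twice is harmless: once the line is uncommented, it no longer matches and is left alone.</p>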
<p><strong><u>Configuring Nagios</u></strong></p>
<p>You'll need to create some <a href="objectdefinitions.html">object definitions</a> in order to monitor a new router/switch.</p>
<p>Open the <i>switch.cfg</i> file for editing.</p>
<pre>
vi /usr/local/nagios/etc/objects/switch.cfg
</pre>
<p>Add a new <a href="objectdefinitions.html#host">host</a> definition for the switch that you're going to monitor. If this is the <i>first</i> switch you're monitoring, you can simply modify the sample host definition in <i>switch.cfg</i>. Change the <i>host_name</i>, <i>alias</i>, and <i>address</i> fields to appropriate values for the switch.</p>
<pre>
define host{
        use             generic-switch          ; Inherit default values from a template
        host_name       linksys-srw224p         ; The name we're giving to this switch
        alias           Linksys SRW224P Switch  ; A longer name associated with the switch
        address         192.168.1.253           ; IP address of the switch
        hostgroups      allhosts,switches       ; Host groups this switch is associated with
        }
</pre>
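<p>If you have many switches, typing each definition by hand gets tedious. The following Python sketch is a hypothetical generator (not part of Nagios) that renders a <i>define host{...}</i> block from a dictionary of directives, using the directives from the example above. It only formats text; Nagios still checks the result when you verify your configuration.</p>

```python
def render_object(object_type, directives, width=16):
    """Render a Nagios object definition such as 'define host{ ... }'.

    object_type: e.g. "host" or "service".
    directives:  mapping of directive name -> value, in output order.
    width:       minimum width of the directive-name column (cosmetic only).
    """
    lines = [f"define {object_type}{{"]
    for name, value in directives.items():
        # Pad the name, then always add one space so long names stay separated.
        lines.append(f"\t{name:<{width}} {value}")
    lines.append("\t}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inventory; replace with your own devices.
    switches = [
        ("linksys-srw224p", "Linksys SRW224P Switch", "192.168.1.253"),
    ]
    for host_name, alias, address in switches:
        print(render_object("host", {
            "use": "generic-switch",
            "host_name": host_name,
            "alias": alias,
            "address": address,
            "hostgroups": "allhosts,switches",
        }))
```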
<p><strong><u>Monitoring Services</u></strong></p>
<p>Now you can add some service definitions (to the same configuration file) to monitor different aspects of the switch. If this is the <i>first</i> switch you're monitoring, you can simply modify the sample service definition in <i>switch.cfg</i>.</p>
<p><img src="images/note.gif" border="0" align="bottom" alt="Note" title="Note"> Note: Replace "<i>linksys-srw224p</i>" in the example definitions below with the name you specified in the <i>host_name</i> directive of the host definition you just added.</p>
<p><strong><u>Monitoring Packet Loss and RTA</u></strong></p>
<p>Add the following service definition to monitor packet loss and round trip average between the Nagios host and the switch every 5 minutes under normal conditions.</p>
<pre>
define service{
        use                     generic-service                 ; Inherit values from a template
        host_name               linksys-srw224p                 ; The name of the host the service is associated with
        service_description     PING                            ; The service description
        check_command           check_ping!200.0,20%!600.0,60%  ; The command used to monitor the service
        normal_check_interval   5                               ; Check the service every 5 minutes under normal conditions
        retry_check_interval    1                               ; Re-check the service every minute until its final/hard state is determined
        }
</pre>
<p>This service will be:</p>
<ul>
<li>CRITICAL if the round trip average (RTA) is greater than 600 milliseconds or the packet loss is 60% or more</li>
<li>WARNING if the RTA is greater than 200 ms or the packet loss is 20% or more</li>
<li>OK if the RTA is less than 200 ms and the packet loss is less than 20%</li>
</ul>
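<p>The three rules above can be summarized as a small decision function. This Python sketch follows the wording of the list (RTA compared with "greater than", packet loss with "or more") and only illustrates the thresholds; it is not the <i>check_ping</i> plugin's source, and the plugin's handling of values lying exactly on a threshold may differ.</p>

```python
def ping_state(rta_ms, loss_pct, warn=(200.0, 20.0), crit=(600.0, 60.0)):
    """Classify a PING result using (rta_ms, loss_pct) threshold pairs.

    Defaults mirror check_ping!200.0,20%!600.0,60% from the service above.
    CRITICAL is tested first so the most severe matching state wins.
    """
    crit_rta, crit_loss = crit
    warn_rta, warn_loss = warn
    if rta_ms > crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms > warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for rta, loss in [(12.5, 0), (250.0, 0), (80.0, 20), (700.0, 0), (50.0, 100)]:
        print(f"RTA={rta} ms, loss={loss}% -> {ping_state(rta, loss)}")
```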
<p><strong><u>Monitoring SNMP Status Information</u></strong></p>
<p>If your switch or router supports SNMP, you can monitor a lot of information by using the <i>check_snmp</i> plugin. If it doesn't, skip this section.</p>
<p>Add the following service definition to monitor the uptime of the switch.</p>
<pre>
define service{
        use                     generic-service ; Inherit values from a template
        host_name               linksys-srw224p
        service_description     Uptime
        check_command           check_snmp!-C public -o sysUpTime.0
        }
</pre>
<p>In the <i>check_command</i> directive of the service definition above, the "-C public" option tells the plugin that the SNMP community name to be used is "public" and the "-o sysUpTime.0" option indicates which OID should be checked.</p>
<p>If you want to ensure that a specific port/interface on the switch is in an up state, you could add a service definition like this:</p>
<pre>
define service{
        use                     generic-service ; Inherit values from a template
        host_name               linksys-srw224p
        service_description     Port 1 Link Status
        check_command           check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
        }
</pre>
<p>In the example above, the "-o ifOperStatus.1" refers to the OID for the operational status of port 1 on the switch (strictly, the interface whose ifIndex is 1, which on many switches is port 1). The "-r 1" option tells the <i>check_snmp</i> plugin to return an OK state if "1" is found in the SNMP result (1 indicates an "up" state on the port) and CRITICAL if it isn't found. The "-m RFC1213-MIB" argument is optional and tells the <i>check_snmp</i> plugin to load only the "RFC1213-MIB" instead of every single MIB that's installed on your system, which can help speed things up.</p>
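<p>To make the "-r" behavior described above concrete, here is a Python sketch of that decision: search the SNMP result for the regular expression and return OK on a match, CRITICAL otherwise. It is a simplified model rather than the plugin's code, and Python's <i>re</i> module is close to, but not identical with, the POSIX extended regular expressions the plugin uses. It assumes the result is the text returned for the OID (for example <i>up(1)</i> with the MIB loaded). Because this is a search rather than an exact match, the pattern "1" would also match values such as "10"; for ifOperStatus that is harmless, but an anchored pattern such as "^1$" is safer for other OIDs.</p>

```python
import re


def snmp_regex_state(result, pattern):
    """Return "OK" if the regex is found anywhere in result, else "CRITICAL".

    Models the behaviour described for check_snmp's -r option; re.search
    finds a match anywhere in the string, not only at the start.
    """
    return "OK" if re.search(pattern, result) else "CRITICAL"


if __name__ == "__main__":
    # Example values as they might appear with and without the MIB loaded.
    for value in ("up(1)", "1", "down(2)", "2"):
        print(f"{value!r}: -r 1 -> {snmp_regex_state(value, '1')}")
    # An anchored pattern avoids accidental matches such as "10" or "21".
    print("'10' with ^1$ ->", snmp_regex_state("10", r"^1$"))
```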
<p>That's it for the SNMP monitoring example. There are a million things that can be monitored via SNMP, so it's up to you to decide what you need and want to monitor. Good luck!</p>
<p><img src="images/tip.gif" border="0" align="bottom" alt="Tip" title="Tip"> Tip: You can usually find the OIDs that can be monitored on a switch by running the following command (replace <i>192.168.1.253</i> with the IP address of the switch):</p>
<pre>
snmpwalk -v1 -c public -m ALL 192.168.1.253 .1
</pre>
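<p>The output of that walk is usually long. The following Python sketch (a hypothetical helper) parses lines in net-snmp's usual "NAME = TYPE: VALUE" layout into (OID, type, value) tuples so you can filter them, for example to list every <i>ifOperStatus</i> entry. The sample lines are illustrative, not captured from a real switch, and lines without " = " (such as continuations of wrapped multi-line strings) are skipped.</p>

```python
def parse_snmpwalk(output):
    """Parse net-snmp style 'OID = TYPE: VALUE' lines into tuples.

    Returns a list of (oid, type, value); type is None when the value has
    no 'TYPE:' prefix (for example an empty string shown as "").
    """
    rows = []
    for line in output.splitlines():
        if " = " not in line:
            continue  # continuation of a multi-line value, or noise
        oid, rest = line.split(" = ", 1)
        if ": " in rest:
            snmp_type, value = rest.split(": ", 1)
        else:
            snmp_type, value = None, rest
        rows.append((oid.strip(), snmp_type, value.strip()))
    return rows


if __name__ == "__main__":
    # Illustrative sample output, not captured from a real device.
    sample = """\
RFC1213-MIB::sysUpTime.0 = Timeticks: (8640000) 1 day, 0:00:00.00
RFC1213-MIB::ifOperStatus.1 = INTEGER: up(1)
RFC1213-MIB::ifOperStatus.2 = INTEGER: down(2)
"""
    for oid, snmp_type, value in parse_snmpwalk(sample):
        if "ifOperStatus" in oid:
            print(oid, value)
```

<p>In practice you would feed it the saved output of the <i>snmpwalk</i> command above and then pick the OIDs you want to pass to <i>check_snmp</i> with its "-o" option.</p>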
<p><strong><u>Monitoring Bandwidth / Traffic Rate</u></strong></p>
<p>If you're monitoring bandwidth usage on your switches or routers using <a href="http://oss.oetiker.ch/mrtg/">MRTG</a>, you can have Nagios alert you when traffic rates exceed thresholds you specify. The <i>check_mrtgtraf</i> plugin (which is included in the Nagios plugins distribution) allows you to do this.</p>
<p>You'll need to let the <i>check_mrtgtraf</i> plugin know which log file the MRTG data is being stored in, along with thresholds, etc. In my example, I'm monitoring one of the ports on a Linksys switch. The MRTG log file is stored in <i>/var/lib/mrtg/192.168.1.253_1.log</i>. Here's the service definition I use to monitor the bandwidth data that's stored in the log file:</p>
<pre>
define service{
        use                     generic-service ; Inherit values from a template
        host_name               linksys-srw224p
        service_description     Port 1 Bandwidth Usage
        check_command           check_local_mrtgtraf!/var/lib/mrtg/192.168.1.253_1.log!AVG!1000000,2000000!5000000,5000000!10
        }
</pre>
<p>In the example above, the "/var/lib/mrtg/192.168.1.253_1.log" option that gets passed to the <i>check_local_mrtgtraf</i> command tells the plugin which MRTG log file to read from. The "AVG" option tells it that it should use average bandwidth statistics. The "1000000,2000000" option gives the warning thresholds (in bytes per second) for the incoming and outgoing traffic rates, respectively. The "5000000,5000000" option gives the corresponding critical thresholds for incoming and outgoing traffic. The "10" option causes the plugin to return a CRITICAL state if the MRTG log file is older than 10 minutes (it should be updated every 5 minutes).</p>
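<p>The threshold logic described above can be sketched in Python. This is a simplified model, not the plugin's source. It assumes the usual MRTG log layout (a first line with the current counters, followed by lines of "timestamp, average in, average out, maximum in, maximum out" with the newest sample first and rates in bytes per second), reads only the newest sample, and treats a rate strictly greater than a threshold as a breach. The state returned for stale data is a parameter whose default follows the description above; confirm what your plugin version actually returns.</p>

```python
def mrtgtraf_state(log_text, now, use_avg=True, warn=(1_000_000, 2_000_000),
                   crit=(5_000_000, 5_000_000), expire_minutes=10,
                   stale_state="CRITICAL"):
    """Evaluate MRTG traffic data against (incoming, outgoing) thresholds.

    Assumed log layout: line 1 is '<time> <in counter> <out counter>';
    line 2 is the newest sample '<time> <avg in> <avg out> <max in> <max out>'.
    now is the current Unix time in seconds.
    """
    lines = log_text.strip().splitlines()
    if len(lines) < 2:
        return "UNKNOWN"  # not enough data to evaluate
    fields = lines[1].split()
    timestamp = int(fields[0])
    if use_avg:  # the "AVG" option
        rate_in, rate_out = float(fields[1]), float(fields[2])
    else:        # the "MAX" option
        rate_in, rate_out = float(fields[3]), float(fields[4])

    # Data older than expire_minutes is reported as stale.
    if expire_minutes > 0 and now - timestamp > expire_minutes * 60:
        return stale_state
    if rate_in > crit[0] or rate_out > crit[1]:
        return "CRITICAL"
    if rate_in > warn[0] or rate_out > warn[1]:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Illustrative log contents, not real data.
    log = "1200000300 123456789 987654321\n1200000300 1500000 400000 2100000 900000\n"
    print(mrtgtraf_state(log, now=1200000600))  # the sample is 5 minutes old
```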
<p><strong><u>Restarting Nagios</u></strong></p>
<p>Once you've added the new host and service definitions to the <i>switch.cfg</i> file, you're ready to start monitoring the router/switch. To do this, you'll need to <a href="verifyconfig.html">verify your configuration</a> and <a href="startstop.html">restart Nagios</a>.</p>
<p>If the verification process produces any error messages, fix your configuration file before continuing. Make sure that you don't (re)start Nagios until the verification process completes without any errors!</p>
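<p>The verification step can also be scripted. The following Python sketch runs the Nagios binary with its <i>-v</i> (verify) option and treats a zero exit status as success. The binary and config paths are assumptions based on the quickstart layout, and the <i>run</i> parameter is a hypothetical hook that lets you exercise the function without Nagios installed. Restarting Nagios is left to the method described on the linked page for your system.</p>

```python
import os
import subprocess

# Paths assumed from the quickstart layout; adjust for your installation.
NAGIOS_BIN = "/usr/local/nagios/bin/nagios"
NAGIOS_CFG = "/usr/local/nagios/etc/nagios.cfg"


def verify_config(nagios_bin=NAGIOS_BIN, nagios_cfg=NAGIOS_CFG, run=subprocess.run):
    """Run 'nagios -v <config>' and return (ok, combined_output).

    ok is True only if the verifier exits with status 0.
    """
    result = run([nagios_bin, "-v", nagios_cfg], capture_output=True, text=True)
    return result.returncode == 0, (result.stdout or "") + (result.stderr or "")


if __name__ == "__main__":
    if os.path.exists(NAGIOS_BIN):
        ok, output = verify_config()
        print(output)
        print("OK to restart" if ok else "Fix the errors above before restarting Nagios")
    else:
        print(f"{NAGIOS_BIN} not found; adjust NAGIOS_BIN for your system")
```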