Use Nagios to monitor services

Use Nagios to keep tabs on your network.

Since remote exploits can often crash the service that is being broken into or cause its CPU use to skyrocket, you should monitor the services that are running on your network. Just looking for an open port (such as by using Nmap [Hack #42]) isn’t enough. The machine may be able to respond to a TCP connect request, but the service may be unable to respond (or worse, could be replaced by a different program entirely!). One tool that can help you verify your services at a glance is Nagios (http://www.nagios.org).

Nagios is a network-monitoring application that monitors not only the services running on the hosts on your network, but also the resources on each host, such as CPU usage, disk space, memory usage, running processes, log files, and much more. In the event of a problem it can notify you through email, pager, or any other method that you define, and you can check the status of your network at a glance by using the web GUI. Nagios is also easily extensible through its plug-in API.

To install Nagios, download the source distribution from the Nagios web site. Then, unpack the source distribution and go into the directory it creates:

$ tar xfz nagios-1.1.tar.gz

$ cd nagios-1.1

Before running Nagios’s configure script, you should create a user and group for Nagios to run as (e.g., nagios). Then run the configure script with a command similar to this:

$ ./configure --with-nagios-user=nagios --with-nagios-grp=nagios

This will install Nagios in /usr/local/nagios. As usual, you can modify this behavior by using the --prefix switch. After the configure script finishes, compile Nagios by running make all. Then become root and run make install to install it. In addition, you can optionally install Nagios’s initialization scripts by running make install-init.

If you take a look into the /usr/local/nagios directory right now, you will see that there are four directories. The bin directory contains a single file, nagios, that is the core of the package. This application does the actual monitoring. The sbin directory contains the CGI scripts that will be used in the web-based interface. Inside the share directory, you’ll find the HTML files and documentation. Finally, the var directory is where Nagios will store its information once it starts running.

Before you can use Nagios, you will need a couple of configuration files. These files go into the etc directory, which will be created when you run make install-config. This command also creates a sample copy of each required configuration file and puts them into the etc directory.

At this point the Nagios installation is complete. However, it is not very useful in its current state, because it lacks the actual monitoring applications. These applications, which check whether a particular monitored service is functioning properly, are called plug-ins. Nagios comes with a default set of plug-ins, but they must be downloaded and installed separately.

Download the latest Nagios Plugins package and decompress it. You will need to run the provided configure script to prepare the package for compilation on your system. You will find that the plug-ins are installed in a fashion similar to the actual Nagios program.

To compile the plug-ins, run commands similar to these:

$ ./configure --prefix=/usr/local/nagios \
    --with-nagios-user=nagios --with-nagios-grp=nagios

$ make all

You might get notifications about missing programs or Perl modules while the script is running. These are mostly fine, unless you specifically need the mentioned applications to monitor a service.

After compilation is finished, become root and run make install to install the plug-ins. The plug-ins will be installed in the libexec directory of your Nagios base directory (e.g., /usr/local/nagios/libexec).

There are a few rules that all Nagios plug-ins should implement, making them suitable for use by Nagios. All plug-ins provide a --help option that displays information about the plug-in and how it works. This feature is very helpful when you’re trying to monitor a new service using a plug-in you haven’t used before.

For instance, to learn how the check_ssh plug-in works, run the following command:

$ /usr/local/nagios/libexec/check_ssh --help

check_ssh (nagios-plugins 1.4.0alpha1) 1.13
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Copyright (c) 1999 Remi Paulmier <remi@sinfomic.fr>
Copyright (c) 2000-2003 Nagios Plugin Development Team
        <nagiosplug-devel@lists.sourceforge.net>

Try to connect to SSH server at specified server and port

Usage: check_ssh [-46] [-t <timeout>] [-p <port>] <host>
       check_ssh (-h | --help) for detailed help
       check_ssh (-V | --version) for version information

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -H, --hostname=ADDRESS
    Host name or IP Address
 -p, --port=INTEGER
    Port number (default: 22)
 -4, --use-ipv4
    Use IPv4 connection
 -6, --use-ipv6
    Use IPv6 connection
 -t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
 -v, --verbose
    Show details for command-line debugging (Nagios may truncate output)

Send email to nagios-users@lists.sourceforge.net if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to nagiosplug-devel@lists.sourceforge.net
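Beyond --help, the most important convention is how a plug-in reports its result: it prints a single line of status text and returns an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). If you ever need a check that the stock plug-ins don't cover, a minimal sketch of that convention might look like the following. The function name check_value and its three arguments are made up for illustration and are not part of the Nagios Plugins package:

```shell
# check_value: hypothetical helper (not a stock plug-in) that follows the
# plug-in reporting convention. It prints one status line and returns
# 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
# Arguments: <value> <warning threshold> <critical threshold>, all integers.
check_value() {
    # Reject missing or non-numeric arguments with the UNKNOWN state.
    for arg in "$1" "$2" "$3"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_value <value> <warn> <crit>"
                return 3 ;;
        esac
    done
    # Test the critical threshold first so the worst state wins.
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1 (threshold $3)"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1 (threshold $2)"
        return 1
    fi
    echo "OK - value is $1"
    return 0
}
```

In a standalone plug-in script you would finish by calling the function with the script's own arguments and exiting with its return code, so that Nagios sees the state.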

Now that both Nagios and the plug-ins are installed, we are almost ready to begin monitoring our servers. However, Nagios will not even start before it’s configured properly.

The sample configuration files provide a good starting point:

$ cd /usr/local/nagios/etc
$ ls -1

cgi.cfg-sample

checkcommands.cfg-sample

contactgroups.cfg-sample

contacts.cfg-sample

dependencies.cfg-sample

escalations.cfg-sample

hostgroups.cfg-sample

hosts.cfg-sample

misccommands.cfg-sample

nagios.cfg-sample

resource.cfg-sample

services.cfg-sample

timeperiods.cfg-sample

Since these are sample files, the Nagios authors added a .cfg-sample suffix to each file. First, we need to copy or rename each one to end in .cfg, so that the software can use them properly. (If you don’t change the configuration filenames, Nagios will not be able to find them.)

You can either rename each file manually or use the following command to take care of them all at once. Type the following script on a single line:

# for i in *cfg-sample; do mv "$i" "$(echo "$i" | sed -e 's/cfg-sample$/cfg/')"; done
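If you would rather copy the samples than rename them, so that pristine copies stay around for reference, something along these lines should work. This is only a sketch: the function name activate_samples is invented here, and it assumes the samples all end in .cfg-sample as in the listing above.

```shell
# activate_samples: hypothetical helper that copies every *.cfg-sample in a
# directory to a matching *.cfg file. The samples are left in place and an
# existing .cfg file is never overwritten.
# Argument: the configuration directory (defaults to /usr/local/nagios/etc).
activate_samples() {
    dir=${1:-/usr/local/nagios/etc}
    for sample in "$dir"/*.cfg-sample; do
        [ -e "$sample" ] || continue     # the glob matched nothing
        target=${sample%-sample}         # hosts.cfg-sample -> hosts.cfg
        [ -e "$target" ] || cp "$sample" "$target"
    done
}
```

Run it as root with no arguments to work on /usr/local/nagios/etc, or pass a different directory if you installed Nagios under another prefix.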

First there is the main configuration file, nagios.cfg. You can pretty much leave everything as is—the Nagios installation process will make sure the file paths used in the configuration file are correct. There’s one option, however, that you might want to change: check_external_commands, which is set to 0 by default. If you would like to be able to directly run commands through the web interface, you will want to set this to 1. Depending on your network environment, this may or may not be an acceptable security risk, as enabling this option will permit the execution of scripts from the web interface. Other options you need to set in cgi.cfg configure which usernames are allowed to run external commands.
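If you decide to turn external commands on, you can edit nagios.cfg by hand, or script the change as in the sketch below. It assumes the option appears at the start of a line in the form check_external_commands=0, as in the sample file; the function name set_external_commands is made up for this example.

```shell
# set_external_commands: hypothetical helper that rewrites the
# check_external_commands line of a nagios.cfg file.
# Arguments: <path to nagios.cfg> <0 or 1>
set_external_commands() {
    cfg=$1
    value=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temporary file, then copy it back over the
    # original so the file keeps its owner and permissions.
    sed "s/^check_external_commands=.*/check_external_commands=$value/" "$cfg" > "$tmp" \
        && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, set_external_commands /usr/local/nagios/etc/nagios.cfg 1 enables the option, and passing 0 turns it back off.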

To get Nagios running, you must modify all but a few of the sample configuration files. Configuring Nagios to monitor your servers is not as difficult as it looks. To help you, the Nagios binary can verify your configuration when you run it with the -v switch:

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

This command will go through the configuration files and report any errors. Start fixing the errors one by one, and run the command again to find the next error. For testing purposes, it is easiest to disable all hosts and services definitions in the sample configuration files and merely use the files as templates for your own hosts and services. You can keep most of the files as is, but remove the following, which will be created from scratch:

hosts.cfg

services.cfg

contacts.cfg

contactgroups.cfg

hostgroups.cfg

Start by configuring a host to monitor. We first need to add our host definition and configure some options for that host. You can add as many hosts as you like, but we will stick with one for the sake of simplicity.

Here are the contents of hosts.cfg:

# Generic host definition template

define host{

# The name of this host template - referenced in other host definitions, used for template recursion/resolution

name generic-host

# Host notifications are enabled

notifications_enabled 1

# Host event handler is enabled

event_handler_enabled 1

# Flap detection is enabled

flap_detection_enabled 1

# Process performance data

process_perf_data 1

# Retain status information across program restarts

retain_status_information 1

# Retain non-status information across program restarts

retain_nonstatus_information 1

# DONT REGISTER THIS DEFINITION – ITS NOT A REAL HOST,

# JUST A TEMPLATE!

register 0

}

# Host Definition

define host{

# Name of host template to use

use generic-host

host_name freelinuxcd.org

alias Free Linux CD Project Server

address www.freelinuxcd.org

check_command check-host-alive

max_check_attempts 10

notification_interval 120

notification_period 24x7

notification_options d,u,r

}

The first host defined is not a real host but a template from which other host definitions are derived. This mechanism can be seen in other configuration files and makes configuration based on a predefined set of defaults a breeze.
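Because a real host only has to name the template and supply its own values, further host entries are easy to generate. The sketch below prints a definition modeled on the one above; the function name host_definition is hypothetical, and the fixed values (check-host-alive, 10 attempts, and so on) are simply copied from this example rather than being required defaults.

```shell
# host_definition: hypothetical helper that prints a "define host" block
# inheriting from the generic-host template.
# Arguments: <host_name> <alias> <address>
host_definition() {
    cat <<EOF
define host{
        use                     generic-host
        host_name               $1
        alias                   $2
        address                 $3
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }
EOF
}
```

You could append the output to hosts.cfg, for instance with host_definition mail.example.org "Mail Server" 192.0.2.25 >> hosts.cfg (the host shown is a placeholder), and then rerun the configuration check.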

With this setup we are monitoring only one host, www.freelinuxcd.org, to see if it is alive. The host_name parameter is important because other configuration files will refer to this server by this name. Now the host needs to be added to a hostgroup, so that the application knows which contact group to send notifications to.

Here’s what hostgroups.cfg looks like:

define hostgroup{

hostgroup_name flcd-servers

alias The Free Linux CD Project Servers

contact_groups flcd-admins

members freelinuxcd.org

}

This defines a new hostgroup and associates the flcd-admins contact_group with it. Now you’ll need to define that contact group in contactgroups.cfg:

define contactgroup{

contactgroup_name flcd-admins

alias FreeLinuxCD.org Admins

members oktay, verty

}

Here the flcd-admins contact_group is defined with two members, oktay and verty. This configuration ensures that both users will be notified when something goes wrong with a server that flcd-admins is responsible for. The next step is to set the contact information and notification preferences for these users.

Here are the definitions for those two members in contacts.cfg:

define contact{

contact_name oktay

alias Oktay Altunergil

service_notification_period 24x7

host_notification_period 24x7

service_notification_options w,u,c,r

host_notification_options d,u,r

service_notification_commands notify-by-email,notify-by-epager

host_notification_commands host-notify-by-email,host-notify-by-epager

email oktay@freelinuxcd.org

pager dummypagenagios-admin@localhost.localdomain

}

define contact{

contact_name verty

alias David 'Verty' Ky

service_notification_period 24x7

host_notification_period 24x7

service_notification_options w,u,c,r

host_notification_options d,u,r

service_notification_commands notify-by-email,notify-by-epager

host_notification_commands host-notify-by-email

email verty@flcd.org

}

In addition to providing contact details for a particular user, the contact_name in the contacts.cfg file is also used by the CGI scripts (i.e., the web interface) to determine whether a particular user is allowed to access a particular resource. Now that your hosts and contacts are configured, you can start to configure monitoring for individual services on your server.

This is done in services.cfg:

# Generic service definition template

define service{

# The ‘name’ of this service template, referenced in other service definitions

name generic-service

# Active service checks are enabled

active_checks_enabled 1

# Passive service checks are enabled/accepted

passive_checks_enabled 1

# Active service checks should be parallelized

# (disabling this can lead to major performance problems)

parallelize_check 1

# We should obsess over this service (if necessary)

obsess_over_service 1

# Default is to NOT check service ‘freshness’

check_freshness 0

# Service notifications are enabled

notifications_enabled 1

# Service event handler is enabled

event_handler_enabled 1

# Flap detection is enabled

flap_detection_enabled 1

# Process performance data

process_perf_data 1

# Retain status information across program restarts

retain_status_information 1

# Retain non-status information across program restarts

retain_nonstatus_information 1

# DONT REGISTER THIS DEFINITION – ITS NOT A REAL SERVICE, JUST A TEMPLATE!

register 0

}

# Service definition

define service{

# Name of service template to use

use generic-service

host_name freelinuxcd.org

service_description HTTP

is_volatile 0

check_period 24x7

max_check_attempts 3

normal_check_interval 5

retry_check_interval 1

contact_groups flcd-admins

notification_interval 120

notification_period 24x7

notification_options w,u,c,r

check_command check_http

}

# Service definition

define service{

# Name of service template to use

use generic-service

host_name freelinuxcd.org

service_description PING

is_volatile 0

check_period 24x7

max_check_attempts 3

normal_check_interval 5

retry_check_interval 1

contact_groups flcd-admins

notification_interval 120

notification_period 24x7

notification_options c,r

check_command check_ping!100.0,20%!500.0,60%

}

This setup configures monitoring for two services. The first service definition, which has been called HTTP, will monitor whether the web server is up and will notify you if there’s a problem. The second definition monitors the ping statistics from the server and notifies you if the response time or packet loss becomes too high. The commands used are check_http and check_ping, which were installed into the libexec directory during the plug-in installation. Please take your time to familiarize yourself with all other available plug-ins and configure them similarly to the previous example definitions.
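The exclamation marks in the check_ping line separate the command name from its arguments, which Nagios makes available to the command definition as $ARG1$, $ARG2$, and so on. In the sample checkcommands.cfg these are, as far as we recall, passed to the plug-in's warning and critical thresholds; check your own copy to be sure. The following sketch only illustrates the splitting, and the function name split_check_command is invented for the purpose:

```shell
# split_check_command: hypothetical helper that shows how a check_command
# value breaks down into a command name and "!"-separated arguments.
# Argument: the check_command value, e.g. 'check_ping!100.0,20%!500.0,60%'
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -f               # disable globbing while the value is split
    set -- $1            # unquoted on purpose: split on "!"
    set +f
    IFS=$old_ifs
    printf 'command: %s\n' "$1"
    shift
    n=1
    for arg in "$@"; do
        # Each remaining field corresponds to the next $ARGn$ macro.
        printf '$ARG%s$: %s\n' "$n" "$arg"
        n=$((n + 1))
    done
}
```

Calling it with the check_ping value from the service definition above lists check_ping as the command, 100.0,20% as $ARG1$, and 500.0,60% as $ARG2$. A value with no exclamation marks, such as check_http, yields only the command line.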

Once you’re happy with your configuration, run Nagios with the -v switch one last time to make sure everything checks out. Then run it as a daemon by using the -d switch:

# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

That’s all there is to it. Give Nagios a couple of minutes to generate some data, and then point your browser to the machine and look at the pretty service warning lights.
