Nagios configuration
1: Configure the web interface
This assumes Apache is already running. If it is not, see:
http://localhost/upload/blog.php?do-showone-tid-18.html
vi /usr/local/apache2/conf/httpd.conf
Add the following:
- ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
- <Directory "/usr/local/nagios/sbin">
- Options ExecCGI
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd.users
- Require valid-user
- </Directory>
- Alias /nagios /usr/local/nagios/share
- <Directory "/usr/local/nagios/share">
- Options None
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd.users
- Require valid-user
- </Directory>
When you are done, save the file and restart Apache:
/usr/local/apache2/bin/apachectl restart
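2: Configure Apache Basic authentication:
Generate the password file and create the user nagios (htpasswd prompts for the password; this article uses nagios as the password):
/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagios
The Apache side is now configured.
Now configure Nagios: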
cd /usr/local/nagios/etc/
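/usr/local/nagios/etc contains the Nagios sample configuration files (*.cfg-sample). Copy every .cfg-sample file to a file ending in .cfg.
For example: cp nagios.cfg-sample nagios.cfg
Do this for every sample file.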
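The copy step can be done in one loop. This is a minimal sketch, assuming the samples all end in .cfg-sample as described above. The function name copy_samples is made up for illustration, and skipping .cfg files that already exist is my own assumption, so that earlier edits are not overwritten.

```shell
#!/bin/sh
# copy_samples DIR: copy every *.cfg-sample in DIR to the matching *.cfg.
# Hypothetical helper; .cfg files that already exist are left untouched.
copy_samples() (
    cd "$1" || exit 1
    for sample in *.cfg-sample; do
        [ -e "$sample" ] || continue      # no samples: the glob stays literal
        target=${sample%-sample}          # nagios.cfg-sample -> nagios.cfg
        [ -e "$target" ] || cp "$sample" "$target"
    done
)

# Usage (path from this article):
#   copy_samples /usr/local/nagios/etc
```

The function body runs in a subshell, so the cd does not change the caller's working directory.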
vi minimal.cfg
Comment out all command definitions:
To comment out a definition, put a "#" at the start of each of its lines.
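Doing this by hand in vi is tedious when there are many definitions. The sketch below shows one way to do it with awk, assuming each definition opens with a "define command{" line and ends at the next line containing "}". The function name comment_out_commands is hypothetical.

```shell
#!/bin/sh
# comment_out_commands FILE: print FILE with a "#" in front of every line
# that belongs to a "define command{ ... }" block. Other lines pass through.
comment_out_commands() {
    awk '
        /^[ \t]*define[ \t]+command[ \t]*\{/ { inside = 1 }  # block opens
        inside  { print "#" $0 }                             # comment it out
        !inside { print }                                    # leave the rest
        inside && /\}/ { inside = 0 }                        # block closes
    ' "$1"
}

# Usage: write to a new file first, inspect it, then replace the original.
#   comment_out_commands minimal.cfg > minimal.cfg.new
#   mv minimal.cfg.new minimal.cfg
```

Writing to a separate file first makes it easy to compare the result with the original before overwriting it.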
Edit cgi.cfg
Change use_authentication=1 to use_authentication=0, which turns off authentication in the CGIs. Otherwise some pages will not display.
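The same change can be scripted. This is a sketch only: the function name disable_cgi_auth is invented here, and it assumes the setting sits on a line of its own with no surrounding spaces, as in the line quoted above. A backup copy is kept next to the file.

```shell
#!/bin/sh
# disable_cgi_auth FILE: switch use_authentication from 1 to 0 in FILE.
# Keeps the previous version as FILE.bak. Hypothetical helper.
disable_cgi_auth() {
    cp "$1" "$1.bak" &&
        sed 's/^use_authentication=1$/use_authentication=0/' "$1.bak" > "$1"
}

# Usage (path from this article):
#   disable_cgi_auth /usr/local/nagios/etc/cgi.cfg
```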
Now check the configuration files for syntax errors:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If everything is correct, the output ends with:
Total Warnings: 0
Total Errors: 0
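Otherwise, fix the configuration files according to the messages. We will come back to the configuration files later. For now, start Nagios: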
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
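The check-then-start sequence above can be chained so that Nagios is only started when the verification reports no errors. The helper below is a sketch: config_ok is a made-up name, and it relies only on the "Total Errors: 0" summary line shown above.

```shell
#!/bin/sh
# config_ok: read the output of "nagios -v" on stdin and succeed only if
# it contains the summary line "Total Errors: 0". Hypothetical helper.
config_ok() {
    grep -q '^Total Errors:[ ]*0[ ]*$'
}

# Usage (paths from this article):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_ok &&
#       /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
```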
So that Nagios is restarted automatically if it exits abnormally, we run it under daemontools:
Install daemontools:
- mkdir -p /package
- chmod 1755 /package
- cd /package
- fetch http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
- tar -xpzf daemontools-0.76.tar.gz
- cd admin/daemontools-0.76/
- package/install
Check that the svscan process is running:
- ps aux | grep svscan
- root 376 0.0 0.0 1636 0 con- IW - 0:00.00 /bin/sh /command/svscanboot
- root 411 0.0 0.0 1224 208 con- S 8Jul06 0:42.50 svscan /service
OK, it started normally.
- cd /service
- mkdir nagios
- chmod 1755 nagios
- cd nagios
- touch ./run
- chmod 755 ./run
- vi run
- #!/bin/sh
- PATH=/usr/local/bin:/usr/bin:/bin
- export PATH
- exec env - PATH=$PATH \
- /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
- mkdir log
- cd log
- touch ./run
- chmod 755 ./run
- vi ./run
- #!/bin/sh
- exec setuidgid logadmin multilog t s1000000 n100 ./main
- mkdir main
- chmod 777 main
- chown nagios.nagios main
- touch status
- chown nagios.nagios status
- svc -u /service/nagios/
- svstat /service/nagios/
- root@## ps auxww | grep nagios
- root 23276 0.0 0.1 1176 488 ?? I 5:00PM 0:01.71 supervise nagios
- nagios 34251 0.0 0.3 2316 1552 ?? S 6:06PM 0:00.10 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
- root@##
OK, Nagios is now a supervised service that starts automatically. Use the svc command to start or stop it.
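Instead of reading the ps listing, a script can check the svstat output. The sketch below assumes svstat prints a line of the form "directory: up (pid N) S seconds" for a running service. The function name service_is_up is hypothetical.

```shell
#!/bin/sh
# service_is_up: read svstat output on stdin and succeed if it reports
# the service as up with a process id. Hypothetical helper.
service_is_up() {
    grep -q ': up (pid [0-9][0-9]*)'
}

# Usage (service directory from this article):
#   svstat /service/nagios/ | service_is_up && echo "nagios is running"
```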
- ---------------------------------------------------------------------------------
- svc opts services
- opts is a series of getopt-style options. services consists of any number of arguments, each argument naming a directory used by supervise.
- -u: Up. If the service is not running, start it. If the service stops, restart it.
- -d: Down. If the service is running, send it a TERM signal and then a CONT signal. After it stops, do not restart it.
- -o: Once. If the service is not running, start it. Do not restart it if it stops.
- -p: Pause. Send the service a STOP signal.
- -c: Continue. Send the service a CONT signal.
- -h: Hangup. Send the service a HUP signal.
- -a: Alarm. Send the service an ALRM signal.
- -i: Interrupt. Send the service an INT signal.
- -t: Terminate. Send the service a TERM signal.
- -k: Kill. Send the service a KILL signal.
- -x: Exit. supervise will exit as soon as the service is down. If you use this option on a stable system, you're doing something wrong; supervise is designed to run forever.
- ---------------------------------------------------------------------------------
For example:
Stop Nagios: svc -d /service/nagios/
Restart Nagios: svc -t /service/nagios/
Start Nagios: svc -u /service/nagios/
Of course, you can also use an init-style (rc.d) script instead:
/usr/local/etc/rc.d/nagios start/stop
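In any case, daemontools is quite powerful. Now open http://localhost/nagios/ in a browser. You may be pleasantly surprised: the status of my server and its services is all clearly visible. At this point Nagios monitors only one host, itself (localhost). We will add other hosts and their services shortly.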
Nagios in action
1) Add a service to a host
To monitor the qmail service on localhost, do the following:
- vi minimal.cfg
- define service{
- use generic-service ; Name of service template to use
- host_name localhost
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_qmail
- }
You can copy an existing definition and modify it. This one was copied from the existing check_local_disk service. Change host_name, service_description, check_command, and so on.
- define service{
- use generic-service ; Name of service template to use
- host_name localhost
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop3
- }
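Since the two definitions differ only in service_description and check_command, they can be generated from a small template rather than copied by hand. This is a sketch, not part of Nagios: make_service is a made-up helper, and the fixed values are simply the ones used in the definitions above.

```shell
#!/bin/sh
# make_service HOST DESCRIPTION COMMAND: print a service definition using
# the same fixed settings as the definitions in this article.
make_service() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        is_volatile             0
        check_period            24x7
        max_check_attempts      1
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          admins
        notification_options    w,u,c,r
        notification_interval   960
        notification_period     24x7
        check_command           $3
        }
EOF
}

# Usage: append a generated definition to the object file.
#   make_service localhost qmail_pop3 check_pop3 >> minimal.cfg
```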
Follow the same pattern for the second service, then edit:
- vi checkcommands.cfg
- #'check_qmail' command definition
- define command{
- command_name check_qmail
- command_line $USER1$/check_smtp -H 127.0.0.1
- }
- define command{
- command_name check_pop3
- command_line $USER1$/check_pop -H 127.0.0.1
- }
Save, then check the configuration:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, you will see:
Total Warnings: 0
Total Errors: 0
If there are errors, fix them according to the messages.
Restart Nagios:
svc -d /service/nagios/ && svc -u /service/nagios/
Check the results in the Nagios web interface:
http://10.5.1.153/nagios/
Click "Service Detail".
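2) Add a host and its services
We will monitor this host's load, disk usage, and other server state that is not exposed through a network port, as well as its services, such as apache, mysql, qmail, and ntp. Can Nagios directly monitor things that have no port? No. So we must install NRPE on both hosts: the NRPE daemon on the monitored host listens on port 5666 and returns check results to the monitoring host whenever it asks through the check_nrpe plugin.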
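On the monitoring host, NRPE checks are usually wired in through a command definition for the check_nrpe plugin. The sketch below prints the commonly used form. Nothing in it comes from this article: the function name print_nrpe_command is hypothetical, and check_load in the usage note stands for a command name that would have to be defined in nrpe.cfg on the monitored host.

```shell
#!/bin/sh
# print_nrpe_command: print a command definition for the check_nrpe plugin.
# The heredoc is quoted so the shell leaves the Nagios $MACROS$ alone.
print_nrpe_command() {
    cat <<'EOF'
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF
}

# Usage: append it to the command file, then reference it from a service as
#   check_command   check_nrpe!check_load
#
#   print_nrpe_command >> /usr/local/nagios/etc/checkcommands.cfg
```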
OK, let's add apache, mysql, qmail, and ntp first. This time we put the monitored host and its services in a new file:
- cd /usr/local/nagios/etc/
- touch 10_5_1_156.cfg
- vi nagios.cfg
- cfg_file=/usr/local/nagios/etc/10_5_1_156.cfg
- vi 10_5_1_156.cfg
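Registering the new file in nagios.cfg can also be scripted so that running it twice does not add a duplicate line. This is a sketch with a made-up function name, add_cfg_file. It only appends the cfg_file= line when that exact line is not already present.

```shell
#!/bin/sh
# add_cfg_file MAIN_CFG OBJECT_CFG: append "cfg_file=OBJECT_CFG" to MAIN_CFG
# unless that exact line already exists. Hypothetical helper.
add_cfg_file() (
    line="cfg_file=$2"
    grep -qxF "$line" "$1" || printf '%s\n' "$line" >> "$1"
)

# Usage (paths from this article):
#   add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/10_5_1_156.cfg
```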
Define a host:
- define host{
- use generic-host ; Name of host template to use
- host_name test_nrpe
- alias client
- address 10.5.1.156
- check_command check-host-alive
- max_check_attempts 1
- check_period 24x7
- notification_interval 120
- notification_period 24x7
- notification_options d,r
- contact_groups admins
- }
Define the services to check on the host:
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description PING
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ping!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description apache
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_http!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description mysql
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_mysql!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description ntp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ntp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_smtp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop!100.0,20%!500.0,60%
- }
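Every check_command used above has to name a command that is defined somewhere, for example in checkcommands.cfg. The sketch below lists the names that are used but not defined, which can save a round trip through the verification step. The function name missing_commands is hypothetical, and the script assumes one directive per line and a command file that is not empty.

```shell
#!/bin/sh
# missing_commands OBJECTS_CFG COMMANDS_CFG: print each command name that a
# check_command line in OBJECTS_CFG uses but COMMANDS_CFG does not define.
missing_commands() {
    awk '
        # COMMANDS_CFG is read first: remember every defined command_name.
        NR == FNR { if ($1 == "command_name") defined[$2] = 1; next }
        # OBJECTS_CFG: the command name is the part before the first "!".
        $1 == "check_command" {
            split($2, parts, "!")
            name = parts[1]
            if (!(name in defined) && !(name in reported)) {
                reported[name] = 1
                print name
            }
        }
    ' "$2" "$1"
}

# Usage (file names from this article):
#   missing_commands 10_5_1_156.cfg checkcommands.cfg
```

Commented-out lines are ignored, because their first field is then "#" or starts with "#".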
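That completes the service definitions in the Nagios configuration. The web interface should now show one more host with its services listed under it. If you run into problems while adding hosts and services, see the companion articles: Concepts, Installation, and Troubleshooting.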