Sunday, July 23, 2006

Mon It

Recently I tried monit, an application that helps ensure your system is running smoothly. From the monit website: "monit is a utility for managing and monitoring, processes, files, directories and devices on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations." Packages are available for FreeBSD and Ubuntu.

Installing monit on Ubuntu is fairly easy: just run "sudo apt-get install monit" and it's installed. Configuring it is just as easy. In fact, the "/etc/monit/monitrc" file explains in depth the options available and how to use the variables.

The installation step for FreeBSD is "pkg_add -rvv monit". The monitrc file is in "/usr/local/etc/monitrc".

I'm posting a sample of my monitrc here as a simple guide.
#start sample monitrc
set daemon 120 #monitor the system every 2 minutes
set mailserver localhost #use localhost as the mailserver
set mail-format {from: monit@system} #set the from field of the mail
set alert user@localhost #who will receive the mail
#set the http port in which monit will report its status, the address it listens on, who can connect to it as well as the username and password required to login to monit
set httpd port 10123 and
use address localhost
allow localhost
allow admin:monit
include /etc/monit/services/* #Include files containing services, filesystem to monitor
#end sample monitrc

Here is a sample /etc/monit/services/ssh, a file that monitors the ssh service.
#start /etc/monit/services/ssh
check process SSH with pidfile /var/run/
start program = "/etc/init.d/ssh start"
stop program = "/etc/init.d/ssh stop"
if cpu is greater than 60% for 2 cycles then alert
if cpu > 80% for 5 cycles then restart
if totalmem > 200.0 MB for 5 cycles then restart
if children > 250 then restart
if loadavg(5min) greater than 10 for 8 cycles then stop
if 3 restarts within 5 cycles then timeout
group Server
#end /etc/monit/services/ssh

As you can see, the file is pretty self-explanatory. Monit can also check whether other machines are alive, alert you if filesystem usage exceeds a certain percentage, ensure that file permissions are set correctly, verify that the checksums of specific system files have not changed, and so on. Feel free to try out the various combinations and options available.
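
To give a rough idea of what those other checks look like, here is a sketch of one more file that could sit in /etc/monit/services/. It is not from my own setup: the file name, the service names (gateway, rootfs, passwd), the address 192.168.1.1 and the thresholds are all made up for illustration. I'm also assuming a recent monit here; as far as I know, older releases spell the filesystem check "check device" rather than "check filesystem". Running "monit -t" checks the syntax of the control file before you reload.
#start sample /etc/monit/services/extras (hypothetical file name)
#is another machine alive? (the address is a placeholder)
check host gateway with address 192.168.1.1
if failed icmp type echo count 3 with timeout 5 seconds then alert
#alert when the root filesystem is more than 80% full
check filesystem rootfs with path /
if space usage > 80% then alert
#watch the permissions, owner and checksum of a system file
check file passwd with path /etc/passwd
if failed permission 644 then alert
if failed uid root then alert
if changed checksum then alert
#end sample /etc/monit/services/extras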

Happy MonIt-oring;)
