Nagios check for Varnish Backends
We recently starting using Varnish to cache un-authenticated requests to our web farm. It even has this great feature called grace mode, where it will keep serving a cached version of a page if the back-end server goes down. But we still want to be alerted that the back-end is down.
I wrote a Nagios check to do just that. It's written in Python, and using the varnishadm command-line tool that ships with Varnish.
#!/usr/bin/python
# save as /usr/lib/nagios/plugins/check_varnish_backends.py
from optparse import OptionParser
import subprocess
def getOptions():
arguments = OptionParser()
arguments.add_option("--host", dest="host", help="Host varnishadm is running on", type="string", default="localhost")
arguments.add_option("--port", dest="port", help="varnishadm port", type="string", default="6082")
arguments.add_option("--secret", dest="secret", help="varnishadm secret file", type="string", default="/etc/varnish/secret")
arguments.add_option("--command", dest="command", help="varnishadm backend health command", type="string", default="debug.health")
return arguments.parse_args()[0]
def run(command, exit_on_fail=True):
# don't use check_output in order to supportPython 2.6
process = subprocess.Popen(command.split(" "), stdout=subprocess.PIPE)
output, unused_err = process.communicate()
_retcode = process.poll()
return output
if __name__ == '__main__':
options = getOptions()
varnishadm_raw = run("varnishadm -T %(host)s:%(port)s -S %(secret)s %(command)s" % options.__dict__)
lines = varnishadm_raw.split("\n")
backends_sick, backends_healthy = [], []
for line in lines:
if line.startswith("Backend"):
if line.endswith("Sick"):
backends_sick.append(line)
else:
backends_healthy.append(line)
if not backends_healthy and not backends_sick:
print "There are NO backends"
exit(2)
if backends_sick:
print "".join(backends_sick)
exit(2)
print "All %s backends are healthy" % len(backends_healthy)
Because varnishadm uses a shared secret file, I decided to have the checks run on the actual Varnish hosts, using Nagios NRPE.
# add the following config line on the varnish hosts
vim /etc/nagios/nrpe.cfg
command[check_varnish_backends]=/usr/lib/nagios/plugins/check_varnish_backends.py
service nagios-nrpe-server restart
# and configure the Nagios server to look for that check
vim /etc/nagios3/conf.d/services_nagios2.cfg
define service {
use generic-service
hostgroup_name proxy-servers
service_description check_varnish_backends
check_command check_nrpe_1arg!check_varnish_backends
}
service nagios3 restart
Note: depending on your setup, you may need to use chmod to give the nagios user access to ready the shared secret file (/etc/varnish/secret) on the Varnish servers.