Monitoring OpenSolaris Zones with Nagios

We’re running separate zones for our web, app, and db servers. To keep track of the health of our application and our servers, we rely on pnp4nagios for graphing performance data such as CPU utilization and memory usage. With OpenSolaris zones, only one OS kernel is running. This is different from, e.g., Xen, where every VM runs its own kernel. Such a “one kernel setup” has an important implication for monitoring: within a zone, you see the CPU utilization and memory usage of the whole box (the kernel) instead of what the zone itself uses. None of the available Nagios check scripts is able to report that data per zone.

Nagios Plugin for Monitoring OpenSolaris Zone CPU and MEM

To get CPU and memory data per zone, there is prstat -Z. Executed from within the global zone, it returns a list of all zones along with their process count and their current memory and CPU utilization:

ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE                        
     3       34 2443M 2430M    15%   9:03:05 7.9% app                       
     4       19 2562M 2014M    12% 288:22:24 1.7% db                         
     0       51   85M   92M   0.6% 145:57:00 1.6% global                      
     2      143  483M  192M   1.2%   0:17:52 0.6% web                        
     1       19 3106M 3111M    19%   0:49:05 0.0% mem
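Parsing that zone table is straightforward. The following is a minimal Python sketch, not the plugin I later published, assuming the text comes from a single prstat run in the global zone (for example prstat -Z 1 1, i.e. one sample at a one-second interval). Since prstat prints its per-process listing before the zone summary, the sketch skips everything up to the ZONEID header and stops at the first line that doesn't look like a zone row. The function names, the dictionary layout, and the conversion of sizes to megabytes are my own illustrative choices.

```python
import re

# Multipliers that convert prstat size suffixes (K, M, G, T) to megabytes.
_UNIT_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}


def size_to_mb(value: str) -> float:
    """Convert a prstat size such as '2443M' or '1.5G' to megabytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT])", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_TO_MB[unit]


def parse_zone_summary(output: str) -> dict:
    """Return {zone name: metrics} for the zone rows of prstat -Z output."""
    zones = {}
    in_zone_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:1] == ["ZONEID"]:
            in_zone_table = True  # the zone rows follow this header
            continue
        if not in_zone_table:
            continue  # skip the per-process listing printed before it
        if len(fields) != 8 or not fields[0].isdigit():
            break  # e.g. a blank line or the 'Total:' line ends the table
        zoneid, nproc, swap, rss, memory, _time, cpu, name = fields
        zones[name] = {
            "zoneid": int(zoneid),
            "nproc": int(nproc),
            "swap_mb": size_to_mb(swap),
            "rss_mb": size_to_mb(rss),
            "memory_pct": float(memory.rstrip("%")),
            "cpu_pct": float(cpu.rstrip("%")),
        }
    return zones
```

Fed the table above, parse_zone_summary(output)["app"]["rss_mb"] would be 2430.0 and parse_zone_summary(output)["db"]["cpu_pct"] would be 1.7.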

Wrapping prstat -Z in a script that takes the zone name as a command-line parameter lets the script calculate a Nagios status for that zone. Warning and critical thresholds are passed to the script as well, and it returns the resulting Nagios status together with Nagios performance data.

prstat -Z needs to run within the global zone

Usually, our Nagios server runs check scripts that need to execute locally on the monitored box via SSH. All our command definitions use $HOSTADDRESS$ as the target host for the SSH connection; during a Nagios check run, $HOSTADDRESS$ resolves to the host under test. But prstat -Z needs to run within the global zone. To deal with that, I added the IP address of the global zone as a parameter to the Nagios check call:

check_command check_by_ssh_zone_mem!4000!5000!app2!5120!10.0.0.1

In my command definition, I changed the SSH target from $HOSTADDRESS$ to $ARG5$, making the check_by_ssh plugin connect to the global zone instead of the host under test.
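To make the $ARG5$ trick concrete, here is a rough Python model of how Nagios expands a check_command: the string is split on "!", the first part names the command definition, and the remaining parts become $ARG1$, $ARG2$, and so on. The command_line template is a hypothetical reconstruction; check_by_ssh's -H (host) and -C (remote command) options are real, but the plugin paths and the remote script name check_zone_mem are placeholders. The model also ignores escaped "\!" separators and all other macros.

```python
# Hypothetical command_line for the check_by_ssh_zone_mem command definition.
# Paths and the remote script name are placeholders, not the published plugin.
ZONE_MEM_COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_by_ssh -H $ARG5$ "
    "-C '/opt/nagios/check_zone_mem $ARG1$ $ARG2$ $ARG3$ $ARG4$'"
)


def expand_check_command(command_line: str, check_command: str, host_address: str) -> str:
    """Substitute $HOSTADDRESS$ and $ARGn$ roughly the way Nagios does.

    Simplified: escaped '\\!' separators and all other macros are ignored.
    """
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", value)
    return expanded
```

Expanding ZONE_MEM_COMMAND_LINE with check_by_ssh_zone_mem!4000!5000!app2!5120!10.0.0.1 puts 10.0.0.1 after -H no matter what the zone's own address is, so the SSH connection always lands in the global zone, where prstat -Z can see every zone.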

Now we get nice CPU and memory usage graphs for our OpenSolaris zones. As soon as I’ve added documentation to my Nagios check scripts, I’ll publish them on MonitoringExchange for everyone to scrutinize and maybe even use.

Discover more from Agile Web Operations

3 thoughts on “Monitoring OpenSolaris Zones with Nagios”

Susan Wilson says:

August 31, 2009 at 9:51 pm

Has this check been published yet? I like the concept and haven’t seen a really good zone resource checker.
Thanks.
– Susan

Matthias Marschall says:

September 4, 2009 at 11:54 am

I’ve released my zone memory and cpu check scripts for nagios at: http://www.monitoringexchange.org/cgi-bin/page.cgi?g=Detailed%2F3193.html;d=1

Hope it helps!

Mike says:

August 31, 2016 at 8:37 pm

This is not the ideal way to do this if you have a large number of zones with a large amount of shared memory on them.

Running this will cause a lot of locks on the shared memory and stall a lot of processes in the wait state because of the high number of prstat commands running.

Published by Matthias Marschall
