Check_MK is an extension to the Nagios monitoring system. It allows rule-based configuration to be written in Python and offloads work from the Nagios core so that Nagios scales better and more systems can be monitored from a single Nagios server.
It comes with a set of system checks, a web UI based on mod_python and JavaScript, and a module that allows fast access to the Nagios core. It also adds further features on top of Nagios.
The first public versions were available in 2008. In April 2009 it was released under the GPL. Since 2009, the releases have been tracked in git.
The current stable version is 1.2.0p1.
"Stable" releases are labeled with a version number, a "p" for production and a release number. For example, 1.1.12p6 is the sixth public release of the stable 1.1.12 version. Releases are compatible within their version, so a 1.1.12p5 configuration will work mostly unchanged with 1.1.12p6.
"Innovation" releases are specially marked versions based on the development branch and intended for public testing. Check_MK keeps the interfaces stable during the lifetime of a "p" release, but they may change between stable versions. For example, there are changes between 1.1.10 and 1.1.12 that require users to adjust their configuration.
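The stable labeling scheme can be illustrated with a small parser. This is only a sketch based on the description above: the names `Version`, `parse_version` and `same_stable_series` are made up for illustration, and the labels used for innovation releases are not covered because the text does not describe them.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    base: str              # version number, e.g. "1.1.12"
    release: Optional[int]  # number after "p", e.g. 6; None if there is no "p" part


# Dotted digits, optionally followed by "p" and a release number.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:p(\d+))?$")


def parse_version(label: str) -> Version:
    """Split a label such as "1.1.12p6" into its version and release number."""
    match = _VERSION_RE.match(label)
    if match is None:
        raise ValueError("unrecognized version label: %r" % label)
    base, release = match.groups()
    return Version(base, int(release) if release is not None else None)


def same_stable_series(a: str, b: str) -> bool:
    """True if both labels belong to the same stable version."""
    return parse_version(a).base == parse_version(b).base
```

With this sketch, `same_stable_series("1.1.12p5", "1.1.12p6")` is true, which corresponds to the case where a configuration is expected to work mostly unchanged.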
Features that Check_MK adds on top of Nagios include:
- Autodetection of the data points to be configured on a monitored system (inventory)
- Rule-based configuration
- Agentless (SNMP-based) monitoring
- Scalability tuning for setups that could normally not be monitored using Nagios
- Replacement of the standard Nagios GUI and centralized monitoring
- Nagios configuration management
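The idea behind rule-based configuration can be sketched in a few lines of Python. The example below is not Check_MK's actual configuration syntax: the host names, tags, the `CPU_LOAD_LEVELS` rule list and the `lookup` function are all hypothetical, and the "first matching rule wins" policy is an assumption made for the sketch.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# Hypothetical inventory: host name -> set of tags describing the host.
HOSTS: Dict[str, Set[str]] = {
    "web01": {"linux", "prod"},
    "web02": {"linux", "test"},
    "sw01": {"snmp", "prod"},
}

# Hypothetical rule list: (value, tags a host must carry for the rule to apply).
# Rules are ordered from most to least specific.
Rule = Tuple[Any, Set[str]]
CPU_LOAD_LEVELS: List[Rule] = [
    ((10.0, 20.0), {"linux", "prod"}),  # warn/crit levels for production Linux hosts
    ((20.0, 40.0), {"linux"}),          # looser levels for all other Linux hosts
]


def lookup(rules: List[Rule], host_tags: Set[str], default: Optional[Any] = None) -> Any:
    """Return the value of the first rule whose tags are all present on the host."""
    for value, required_tags in rules:
        if required_tags <= host_tags:
            return value
    return default
```

Instead of listing thresholds per host, a handful of rules covers every host that carries the matching tags; a new host only needs the right tags to pick up its settings.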
Check_MK combines several components:
- Using multiple "passive" checks via a single "active" check (passive checks are only processed, but not executed by Nagios, which is considerably faster)
- Modules to unify configuration handling and connections to monitored systems. This makes TCP or SNMP access transparent to the user and authors of check plugins
- Configuration handling for PNP4Nagios
- An agent for host operating systems. The relatively small agent only runs the commands that gather the data needed by the checks and avoids local processing. By design it also does not accept any external input. There are agents for different operating systems such as Linux, Unix and Windows.
- Checks that consist of agent- and server-side parts. Check_MK gives them a framework for handling connections, talking to Nagios and handling internal errors. There are rather strict design standards for writing checks that are supposed to bring more conformity to the plugins than with standard Nagios plugins. The checks handle the detection of supported devices and are then automatically called to check against the expected status (good) of a component that was found earlier on. Currently there are about 230 plugins in the official distribution plus 40 on the community exchange.
- Livestatus is a module that provides direct access to the Nagios core. It is queried using its own query language and serves as a backend for various Nagios addons that need access to Nagios data: JasperReports, NagiosBP, Thruk, NagstaMon, NagVis and Multisite.
- Multisite is a GUI component that can run in parallel with, or instead of, the standard Nagios GUI. It uses Livestatus to access one or more Nagios servers directly and can build reports from the available data.
There are also plugins for Multisite:
- Check_MK BI, a business process and impact analysis tool
- WATO, a rule-based web administration frontend for the Check_MK (and Nagios) configuration
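How a single fetch of agent data can feed several passive checks is sketched below. The sketch rests on assumptions that are not stated above: that agent output is plain text divided into sections by `<<<name>>>` header lines, and that results are handed to Nagios as `PROCESS_SERVICE_CHECK_RESULT` external commands (Check_MK itself may submit results differently). The sample data, the check functions and the `CHECKS` table are invented for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Invented sample of agent output; the "<<<name>>>" section headers are an assumption.
SAMPLE_OUTPUT = """<<<uptime>>>
86400
<<<mem>>>
MemTotal 1000
MemFree 50
"""


def split_sections(output: str) -> Dict[str, List[str]]:
    """Group the raw agent lines by the section header that precedes them."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            current = line[3:-3]
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


# Server-side checks return (state, text); states follow the Nagios convention
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
def check_uptime(lines: List[str]) -> Tuple[int, str]:
    return 0, "up for %d s" % int(lines[0])


def check_mem(lines: List[str]) -> Tuple[int, str]:
    values = {key: int(value) for key, value in (line.split() for line in lines)}
    free_pct = 100.0 * values["MemFree"] / values["MemTotal"]
    return (2 if free_pct < 10 else 0), "%.0f%% free" % free_pct


# Section name -> (service description, check function).
CHECKS: Dict[str, Tuple[str, Callable[[List[str]], Tuple[int, str]]]] = {
    "uptime": ("Uptime", check_uptime),
    "mem": ("Memory", check_mem),
}


def passive_results(host: str, output: str, now: Optional[int] = None) -> List[str]:
    """Turn one agent output into one passive check result per known section."""
    timestamp = int(time.time()) if now is None else now
    commands = []
    for name, lines in split_sections(output).items():
        if name not in CHECKS:
            continue
        description, check = CHECKS[name]
        try:
            state, text = check(lines)
        except Exception as exc:  # a broken check reports UNKNOWN instead of crashing
            state, text = 3, "check failed: %s" % exc
        commands.append("[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s"
                        % (timestamp, host, description, state, text))
    return commands
```

Nagios would only have to schedule one active check per host, which fetches the agent output; every service derived from that output arrives as a passive result.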
It is possible to use only some of the components. For example, Check_MK can be used to define a configuration that consists only of standard Nagios checks. Another option is to add Livestatus to an existing Nagios server without any further modifications, which lets a user run newer web interfaces such as Multisite or Thruk. Posts on the Check_MK mailing lists also indicate that it is used successfully with Icinga.
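A client that talks to Livestatus on an existing Nagios server could look roughly like the following sketch. It assumes that Livestatus listens on a Unix socket, that a query consists of a `GET` line followed by header lines such as `Columns:` and `Filter:`, and that the default answer is one line per row with fields separated by semicolons. The socket path in the usage note is a placeholder, and `build_query` and `run_query` are illustrative names.

```python
import socket
from typing import List, Optional


def build_query(table: str, columns: List[str], filters: Optional[List[str]] = None) -> str:
    """Assemble a query: a GET line, header lines, and a terminating blank line."""
    lines = ["GET %s" % table, "Columns: %s" % " ".join(columns)]
    for condition in filters or []:
        lines.append("Filter: %s" % condition)
    return "\n".join(lines) + "\n\n"


def run_query(socket_path: str, query: str) -> List[List[str]]:
    """Send the query over a Unix socket and split the answer into rows and fields."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    text = b"".join(chunks).decode("utf-8")
    return [line.split(";") for line in text.splitlines()]
```

A call such as `run_query("/path/to/livestatus/socket", build_query("services", ["host_name", "description", "state"], ["state = 2"]))` would then list the services that are currently critical; the path has to be replaced with the socket location configured on the server.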
Differences from standard Nagios installations
- Higher total number of service checks, as one service is generated per monitored component; a server can have over 1000 services, all of which are monitored (and can be grouped)
- Usage of RRD databases for historical data for almost every service
- Standard check interval of 1 minute (Nagios defaults to 5 minutes)
- In SNMP monitoring, avoidance of traps in favour of status polling (for extra performance data)
- Smaller, fully scriptable configuration
- Focus on passive services, which solves Nagios check latency problems