ftw.monitor automatically starts a zc.monitor server on instance boot.
This monitor server supports a health_check command that can be used as
a TCP health check in HAProxy or service monitoring frameworks.
ftw.monitor is an alternative to collective.monitor
or five.z2monitor that relies entirely on autoconfiguration.
No product-config or ZCML is needed;
the monitor port will always be picked automatically based on the instance's base port:
monitor_port = instance_port + 80
In addition, ftw.monitor provides a perf_metrics command that
can be used to query an instance for performance-related metrics.
Plone 4.3.x
- Add the package to your buildout configuration:
[instance]
eggs +=
    ...
    ftw.monitor

Once ftw.monitor is included in the eggs of your instance(s), it will
automatically start a monitor server upon instance boot:
INFO ZServer HTTP server started at Mon May 6 14:53:08 2019
Hostname: 0.0.0.0
Port: 8080
...
INFO zc.ngi.async.server listening on ('', 8160)
The monitor server port is derived from the instance's port:
monitor_port = instance_port + 80
The monitor server can be inspected and tested using netcat:
$ echo 'help' | nc -i 1 localhost 8160
Supported commands:
dbinfo -- Get database statistics
health_check -- Check whether the instance is alive and ready to serve requests.
help -- Get help about server commands
interactive -- Turn on monitor's interactive mode
monitor -- Get general process info
perf_metrics -- Get performance related metrics
quit -- Quit the monitor
zeocache -- Get ZEO client cache statistics
zeostatus -- Get ZEO client status information

Alternatively, a bin/instance monitor <cmd> script is provided that
essentially does the same thing (sending the given command to the respective
monitor port and displaying the response):
$ bin/instance monitor help

The health_check command provided by ftw.monitor can be used to check
whether a Zope instance is alive and ready to serve requests.
If so, it will respond with OK\n:
$ echo 'health_check' | nc -i 1 localhost 8160
OK

While a warmup is in progress (see below), the health_check will
respond with a corresponding message instead.
Because health checks and instance warmup are tricky to deal with separately,
ftw.monitor also provides a mechanism for warming up Plone sites.
A @@warmup view is provided on both the Plone site root and the
Zope application root; it warms up either that specific
Plone site or all Plone sites in that Zope instance, respectively.
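For example, a warmup can also be triggered manually over HTTP. This is a
minimal sketch, assuming an instance listening on port 8080 and a Plone site
with the id Plone; adjust both to your setup:

# Warm up a single Plone site (site id "Plone" is an assumption)
$ curl http://localhost:8080/Plone/@@warmup

# Warm up all Plone sites served by this instance
$ curl http://localhost:8080/@@warmup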
The warmup view will look for an IWarmupPerformer multiadapter that adapts
a Plone site and request, and will execute the necessary actions to warm up
that Plone site.
There is a default IWarmupPerformer implementation in ftw.monitor
which will load catalog BTrees and forward index BTrees of the most used
catalog indexes (allowedRolesAndUsers and object_provides).
While the warmup is in progress, the health_check command will not yet
indicate the instance as being healthy:
$ echo 'health_check' | nc -i 1 localhost 8160
Warmup in progress

By default, ftw.monitor will automatically warm up a booting instance by
sending a request to the @@warmup view. The instance will be considered
healthy (by the health_check command) once the warmup has been performed
successfully.
If this behavior is not desired, automatic warmup can be disabled by setting
the FTW_MONITOR_AUTOWARMUP environment variable to 0 before starting
the instance(s):
export FTW_MONITOR_AUTOWARMUP=0

The perf_metrics command can be used to query an instance for various
metrics that are related to performance.
Syntax: perf_metrics [dbname] [sampling-interval]
You can pass a database name, where "-" is an alias for the main database,
which is the default. The sampling interval (specified in seconds)
defaults to 300 seconds (5 minutes), and affects the DB statistics retrieved
from the ZODB ActivityMonitor, specifically loads, stores and connections.
The maximum history length (and therefore sampling interval) configured in the ActivityMonitor is 3600s in a stock installation.
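For example, to query the main database with a 60 second sampling interval
(a sketch that assumes the monitor server from above, listening on port 8160):

$ echo 'perf_metrics - 60' | nc -i 1 localhost 8160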
The command will return the metrics as a JSON-encoded string (whitespace added for clarity):
{
  "instance": {
    "uptime": 39
  },
  "cache": {
    "size": 3212,
    "ngsize": 1438,
    "max_size": 30000
  },
  "db": {
    "loads": 1114,
    "stores": 28,
    "connections": 459,
    "conflicts": 7,
    "unresolved_conflicts": 3,
    "total_objs": 13336,
    "size_in_bytes": 5796849
  },
  "memory": {
    "rss": 312422400,
    "uss": 298905600,
    "pss": 310822823
  }
}

instance
- uptime - Time since instance start (in seconds)

cache

- size - Number of objects in cache
- ngsize - Number of non-ghost objects in cache
- max_size - Cache size (in number of objects)

db

- loads - Number of object loads in sampling interval
- stores - Number of object stores in sampling interval
- connections - Number of connections in sampling interval
- conflicts - Total number of conflicts since instance start
- unresolved_conflicts - Total number of unresolved conflicts since instance start
- total_objs - Total number of objects in the storage
- size_in_bytes - Size of the storage in bytes (usually the FileStorage's Data.fs; excludes BlobStorage)
Note
- loads, stores and connections are cumulative across all connections in the pool of that instance.
- total_objs and size_in_bytes may or may not be reported correctly when using
  RelStorage, depending on the SQL adapter.
memory
- rss - RSS (Resident Set Size) in bytes
- uss - USS (Unique Set Size) in bytes
- pss - PSS (Proportional Set Size) in bytes (Linux only, -1 on other platforms)
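Individual values can be extracted from this output on the command line, for
example with jq (a sketch; assumes jq is installed and the monitor port from
above, with the value shown taken from the sample output):

$ echo 'perf_metrics' | nc -i 1 localhost 8160 | jq '.db.loads'
1114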
For easy ingestion into InfluxDB via Telegraf, performance metrics for all reachable instances can be dumped using the bin/dump-perf-metrics script, which collects metrics from all instances and emits them in InfluxDB Line Protocol format.
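As a sketch of how that might be wired up, Telegraf's exec input plugin can
run the script periodically; the buildout path and timeout below are
assumptions to adapt to your setup:

[[inputs.exec]]
  ## Path to your buildout's dump-perf-metrics script
  commands = ["/path/to/buildout/bin/dump-perf-metrics"]
  timeout = "30s"
  ## The script already emits InfluxDB Line Protocol
  data_format = "influx"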
The following is an example of how to use the health_check command as
an HAProxy TCP health check:
backend plone03
# ...
option tcp-check
tcp-check connect
tcp-check send health_check\r\n
tcp-check expect string OK
server plone0301 127.0.0.1:10301 cookie p01 check port 10381 inter 10s downinter 15s maxconn 5 rise 1 slowstart 60s
server plone0302 127.0.0.1:10302 cookie p02 check port 10382 inter 10s downinter 15s maxconn 5 rise 1 slowstart 60s
server maintenance 127.0.0.1:10319 backup

Note in particular that option tcp-check changes all health checks for
this backend to TCP mode. So the maintenance server in this example,
which is an HTTP server, needs to have health checks turned off.
In order to switch to ftw.monitor for health monitoring, the following
steps are necessary:
- Configure your Zope instance to use only one ZServer thread.
  ftw.monitor is intended for use in setups with one thread per instance.
  Example using buildout and plone.recipe.zope2instance:

  [instance0]
  zserver-threads = 1
- Remove any HttpOk plugins from your supervisor configuration. With only
  one thread per instance, that approach to service monitoring can't work
  any more, and must be disabled.

  If you're extending from production.cfg and/or zeoclients/<n>.cfg from
  ftw-buildouts, you can get rid of the HttpOk supervisor plugins like this
  (after extending from one of these configs):

  [supervisor]
  eventlisteners-httpok =
- Remove collective.warmup (if present). Since ftw.monitor includes its own
  auto-warmup logic, the use of collective.warmup is unnecessary (or even
  detrimental).

  If you're extending from warmup.cfg from ftw-buildouts, you can neutralize
  collective.warmup with a section like this (after extending from warmup.cfg):

  [buildout]
  warmup-parts =
  warmup-eggs =
  warmup-instance-env-vars =
- Change your HAProxy health checks to TCP checks instead of HTTP. See the
  section above for an example of an appropriate HAProxy configuration.
- Fork this repo
- Clone your fork
- Shell: ln -s development.cfg buildout.cfg
- Shell: python bootstrap.py
- Shell: bin/buildout
Run bin/test to test your changes.
Or start an instance by running bin/instance fg.
- GitHub: https://github.com/4teamwork/ftw.monitor
- Issues: https://github.com/4teamwork/ftw.monitor/issues
- PyPI: http://pypi.python.org/pypi/ftw.monitor
This package is copyright 4teamwork.
ftw.monitor is licensed under the GNU General Public License, version 2.