Process supervisor for stateful container pods, designed for PostgreSQL, ClickHouse, OpenSearch, and other rich data systems where one container hosts a small fleet of cooperating processes. systemd-compatible .service/.timer configuration format; single static Rust binary.
Running multiple processes in one container is an anti-pattern. Docker's own best-practices guide says it directly: "Each container should have only one concern." For stateless services that rule holds, and runr is not for you. Use a minimal base image, set ENTRYPOINT, ship.
There is one settled exception: stateful systems that are multi-process by design (PostgreSQL, ClickHouse, OpenSearch, MongoDB), together with the helper agents that turn them into a service: metrics exporter, connection pooler, backup agent, scheduled vacuums, log collector. runr was built at Ozon for that exact case (database pods on Kubernetes) and open-sourced for anyone with the same problem. It is not an Ozon-wide platform recommendation; it's a focused tool for the case described below.
Splitting that fleet into sidecars has known costs: per-container cgroup limits cause OOM kills with data loss (Crunchy Data on the Linux Assassin), and sidecars receive SIGKILL before flushing during graceful pod termination (Kubernetes sidecar containers docs). The Kubernetes project's own blog (April 2025) is direct about it: "While the sidecar pattern can be useful in many cases, it is generally not the preferred approach unless the use case justifies it."
In practice, production systems run cooperating processes together in one container under a real init:
- Spilo (Zalando): Patroni + PostgreSQL bundled in one container, supervised by runit (runsvdir -P /etc/service)
- CloudNativePG (CNCF Sandbox, donated by EDB): operator-managed PostgreSQL with an in-container supervisor process
- Crunchy Data PGO (Crunchy Data, acquired by Snowflake in June 2025): operator-based PostgreSQL with HA and backup tooling
- Phusion baseimage-docker (9.1k★): the general-purpose runit + /sbin/my_init image behind the original "your container needs a real init system" argument
If your workload looks like one of theirs (a database pod, a search cluster, a stateful system with helper agents), runr is built for that case. systemd-compatible .service and .timer files, cgroup v2 pools, integrated syslog server, log rotation, on-demand tasks, journalctl-style inspection, single static binary.
runr is the wrong tool in two cases:
- One stateless process per container (web servers, API gateways, queue workers, single-binary CLIs), or stock PostgreSQL/ClickHouse/Elasticsearch images without an operational layer around them. The standard Kubernetes model fits; runr only adds complexity.
- Init-only need: zombie reaping and SIGTERM forwarding for a single child. tini or docker run --init is enough.
If neither matches, and your container looks like a stateful system surrounded by helper agents, keep reading.
# 1. Put the binary in the container image
COPY runr /usr/bin/runr
# 2. Create a service file
cat > /etc/runr/my-app.service << 'EOF'
[Service]
ExecStart=/usr/bin/my-app --config /etc/my-app/config.toml
Restart=always
RestartSec=5s
[Log]
Directory=/var/log/my-app
Sink=stdout
EOF
# 3. Run as PID 1
ENTRYPOINT ["/usr/bin/runr", "supervisor"]
# Control from inside the container
runr status # all services
runr log my-app -f # follow logs
runr restart my-app # restart
runr daemon-reload # pick up new/changed .service files

runr listens on 127.0.0.1:8010 by default. The API has no authentication: anyone who can reach the port can start/stop services, run arbitrary commands via background run, and shut down the supervisor.
Inside a standard Kubernetes pod or Docker container with bridge networking, 127.0.0.1 is only reachable from within the container. The risk appears when:
- docker run --network host: the container shares the host network stack, so port 8010 is reachable from other hosts
- --http-listen-api 0.0.0.0:8010: explicitly binds to all interfaces
- hostNetwork: true in a Kubernetes pod spec
In these cases, any process on the host (or on the network, if no firewall) can execute commands inside the container through the API.
If you need the API accessible outside the container, restrict access at the network level (firewall rules, NetworkPolicy in Kubernetes, or a reverse proxy with authentication).
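
An accidental wide bind can be caught before deployment by checking the configured listen address in CI. The sketch below is not part of runr: `is_loopback_only` is a hypothetical helper, and it assumes the address is written in the host:port form shown for --http-listen-api.

```python
import ipaddress


def is_loopback_only(listen: str) -> bool:
    """Return True if a host:port listen address is confined to loopback.

    Hypothetical helper, not part of runr. Accepts IPv4 ("127.0.0.1:8010")
    and bracketed IPv6 ("[::1]:8010"). Hostnames are treated as unsafe
    because they cannot be checked without a DNS lookup.
    """
    host, _, _port = listen.rpartition(":")
    host = host.strip("[]")
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address: assume it may be reachable from outside.
        return False
```

A check like this only covers the bind address. Host networking (docker run --network host, hostNetwork: true) still has to be reviewed separately.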
The space is well-explored; runr fills a specific gap.
- supervisord: Python runtime in the image, no timers, no forking-daemon support (no PIDFile), no cgroup integration. Predates cgroup v2.
- runit: minimal and reliable, but requires shell-script unit files, no timers, no on-demand tasks, no log inspection CLI.
- s6 / s6-overlay: same shell-script burden, no calendar timers, no cgroup management, no forking daemons. Excellent as a general-purpose base-image init; lighter on database-specific features.
- systemd: the right abstraction (declarative .service files, calendar timers, journalctl, restart semantics) but too heavy for a container: D-Bus, large dependency surface.
runr takes systemd's configuration format and lifecycle model and packages them for the container case: single static binary, cgroup v2 pools, integrated syslog server, log rotation, on-demand background tasks, journalctl-style inspection.
| Feature | runit | s6 / s6-overlay | supervisord | systemd | runr |
|---|---|---|---|---|---|
| Timers / Cron | - | - | - | + | + |
| On-Demand Tasks | - | - | - | + | + |
| Shared Cgroups v2 | - | - | - | + | + |
| Syslog Server + Rotation | - | partial (s6-socklog) | - | + (journald) | + |
| Log Rotation | + (svlogd) | + (s6-log) | - | + (journald) | + |
| Log Inspection (follow/tail) | - | - | partial | + (journalctl) | + |
| Forking Daemons (PIDFile) | - | - | - | + | + |
| Dynamic Reload | - | - | partial | + | + |
| KillMode (control-group/mixed/process/none) | - | - | - | + | + |
| PID 1 Zombie Reaping | + | + | - | + | + |
| Service Dependencies | - | + (s6-rc) | - | + | - |
| HTTP API | - | - | partial (XML-RPC) | - (D-Bus) | + |
| Declarative Config (.service/.timer) | - | - | + (INI) | + | + |
| systemctl/journalctl Compat | - | - | - | native | + |
| Resource Overhead | minimal | minimal | medium | heavy | minimal |
| Container Image Impact | none | ~1MB | Python runtime | systemd + deps | none |
PostgreSQL, MySQL, and most traditional databases fork on startup. The postmaster forks into the background, writes its PID to a file, and the parent exits. supervisord, runit, and s6 see the parent exit and conclude that the service crashed.
runr tracks the forked daemon via Type=forking + PIDFile. After the parent exits, it reads the PID file, verifies the daemon is alive via /proc/<pid>, and monitors it with PID reuse detection (comparing /proc/<pid>/cmdline snapshots). On daemon exit, restart policy kicks in.
For services that fork without writing PID files, runr tracks the process group (PGID). The service is alive as long as at least one process in the group exists.
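
The PID-file tracking described above can be sketched in a few lines. This is an illustration of the technique, not runr's implementation: the function names are hypothetical, the PID is assumed to be on the first line of the file (as in PostgreSQL's postmaster.pid), and a Linux /proc filesystem is assumed.

```python
import os
from pathlib import Path
from typing import Optional


def read_pid(pidfile: str) -> Optional[int]:
    """Return the PID on the first line of pidfile, or None if unusable."""
    try:
        first_line = Path(pidfile).read_text().splitlines()[0]
        return int(first_line.strip())
    except (OSError, IndexError, ValueError):
        return None


def cmdline_snapshot(pid: int) -> Optional[bytes]:
    """Return the raw /proc/<pid>/cmdline, or None if the process is gone."""
    try:
        return Path(f"/proc/{pid}/cmdline").read_bytes()
    except OSError:
        return None


def still_same_daemon(pid: int, snapshot: bytes) -> bool:
    """True while pid exists and its command line still matches the snapshot.

    A different command line under the same PID means the kernel reused
    the PID for an unrelated process.
    """
    return cmdline_snapshot(pid) == snapshot
```

A supervisor would take the snapshot once after the parent exits, then poll `still_same_daemon` and apply the restart policy when it returns False.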
When Kubernetes sends SIGTERM to the pod, you have terminationGracePeriodSeconds to shut down cleanly. For a database that means: stop accepting connections, drain active queries, flush WAL, checkpoint, exit. That can take 30 seconds on a quiet instance and 5 minutes under heavy write load.
Shutdown sequence:
1. Run ExecStop if configured
2. Send SIGTERM to process/group (respecting KillMode)
3. Wait up to TimeoutStopSec for the process to exit
4. If still alive, escalate to SIGKILL
KillMode controls signal scope:
- control-group (default): SIGTERM to the entire process group
- mixed: SIGTERM to main process, SIGKILL to group on timeout
- process: signal only the main process, leave children alone
- none: skip SIGTERM, only run ExecStop; safety SIGKILL as last resort
$MAINPID, $PGID, $LAST_PID are available in ExecStop.
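
The SIGTERM, wait, SIGKILL escalation can be sketched as follows. This is a minimal illustration for a single main process (roughly the KillMode=process case); it leaves out ExecStop and process groups, and `stop_process` is a hypothetical name rather than anything runr exposes.

```python
import signal
import subprocess


def stop_process(proc: subprocess.Popen, timeout_stop_sec: float) -> int:
    """Send SIGTERM, wait up to timeout_stop_sec, then escalate to SIGKILL.

    Returns the child's return code. On POSIX a negative value -N means
    the child was terminated by signal N.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout_stop_sec)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

The timeout plays the role of TimeoutStopSec: for a database it has to be long enough for a checkpoint, and shorter than the pod's terminationGracePeriodSeconds.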
Backups, vacuum, statistics refresh, log cleanup, partition management. On a VM that's cron. In a container there is no cron.
# /etc/runr/backup.timer
[Timer]
OnCalendar=*-*-* 02:00:00
RandomizedDelaySec=600

Trigger types:
- OnCalendar: calendar expressions (Mon..Fri 03:00, *-*-* *:0/5, hourly, daily)
- OnStartupSec: fire once after runr starts
- OnUnitInactiveSec: fire after the target service finishes (repeating, paired with OnStartupSec)
- RandomizedDelaySec: jitter to prevent thundering herd
If the previous run is still going when the next trigger fires, it's skipped.
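
The skip rule is easy to reason about with a small simulation. The sketch below is illustrative only: `fired_triggers` is a hypothetical function, times are plain seconds, and it assumes a trigger that lands exactly when the previous run ends is allowed to fire.

```python
def fired_triggers(trigger_times, run_seconds):
    """Return the trigger times that actually start a run.

    A trigger that arrives while the previous run is still going is
    dropped, not queued.
    """
    fired = []
    busy_until = float("-inf")
    for t in sorted(trigger_times):
        if t < busy_until:
            continue  # previous run still going: skip this trigger
        fired.append(t)
        busy_until = t + run_seconds
    return fired
```

With triggers every 300 seconds and a job that takes 400 seconds, `fired_triggers([0, 300, 600, 900], 400)` yields `[0, 600]`: every second trigger is skipped, and nothing piles up.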
In the sidecar model, each process gets its own container with separate cgroup limits. If PostgreSQL needs 3.8GB of your 4GB pod during a heavy query, but the exporter's cgroup caps it at 256MB, the exporter is OOM-killed even though the pod has headroom.
runr puts multiple services into a shared cgroup v2 pool. PostgreSQL uses the headroom when the exporter idles; the exporter borrows from idle capacity during metric scrapes. One cgroup instead of three, less kubelet overhead.
I/O limits work with any block device type: device-mapper, RBD (Ceph), NVMe, SCSI. Device major:minor numbers are resolved automatically.
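
For context, the cgroup v2 io.max file takes one line per device, keyed by major:minor. The sketch below shows how such a line could be produced; it is not runr's resolver. Both function names are hypothetical, and `device_numbers` uses plain stat(2), which returns the device of the filesystem holding the path. That may be a partition or a virtual device, not necessarily the whole disk that io.max expects.

```python
import os
from typing import Optional, Tuple


def device_numbers(path: str) -> Tuple[int, int]:
    """Return (major, minor) of the device backing path, via stat(2)."""
    dev = os.stat(path).st_dev
    return os.major(dev), os.minor(dev)


def format_io_max(
    major: int,
    minor: int,
    rbps: Optional[int] = None,
    wbps: Optional[int] = None,
    riops: Optional[int] = None,
    wiops: Optional[int] = None,
) -> str:
    """Build one io.max line; limits left as None are written as "max"."""

    def val(limit: Optional[int]) -> str:
        return "max" if limit is None else str(limit)

    return (
        f"{major}:{minor} rbps={val(rbps)} wbps={val(wbps)} "
        f"riops={val(riops)} wiops={val(wiops)}"
    )
```

For example, `format_io_max(8, 16, rbps=2097152, wiops=120)` produces the line `8:16 rbps=2097152 wbps=max riops=max wiops=120`.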
pg_basebackup, pg_dump, REINDEX, schema migrations, data exports. With runr:
runr background run -- /usr/local/bin/migrate-db --target 42
runr background run --env PGDATABASE=analytics --kill-timeout 600 \
-- pg_dump -Fc -f /backup/analytics.dump
runr background list
runr background log a1b2c3

Background services are identified by UUID, persist state to disk (survive runr restarts), and stop via SIGTERM → timeout → SIGKILL. systemd-run compatibility mode also works.
PostgreSQL logs to a file, pg_doorman logs to syslog, the exporter logs to stderr. Without a log collector, you need logrotate (requires cron), rsyslog (requires a daemon), and manual cleanup scripts.
Managed process logs with automatic rotation:
[Log]
Directory=/var/log/postgresql
FileSize=100M
FileCount=7
RotateEvery=24h
Sink=stdout
Prefix=[%T %s pid=%p]

Rotation by size and time, gzip compression, configurable retention. Line prefixes with dynamic placeholders: %T (timestamp), %s (service name), %p (PID), %R (restart count), %S (stdout/stderr), %d (date), %I (ISO 8601), %U (unix epoch). Services continue running when the disk fills up: log writes are discarded, the process isn't killed.
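
To make the placeholder semantics concrete, here is a small expander written for illustration. It is not runr's code: `expand_prefix` is a hypothetical function, the exact %I and %U renderings are assumptions, and leaving unknown placeholders untouched is a choice made for this sketch.

```python
import re
from datetime import datetime, timezone


def expand_prefix(template, *, service, pid, restarts, stream, now):
    """Expand %-placeholders in a log prefix template.

    stream is "O" for stdout or "E" for stderr; now is a datetime.
    """
    values = {
        "T": now.strftime("%H:%M:%S.") + f"{now.microsecond // 1000:03d}",
        "d": now.strftime("%Y-%m-%d"),
        "I": now.isoformat(),             # assumed ISO 8601 rendering
        "U": f"{now.timestamp():.3f}",    # assumed epoch.ms rendering
        "s": service,
        "p": str(pid),
        "R": str(restarts),
        "S": stream,
        "%": "%",
    }
    # Unknown placeholders are left as written (an assumption of this sketch).
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), template)


if __name__ == "__main__":
    stamp = datetime(2025, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc)
    print(expand_prefix("[%T %s pid=%p]", service="postgresql", pid=42,
                        restarts=0, stream="O", now=stamp))
    # prints: [03:04:05.678 postgresql pid=42]
```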
Integrated syslog server for legacy applications:
# /etc/runr/syslog.conf
[server]
listen = /dev/log
[pg_doorman]
appname = pg_doorman
directory = /var/log/syslog/pg_doorman
max_size = 50M
max_count = 10

Applications that only speak syslog work without rsyslog, logrotate, or cron in the image.
Log inspection:
runr log postgresql -n 50 # last 50 lines
runr log postgresql -f # follow (tail -f)
journalctl -u postgresql -f # systemd compatibility mode

When runr is PID 1, it handles what the kernel expects from init:
- Zombie reaping. Calls waitpid(-1, WNOHANG) every 200ms. Prevents zombie accumulation from PostgreSQL backends orphaned during connection termination.
- Signal forwarding. SIGTERM from kubelet triggers per-service shutdown sequences instead of letting the kernel SIGKILL everything after the grace period.
- Cgroup v2 initialization. Creates /sys/fs/cgroup/runr, moves itself there, enables cpu, io, memory controllers. Services can override placement with Cgroup=.
- Subreaper. Sets PR_SET_CHILD_SUBREAPER so orphaned grandchild processes are reparented to runr.
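
The non-blocking reap from the first bullet looks roughly like this. It is a sketch of the waitpid(-1, WNOHANG) pattern, not runr's Rust code; `reap_zombies` is a hypothetical name, and a real supervisor would call it on a timer (every 200ms in runr's case) or on SIGCHLD.

```python
import os


def reap_zombies() -> list:
    """Reap every exited child without blocking.

    Returns a list of (pid, wait_status) pairs, one per reaped child.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```

Looping until `waitpid` returns 0 matters: several children can exit between two polls, and each call collects only one of them.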
cat > /etc/runr/new-exporter.service << 'EOF'
[Service]
ExecStart=/usr/bin/node_exporter
Restart=always
Autostart=yes
EOF
runr daemon-reload # new service detected and started, existing services untouched

Hot-reload applies changes at different lifecycle points:
| When applied | Fields |
|---|---|
| Next start | ExecStart, ExecStartPre, WorkingDirectory, User, Group, Environment, Nice, LimitNOFILE, CapabilityBoundingSet, TimeoutStartSec, PIDFile |
| Next stop | KillMode, ExecStop, TimeoutStopSec |
| Next reload | ExecReload |
| Next exit/crash | Restart, RestartSec |
| Immediately | MaxMemoryRSS |
| Requires restart | Type (different actor) |
Services in Failed state with Restart=always or on-failure get restarted automatically on reload.
Create symlinks to activate drop-in replacement:
ln -s /usr/bin/runr /usr/local/bin/systemctl
ln -s /usr/bin/runr /usr/local/bin/journalctl
ln -s /usr/bin/runr /usr/local/bin/systemd-run

runr detects the binary name at startup and switches CLI parsing:
- systemctl status|start|stop|reload|kill|show|cat|daemon-reload <unit>
- journalctl -u <unit> [-n N] [-f] [-e] (multiple -u for merged output)
- systemd-run [--unit <name>] [--wait] [-q] -- <command>
Existing scripts and Ansible playbooks work without changes.
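
Dispatching on the invoked name is the same multi-call trick BusyBox uses. A minimal sketch of the idea, assuming the decision is made from the basename of argv[0] (`cli_mode` and the mode strings are hypothetical, not runr internals):

```python
import os
import sys

# Names under which the binary behaves like the corresponding systemd tool.
COMPAT_NAMES = {"systemctl", "journalctl", "systemd-run"}


def cli_mode(argv0: str) -> str:
    """Pick a CLI personality from the name the program was invoked as."""
    name = os.path.basename(argv0)
    return name if name in COMPAT_NAMES else "runr"


if __name__ == "__main__":
    # argv[0] is the symlink's name, not the resolved target, which is
    # what makes the symlink approach work.
    print(cli_mode(sys.argv[0]))
```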
A PostgreSQL pod, sketched:
Container (PID 1 = runr)
├── postgresql Type=forking, PIDFile=/var/run/postgresql/postmaster.pid
├── pg_exporter Type=simple, Restart=always
├── pg_doorman Type=simple, Cgroup=infra
├── wal-g-backup.timer OnCalendar=*-*-* 02:00:00
├── pg_stat_monitor.timer OnUnitInactiveSec=5min
└── [on-demand: runr background run -- pg_basebackup ...]
# /etc/runr/postgresql.service
[Service]
Type=forking
User=postgres
Group=postgres
WorkingDirectory=~
ExecStartPre=-/usr/local/bin/pg-preflight-check.sh
ExecStart=/usr/bin/pg_ctl start -D /pgdata -l /dev/null
ExecStop=/usr/bin/pg_ctl stop -D /pgdata -m fast
ExecReload=/usr/bin/pg_ctl reload -D /pgdata
PIDFile=/var/run/postgresql/postmaster.pid
TimeoutStartSec=120s
TimeoutStopSec=300s
Restart=on-failure
RestartSec=10s
Cgroup=/sys/fs/cgroup/pg-pool/cgroup.procs
[Log]
Directory=/var/log/postgresql
Sink=stdout
Prefix=[%T postgresql]
FileSize=100M
RotateEvery=24h

# /etc/runr/pg-exporter.service
[Service]
User=postgres
EnvironmentFile=/etc/pg_exporter/env
ExecStart=/usr/bin/postgres_exporter --web.listen-address=:9187
Restart=always
RestartSec=5s
Cgroup=/sys/fs/cgroup/pg-pool/cgroup.procs
[Log]
Sink=stdout
Prefix=[%T pg-exporter]

# /etc/runr/backup.service
[Service]
Type=oneshot
User=postgres
ExecStart=/usr/local/bin/wal-g backup-push /pgdata
Autostart=no
Restart=no
TimeoutStartSec=7200s
[Log]
Directory=/var/log/backup
Prefix=[%T backup]
# /etc/runr/backup.timer
[Timer]
OnCalendar=*-*-* 02:00:00
RandomizedDelaySec=600

# /etc/runr/pg-pool.cgroup
[Cgroup]
Name=pg-pool
MemoryMax=8G
CpuMax=400%
IOMax=/pgdata write_bps:200M read_bps:500M

# /etc/runr/clickhouse.service
[Service]
Type=forking
User=clickhouse
ExecStart=/usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml --daemon
ExecStop=/bin/kill -TERM $MAINPID
PIDFile=/var/run/clickhouse-server/clickhouse-server.pid
TimeoutStopSec=120s
Restart=on-failure
MaxMemoryRSS=16G
LimitNOFILE=262144
[Log]
Directory=/var/log/clickhouse-server
FileSize=200M
FileCount=10

# /etc/runr/ch-keeper.service
[Service]
Type=forking
User=clickhouse
ExecStart=/usr/bin/clickhouse-keeper --config-file=/etc/clickhouse-keeper/keeper-config.xml --daemon
PIDFile=/var/run/clickhouse-keeper/clickhouse-keeper.pid
Restart=always
[Log]
Directory=/var/log/clickhouse-keeper

# /etc/runr/cleanup.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/ch-cleanup-old-parts.sh
Autostart=no
Restart=no
# /etc/runr/cleanup.timer
[Timer]
OnStartupSec=1h
OnUnitInactiveSec=6h

[Unit]
Description=PostgreSQL Database Server
[Service]
# Service type
Type=simple|forking|oneshot # default: simple
# Execution
ExecStart=/usr/bin/postgres -D /pgdata
ExecStartPre=/usr/bin/check-disk-space.sh # pre-start hook (prefix with - to ignore failure)
ExecStop=/usr/bin/pg_ctl stop -D /pgdata # custom stop command
ExecReload=/usr/bin/pg_ctl reload -D /pgdata
# Identity
User=postgres
Group=postgres
WorkingDirectory=/var/lib/postgresql
# Restart policy
Restart=no|on-failure|always|halt # default: always
RestartSec=5s # delay between restarts (default: 2s)
# Timeouts
TimeoutStartSec=90s # max startup time (default: 90s)
TimeoutStopSec=300s # max shutdown time (default: 90s)
TimeoutSec=90s # shorthand for both
# Process control
KillMode=control-group|mixed|process|none # signal scope (default: control-group)
PIDFile=/var/run/postgresql/postmaster.pid # for Type=forking
Autostart=yes|no # start on daemon-reload (default: yes)
# Resource limits
MaxMemoryRSS=2G # software OOM: kill if RSS exceeds limit
LimitNOFILE=65536 # max open file descriptors
Nice=5 # process priority (-20..19)
CapabilityBoundingSet=CAP_CHOWN CAP_KILL # Linux capabilities to keep
# Environment
Environment=PGDATA=/pgdata
Environment=PGPORT=5432
EnvironmentFile=/etc/postgresql/env # load from file
EnvironmentFile=-/etc/postgresql/env.local # prefix - = optional (ignore if missing)
# Cgroup
Cgroup=/sys/fs/cgroup/pg-pool/cgroup.procs # join shared cgroup
# (alias: CgroupProcPidsFile)
[Log]
Directory=/var/log/postgresql
Sink=stdout|none # mirror to supervisor stdout (default: none)
FileSize=100M # rotate at size (default: 100M)
FileCount=7 # keep N rotated files (default: 7)
RotateEvery=24h # rotate by time (default: 24h)
Prefix=[%T %s] # line prefix for all output
PrefixSink=[%T %s pid=%p] # override prefix for stdout/stderr mirror
PrefixFile=[%d %T] # override prefix for log file

Prefix placeholders: %s (service name), %T (HH:MM:SS.mmm), %d (YYYY-MM-DD), %I (ISO 8601), %U (unix epoch.ms), %S (O=stdout, E=stderr), %p (PID), %R (restart count), %% (literal %)
[Timer]
OnCalendar=*-*-* 02:00:00 # calendar expression
OnStartupSec=30s # fire once after runr starts
OnUnitInactiveSec=5min # fire after target service stops (requires OnStartupSec)
RandomizedDelaySec=600 # jitter to prevent thundering herd
Unit=backup # target service (default: timer name)
Autostart=yes|no # auto-start timer (default: yes)

Calendar syntax: Mon..Fri 03:00, *-*-* *:0/5 (every 5 min), *-*-1..7 18:00, hourly, daily, weekly, monthly, yearly
Rules:
- At least one of OnCalendar or OnStartupSec required
- OnUnitInactiveSec requires OnStartupSec, cannot be combined with OnCalendar
- Target service for OnUnitInactiveSec should be Type=oneshot with Restart=no (recommended, not enforced)
- Double-run prevention: if target service is still running, the trigger is skipped
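
The enforced rules above translate directly into a pre-deployment lint. The sketch below is a hypothetical checker, not runr's validator: it takes the [Timer] section as a plain dict of key to value and covers only the two enforced rules, not the recommendation about the target service.

```python
def validate_timer(timer: dict) -> list:
    """Return a list of rule violations for a [Timer] section (empty if valid)."""
    errors = []
    has_calendar = "OnCalendar" in timer
    has_startup = "OnStartupSec" in timer
    has_inactive = "OnUnitInactiveSec" in timer

    if not (has_calendar or has_startup):
        errors.append("at least one of OnCalendar or OnStartupSec is required")
    if has_inactive and not has_startup:
        errors.append("OnUnitInactiveSec requires OnStartupSec")
    if has_inactive and has_calendar:
        errors.append("OnUnitInactiveSec cannot be combined with OnCalendar")
    return errors
```

For example, `validate_timer({"OnStartupSec": "1h", "OnUnitInactiveSec": "6h"})` returns an empty list, while a section containing only OnUnitInactiveSec reports two violations.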
[Cgroup]
Name=pg-pool # creates /sys/fs/cgroup/pg-pool
# (or Path=/sys/fs/cgroup/custom-name for explicit path)
MemoryMax=4G # memory.max
CpuMax=200% # cpu.max (200% = 2 cores)
IOMax=/pgdata read_bps:500M write_bps:200M read_iops:10000 write_iops:5000

[server]
listen = /dev/log # unix socket path
[default]
directory = /var/log/syslog/default
max_size = 10M
max_count = 7
max_age = 168h
[pg_doorman]
appname = pg_doorman # route by syslog appname
directory = /var/log/syslog/pg_doorman
max_size = 50M
max_count = 10
max_age = 720h

| Policy | Behavior |
|---|---|
| no | Never restart. Service transitions to Stopped. |
| on-failure | Restart if exit code != 0 or killed by signal (except SIGTERM). |
| always | Restart on any exit. |
| halt | On crash (non-zero exit or signal): halt the supervisor and all services. On clean exit (code 0): transition to Stopped without halt. |
Restart=halt is for critical services where local recovery is impossible. If PostgreSQL's postmaster can't start (corrupted pg_control, missing data directory), the pod needs a fresh start on a different node.
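
The restart policies can be read as a small decision function. The sketch below restates the policy table in code for illustration; `next_action` and its return values ("restart", "stop", "halt") are hypothetical names, not runr's internal states.

```python
import signal
from typing import Optional


def next_action(policy: str, exit_code: Optional[int], term_signal: Optional[int]) -> str:
    """Decide what to do after a service exits.

    exit_code is set for a normal exit, term_signal when the service was
    killed by a signal; exactly one of them is expected to be non-None.
    """
    crashed = term_signal is not None or exit_code != 0

    if policy == "always":
        return "restart"
    if policy == "on-failure":
        if term_signal == signal.SIGTERM:
            return "stop"  # SIGTERM is the documented exception
        return "restart" if crashed else "stop"
    if policy == "halt":
        return "halt" if crashed else "stop"
    if policy == "no":
        return "stop"
    raise ValueError(f"unknown restart policy: {policy}")
```

Note the asymmetry: under on-failure a SIGTERM death does not trigger a restart, while under halt any signal death counts as a crash and takes the supervisor down.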
# Service control
runr start|stop|restart|reload <name>
runr kill -s TERM|KILL|HUP|INT <name>
runr enable|disable [--now] <name>
# Status
runr status [<name>]
runr list-services [--state running,failed]
runr list-timers
runr list-units [--type service|timer]
runr is-active|is-failed|is-enabled <name>
# Logs
runr log <name> [-n 50] [-f]
# Configuration
runr daemon-reload # reload all unit files from disk
runr cat <name> # show unit file contents
runr show <name> # show properties (Key=Value format)
# Background on-demand tasks
runr background run [--env K=V] [--kill-timeout 120] -- <command>
runr background list|status|log|stop|remove <uuid>
# Daemon
runr info # version, PID, memory, CPU, uptime
runr healthz / readyz # probe endpoints
# Completions
runr completion bash|zsh|fish

- --json: machine-readable output
- --no-header: suppress table headers (for scripting)
- --quiet: suppress all output
- --color auto|always|never
# glibc (default, requires matching glibc at runtime)
cargo build --release
# musl (fully static, runs on any Linux)
cargo build --release --target x86_64-unknown-linux-musl

The musl build produces a statically linked binary with no runtime dependencies. Copy it into any Linux container image (Alpine, Debian, Ubuntu, Fedora, scratch) and it works. No glibc version matching, no shared library hunt.
Release profile: codegen-units = 1, LTO, opt-level = "s", symbols stripped, panic = "abort". Uses jemalloc as global allocator.
glibc:
FROM rust:slim-bookworm AS builder
RUN cargo build --release
FROM debian:bookworm-slim
COPY --from=builder /app/target/release/runr /usr/bin/runr
ENTRYPOINT ["/usr/bin/runr", "supervisor"]
musl (static):
FROM rust:slim-bookworm AS builder
RUN rustup target add x86_64-unknown-linux-musl && \
cargo build --release --target x86_64-unknown-linux-musl
FROM alpine:3.20
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/runr /usr/bin/runr
ENTRYPOINT ["/usr/bin/runr", "supervisor"]
BDD tests with Cucumber covering 47 feature files:
make cucumber # all BDD tests locally
make cucumber FEATURE=cli.feature # specific feature
make cucumber TAGS=@smoke # by tag

Tags: @linux-only (skipped on macOS), @pid1-only (Docker-only), @smoke, @critical