What did you do
- How was the cluster created?
  - As part of a CI build:
    k3d cluster create <cluster_name> --servers 1 --agents 2 --port 4212-4259:34212-34259@loadbalancer --volume <name>-data:/var/lib/rancher/k3s/storage@all --api-port 127.0.0.1:4210 --registry-config <path>/registry.yaml --no-rollback --timeout 3m
    (plus --k3s-arg flags for eviction/image-gc on each agent)
  - The cluster name is build-specific (e.g. a3d55b32e98571e3-sq-run).
- What did you do afterwards?
  - Nothing. Cluster creation never succeeds: k3d waits for the loadbalancer (serverlb) to become ready, but serverlb never reaches “start worker processes” and stays in a restart loop until we hit the 3m timeout (or k3d fails earlier). We run no further k3d or docker commands; we only dump diagnostics (docker logs of serverlb, etc.) and run cleanup.
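For anyone trying to reproduce this, the diagnostics step could look roughly like the sketch below. It is not our actual CI script: the helper names `serverlb_name` and `dump_serverlb_diagnostics` are hypothetical, and the container name follows the `k3d-<cluster>-serverlb` pattern visible in the logs.

```shell
#!/bin/sh
# Build the load balancer container name from the cluster name,
# following the k3d-<cluster>-serverlb pattern seen in the logs.
serverlb_name() {
    printf 'k3d-%s-serverlb\n' "$1"
}

# Hypothetical helper: print the container state, its restart count,
# and the most recent log lines for the serverlb container.
dump_serverlb_diagnostics() {
    name="$(serverlb_name "$1")"
    docker inspect --format 'status={{.State.Status}} restarts={{.RestartCount}}' "$name"
    docker logs --timestamps --tail 200 "$name" 2>&1
}
```

Calling `dump_serverlb_diagnostics <cluster_name>` right after a failed create should show whether the container is in the `restarting` state and how often it has restarted.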
What did you expect to happen
Cluster create should complete: server and agent nodes start, serverlb becomes ready (nginx logs “start worker processes”), and k3d returns success so we can continue with our workflow.
What happened?
Cluster create fails because the serverlb (loadbalancer) container never becomes ready. It keeps restarting. From serverlb logs we see:
- confd generates the nginx config and overwrites /etc/nginx/nginx.conf.
- confd runs nginx -s reload.
- nginx fails with: invalid PID number "" in "/var/run/nginx.pid" (exit status 1).
So the reload runs before nginx has started (or while the pid file is still empty), which makes serverlb fail and restart repeatedly.
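To illustrate the kind of guard that would avoid this failure, here is a minimal sketch. It is an assumption about a possible workaround, not how the k3d proxy image is actually implemented; `pid_file_is_valid` and `safe_nginx_reload` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the pid file holds the PID of a live process.
pid_file_is_valid() {
    pid_file="$1"
    # -s is true only when the file exists and is non-empty
    [ -s "$pid_file" ] || return 1
    pid="$(cat "$pid_file")"
    # Reject anything that is not a plain decimal number
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical wrapper: reload only when an nginx master process is known to run.
safe_nginx_reload() {
    if pid_file_is_valid "${1:-/var/run/nginx.pid}"; then
        nginx -s reload
    else
        echo "nginx not running yet; skipping reload" >&2
        return 0
    fi
}
```

With a guard like this, a reload issued before nginx has written its PID would be skipped instead of exiting with status 1.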
Screenshots or terminal output
Relevant serverlb log (confd + nginx):
2026-02-11T17:06:57.470142717Z 2026-02-11T17:06:57Z k3d-<cluster>-serverlb confd[31]: DEBUG "nginx: the configuration file /etc/nginx/.nginx.conf074874422 syntax is ok\n...
2026-02-11T17:06:57.470238857Z 2026-02-11T17:06:57Z k3d-<cluster>-serverlb confd[31]: DEBUG Overwriting target config /etc/nginx/nginx.conf
2026-02-11T17:06:57.476714666Z 2026-02-11T17:06:57Z k3d-<cluster>-serverlb confd[31]: DEBUG Running /usr/sbin/nginx -s reload
2026-02-11T17:06:57.505982212Z 2026-02-11T17:06:57Z k3d-<cluster>-serverlb confd[31]: ERROR "2026/02/11 17:06:57 [notice] 40#40: signal process started\n2026/02/11 17:06:57 [error] 40#40: invalid PID number \"\" in \"/var/run/nginx.pid\"\nnginx: [error] invalid PID number \"\" in \"/var/run/nginx.pid\"\n"
2026-02-11T17:06:57.506070662Z 2026-02-11T17:06:57Z k3d-<cluster>-serverlb confd[31]: ERROR exit status 1
So: confd runs nginx -s reload before nginx has written a valid PID to /var/run/nginx.pid, which causes the reload to fail and the serverlb to crash/restart.
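A quick way to check whether a given run hit this same failure is to count the matching lines in the serverlb logs. The sketch below assumes the error text stays exactly as shown in the log; `count_pid_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Count log lines on stdin that contain nginx's empty-pid-file error.
count_pid_errors() {
    # grep -c prints the number of matching lines; it exits 1 when there are
    # none, so "|| true" keeps the function usable under "set -e".
    grep -c 'invalid PID number' || true
}
```

It can be fed from the container logs, for example `docker logs k3d-<cluster>-serverlb 2>&1 | count_pid_errors`; a non-zero count means confd attempted the reload while the pid file was still empty.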
k3d CLI (typical):
k3d cluster create runs and eventually fails with an error along the lines of: node k3d-...-serverlb failed to get ready / error waiting for log line start worker processes / node is in status=restarting.
Which OS & Architecture
arch: x86_64
name: docker
os: Ubuntu 22.04.5 LTS
ostype: linux
version: 29.2.1
Which version of k3d
k3d version v5.8.3
k3s version v1.31.5-k3s1 (default)
Which version of docker
Client: Docker Engine - Community
Version: 29.2.1
API version: 1.53
OS/Arch: linux/amd64
Server: Docker Engine - Community
Engine:
Version: 29.2.1
API version: 1.53 (minimum version 1.24)
OS/Arch: linux/amd64
containerd:
Version: v2.2.1
runc:
Version: 1.3.4
docker info
Server Version: 29.2.1
Storage Driver: overlayfs
Cgroup Driver: systemd
Cgroup Version: 2
Kernel Version: 6.8.0-1044-aws
Operating System: Ubuntu 22.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 30.65GiB