Commit 810a884

docs: add k3s certificate expiry debugging note
1 parent e6c8b02 commit 810a884

1 file changed

Lines changed: 54 additions & 0 deletions

README.md

@@ -1618,6 +1618,60 @@ last reboot
uptime
```

### K3s certificate expiry: `kubectl` works but controllers do not reconcile

If `kubectl` can still read and patch objects but rollouts never progress, check
whether K3s component certificates have expired on the control-plane nodes. One
common symptom is a Deployment whose spec was accepted, but whose
`observedGeneration` never catches up with `generation`:

```sh
kubectl get deploy <name> -o jsonpath='generation={.metadata.generation} observed={.status.observedGeneration} updated={.status.updatedReplicas} ready={.status.readyReplicas} replicas={.status.replicas}{"\n"}'
```
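
To script this check, the two generation fields can be compared directly. This is a minimal sketch that assumes the exact `generation=... observed=...` line printed by the command above; `deploy_is_stalled` is a hypothetical helper name, not a kubectl feature.

```sh
# Hypothetical helper: exit status 0 when observedGeneration lags generation.
# Takes one argument: the line printed by the jsonpath query above.
deploy_is_stalled() {
  generation=$(printf '%s\n' "$1" | sed -n 's/.*generation=\([0-9][0-9]*\).*/\1/p')
  observed=$(printf '%s\n' "$1" | sed -n 's/.*observed=\([0-9][0-9]*\).*/\1/p')
  # An empty observed value means no controller has reported status yet,
  # so it is treated as 0. An unparseable generation counts as "not stalled".
  [ -n "$generation" ] && [ "${observed:-0}" -lt "$generation" ]
}
```

Pass it the query output as its first argument; a zero exit status means the controller has not caught up yet.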

Other useful checks:

```sh
# From your workstation: look for K3s certificate expiry warning events
kubectl get events --sort-by=.lastTimestamp | grep CertificateExpirationWarning

# Open a shell on a control-plane node
ssh root@<control-plane-ip> -i /path/to/private_key -o IdentitiesOnly=yes

# Then, on that node: list certificate status and scan recent service logs
k3s certificate check --output table
journalctl -u k3s -n 100 --no-pager | grep -E 'certificate has expired|tls: bad certificate|leaderelection'
```

Typical log lines include
`tls: failed to verify certificate: x509: certificate has expired` from
`leaderelection.go`, or etcd peer messages such as
`remote error: tls: bad certificate`. In that state, the API server may still
answer some requests while leader election for the scheduler and controller
manager, or etcd peer communication, is failing, which can leave an accepted
spec unreconciled.
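
To see which of these signatures dominates, the matching log lines can be tallied. This is a minimal sketch using only the three patterns named above; `summarize_cert_errors` is a hypothetical helper that reads log text on stdin.

```sh
# Hypothetical helper: count log lines that match each failure signature.
# A single line can count toward more than one signature.
summarize_cert_errors() {
  awk '
    /certificate has expired/ { expired++ }
    /tls: bad certificate/    { bad_cert++ }
    /leaderelection/          { leader++ }
    END { printf "expired=%d bad_cert=%d leaderelection=%d\n", expired, bad_cert, leader }
  '
}
```

On a control-plane node it could be fed with `journalctl -u k3s -n 100 --no-pager | summarize_cert_errors`.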

K3s renews expired or near-expiry leaf certificates on service startup. Restart
the control-plane nodes one at a time and wait for each node to return before
moving on. If possible, restart the node behind your current kubeconfig endpoint
last. Then check that `k3s certificate check --output table` no longer shows
expired leaf certificates. `WARNING` rows for certificates that are near expiry
can remain; `EXPIRED` rows should be gone.

```sh
for host in <control-plane-ip-1> <control-plane-ip-2> <control-plane-ip-3>; do
  # Restart K3s so expired or near-expiry leaf certificates are renewed
  ssh root@"${host}" -i /path/to/private_key -o IdentitiesOnly=yes \
    'systemctl restart k3s'
  # Confirm the service is active and list any remaining EXPIRED/WARNING rows
  ssh root@"${host}" -i /path/to/private_key -o IdentitiesOnly=yes \
    'systemctl is-active k3s && k3s certificate check --output table | grep -E "EXPIRED|WARNING" || true'
done
```
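
The loop above does not itself wait for a node to come back, and it prints leftover rows rather than failing on them. A minimal sketch of both pieces follows; `retry_until` and `assert_no_expired` are hypothetical helper names, and the sketch assumes that expired certificates show up as the literal word `EXPIRED` in the table output, as described above.

```sh
# Hypothetical helper: rerun a command every <interval> seconds until it
# succeeds, giving up once roughly <timeout> seconds have been spent waiting.
# Usage: retry_until <timeout> <interval> <command> [args...]
retry_until() {
  timeout=$1 interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -gt "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical helper: read `k3s certificate check --output table` output on
# stdin and fail if any line still contains EXPIRED (assumed literal marker).
assert_no_expired() {
  if grep -q 'EXPIRED'; then
    echo 'expired certificates remain' >&2
    return 1
  fi
  echo 'no EXPIRED rows'
}
```

For example, `retry_until 300 5 kubectl wait --for=condition=Ready node/<node-name> --timeout=10s` could run between restarts, where `<node-name>` is the Kubernetes node name (which may differ from the IP used for SSH).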

If automatic renewal on restart is not enough, use the
[K3s manual rotation flow](https://docs.k3s.io/cli/certificate) on each server
(`systemctl stop k3s`, `k3s certificate rotate`, `systemctl start k3s`). Rotate
servers first, then agents. Once the controller manager has observed the pending
Deployment generation, rerun the rollout check:

```sh
kubectl rollout status deploy/<name> --timeout=300s
kubectl get pods -l app=<label> -o wide
```
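
For the manual rotation flow mentioned above, the per-node sequence can be wrapped so that it can be previewed before touching a service. This is a minimal sketch: `run` and `rotate_k3s_certs` are hypothetical helpers, `DRY_RUN` is an assumed environment variable, and the `k3s-agent` unit name for agents assumes a default install.

```sh
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the service, rotate certificates, start the service again.
# The first argument is the systemd unit: k3s (default) on servers,
# k3s-agent on agents (assumed default unit name).
rotate_k3s_certs() {
  unit=${1:-k3s}
  run systemctl stop "$unit" &&
    run k3s certificate rotate &&
    run systemctl start "$unit"
}
```

Running `DRY_RUN=1 rotate_k3s_certs` on a node prints the three commands without executing them; drop `DRY_RUN` to apply them.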

---

## 💣 Takedown