@@ -1618,6 +1618,60 @@ last reboot
16181618uptime
16191619```
16201620
1621+ ### K3s certificate expiry: ` kubectl ` works but controllers do not reconcile
1622+
1623+ If ` kubectl ` can still read and patch objects but rollouts never progress, check
1624+ whether K3s component certificates expired on the control-plane nodes. One
1625+ common symptom is a Deployment whose spec was accepted, but whose
1626+ ` observedGeneration ` never catches up:
1627+
1628+ ``` sh
1629+ kubectl get deploy < name> -o jsonpath=' generation={.metadata.generation} observed={.status.observedGeneration} updated={.status.updatedReplicas} ready={.status.readyReplicas} replicas={.status.replicas}{"\n"}'
1630+ ```
1631+
1632+ Other useful checks:
1633+
1634+ ``` sh
1635+ kubectl get events --sort-by=.lastTimestamp | grep CertificateExpirationWarning
1636+
1637+ ssh root@< control-plane-ip> -i /path/to/private_key -o IdentitiesOnly=yes
1638+ k3s certificate check --output table
1639+ journalctl -u k3s -n 100 --no-pager | grep -E ' certificate has expired|tls: bad certificate|leaderelection'
1640+ ```
1641+
1642+ Typical log lines include `tls: failed to verify certificate: x509: certificate
1643+ has expired` from ` leaderelection.go`, or etcd peer messages such as
1644+ ` remote error: tls: bad certificate ` . In that state, the API server may still
1645+ answer some requests, while the scheduler/controller-manager/etcd leadership
1646+ path is unhealthy.
1647+
1648+ K3s renews expired or near-expiry leaf certificates on service startup. Restart
1649+ the control-plane nodes one at a time, wait for each node to return, and restart
1650+ the node used by your current kubeconfig endpoint last if possible. Then check
1651+ that ` k3s certificate check --output table ` no longer shows expired leaf
1652+ certificates. ` WARNING ` rows for certs that are near expiry can remain; ` EXPIRED `
1653+ rows should be gone.
1654+
1655+ ``` sh
1656+ for host in < control-plane-ip-1> < control-plane-ip-2> < control-plane-ip-3> ; do
1657+ ssh root@" ${host} " -i /path/to/private_key -o IdentitiesOnly=yes \
1658+ ' systemctl restart k3s'
1659+ ssh root@" ${host} " -i /path/to/private_key -o IdentitiesOnly=yes \
1660+ ' systemctl is-active k3s && k3s certificate check --output table | grep -E "EXPIRED|WARNING" || true'
1661+ done
1662+ ```
1663+
1664+ If automatic renewal on restart is not enough, use the
1665+ [ K3s manual rotation flow] ( https://docs.k3s.io/cli/certificate ) on each server
1666+ (` systemctl stop k3s ` , ` k3s certificate rotate ` , ` systemctl start k3s ` ). Rotate
1667+ servers first, then agents. After the controller manager observes the pending
1668+ Deployment generation, rerun the rollout check:
1669+
1670+ ``` sh
1671+ kubectl rollout status deploy/< name> --timeout=300s
1672+ kubectl get pods -l app=< label> -o wide
1673+ ```
1674+
16211675---
16221676
16231677## 💣 Takedown
0 commit comments