Hi folks,
I'm running several clusters, and suddenly all my Kubernetes machines, workers and control planes alike, have lost their connection to the outside world. I have a NAT gateway and a WireGuard machine running in the projects. The NAT GW and the WireGuard machine work fine, but the k8s nodes all say:
# ping google.de
ping: google.de: Temporary failure in name resolution
# ping 185.12.64.1
ping: connect: Network is unreachable
Anybody else experiencing the same? It happened all of a sudden about 9 hours ago.
I haven't found a way to get them back online yet. Adding back a default route brought the connectivity back, but only temporarily 🤷♂️
Fortunately the services themselves are still online, but when running Terraform, I get:
module.kube.module.kube.null_resource.kustomization (remote-exec): + kubectl delete --ignore-not-found -n kube-system helmchart.helm.cattle.io/hcloud-cloud-controller-manager
module.kube.module.kube.null_resource.kustomization (remote-exec): + kubectl apply -k /var/post_install
module.kube.module.kube.null_resource.kustomization (remote-exec): error: accumulating resources: accumulation err='accumulating resources from 'https://github.com/kubereboot/kured/releases/download/1.17.1/kured-1.17.1-dockerhub.yaml': Get "https://github.com/kubereboot/kured/releases/download/1.17.1/kured-1.17.1-dockerhub.yaml": dial tcp: lookup github.com: Try again': failed to run '/usr/bin/git fetch --depth=1 https://github.com/kubereboot/kured HEAD': fatal: unable to access 'https://github.com/kubereboot/kured/': Could not resolve host: github.com
module.kube.module.kube.null_resource.kustomization (remote-exec): : exit status 128
╷
│ Error: remote-exec provisioner error
│
│ with module.kube.module.kube.null_resource.kustomization,
│ on ../../../../terraform-modules/terraform-hcloud-kube-hetzner/init.tf line 405, in resource "null_resource" "kustomization":
│ 405: provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_756741801.sh": Process exited with status 1
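For anyone else triaging this: the kustomize error above is just the node failing to reach github.com at all, not a Kubernetes problem. A quick way to separate routing trouble from DNS trouble on an affected node (185.12.64.1, pinged above, is one of Hetzner's recursive resolvers, so if it's unreachable by IP, name resolution can't work either):

# check routing first, no DNS involved
ip route show default
# empty output means there is no default route; only then bother with DNS
ping -c1 185.12.64.1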
Edit: apparently the default route vanished from all the machines (which would also explain the name-resolution failures, since Hetzner's resolver at 185.12.64.1 becomes unreachable without it):
# ip r s
10.0.0.0/8 via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.5 metric 100
10.0.0.1 dev eth1 proto dhcp scope link src 10.127.128.5 metric 100
169.254.169.254 via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.5 metric 100
# ip r add default via 10.0.0.1 dev eth1
# ping -c1 google.de
PING google.de (142.250.184.227) 56(84) bytes of data.
64 bytes from fra24s12-in-f3.1e100.net (142.250.184.227): icmp_seq=1 ttl=114 time=32.9 ms
--- google.de ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 32.869/32.869/32.869/0.000 ms
However, this is not reboot-safe: after a reboot the default route is gone again, and the node is stuck without internet access 🤷♂️
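As a stop-gap until the root cause is found, the route can be re-added at boot. A minimal sketch of a oneshot unit, assuming the nodes run systemd and the NAT gateway stays reachable at 10.0.0.1 on eth1 (the unit name and path are mine, not something kube-hetzner ships):

# /etc/systemd/system/restore-default-route.service
[Unit]
Description=Workaround: re-add default route via NAT gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
# "replace" is idempotent, so re-running the unit is harmless
ExecStart=/usr/sbin/ip route replace default via 10.0.0.1 dev eth1

[Install]
WantedBy=multi-user.target

Enable it with systemctl daemon-reload && systemctl enable --now restore-default-route.service. Caveat: if the route is being withdrawn at runtime (e.g. on a DHCP renewal) rather than just missing after boot, a boot-time unit won't catch that; a dispatcher hook in whatever manages the interface (NetworkManager or wicked, depending on the image) would be more robust.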