
Machines lost network connection #1873

@loomsen

Description

Hi folks,

I'm running several clusters, and suddenly all my Kubernetes machines, workers and control planes alike, have lost connection to the outside world. I have a NAT gateway and a WireGuard machine running in the projects. The NAT GW and the WireGuard machine work fine, but the k8s nodes all say:

# ping google.de
ping: google.de: Temporary failure in name resolution
# ping 185.12.64.1
ping: connect: Network is unreachable

Is anybody else experiencing the same? It happened all of a sudden about 9 hours ago, and I haven't found a way to get the nodes back online permanently. Adding back a default route brought the connectivity back, but only temporarily 🤷‍♂️
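For anyone hitting the same thing, a quick diagnostic sketch (plain iproute2, nothing specific to this setup) to check whether a node is in this state:

```shell
# Diagnostic sketch: an affected node still has its DHCP routes for
# 10.0.0.0/8, but no default route, so every public address is unreachable.
if ip route show default 2>/dev/null | grep -q '^default'; then
    echo "default route: present"
else
    echo "default route: missing"
fi
```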

Fortunately the services are still online, but when running Terraform, I get:

module.kube.module.kube.null_resource.kustomization (remote-exec): + kubectl delete --ignore-not-found -n kube-system helmchart.helm.cattle.io/hcloud-cloud-controller-manager
module.kube.module.kube.null_resource.kustomization (remote-exec): + kubectl apply -k /var/post_install
module.kube.module.kube.null_resource.kustomization (remote-exec): error: accumulating resources: accumulation err='accumulating resources from 'https://github.com/kubereboot/kured/releases/download/1.17.1/kured-1.17.1-dockerhub.yaml': Get "https://github.com/kubereboot/kured/releases/download/1.17.1/kured-1.17.1-dockerhub.yaml": dial tcp: lookup github.com: Try again': failed to run '/usr/bin/git fetch --depth=1 https://github.com/kubereboot/kured HEAD': fatal: unable to access 'https://github.com/kubereboot/kured/': Could not resolve host: github.com
module.kube.module.kube.null_resource.kustomization (remote-exec): : exit status 128
╷
│ Error: remote-exec provisioner error
│ 
│   with module.kube.module.kube.null_resource.kustomization,
│   on ../../../../terraform-modules/terraform-hcloud-kube-hetzner/init.tf line 405, in resource "null_resource" "kustomization":
│  405:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_756741801.sh": Process exited with status 1

Edit:

Apparently the default route vanished from all the machines:

# ip r s
10.0.0.0/8 via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.5 metric 100 
10.0.0.1 dev eth1 proto dhcp scope link src 10.127.128.5 metric 100 
169.254.169.254 via 10.0.0.1 dev eth1 proto dhcp src 10.127.128.5 metric 100

# ip r add default via 10.0.0.1 dev eth1

# ping -c1 google.de
PING google.de (142.250.184.227) 56(84) bytes of data.
64 bytes from fra24s12-in-f3.1e100.net (142.250.184.227): icmp_seq=1 ttl=114 time=32.9 ms

--- google.de ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 32.869/32.869/32.869/0.000 ms

However, this is not reboot-safe: after a reboot the default route is gone again, and the node is stuck without internet access 🤷‍♂️
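As a stopgap until the real cause is found, one could re-add the route at every boot from a oneshot systemd unit. This is only a sketch: the unit name is my own invention, and the gateway `10.0.0.1` and device `eth1` are taken from the `ip r` output above, so adjust them for your network.

```shell
# Hypothetical stopgap: persist the default route with a oneshot unit.
# Gateway 10.0.0.1 and device eth1 come from the `ip r` output above.
cat > /etc/systemd/system/default-route.service <<'EOF'
[Unit]
Description=Re-add default route via private gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ip route replace default via 10.0.0.1 dev eth1

[Install]
WantedBy=multi-user.target
EOF

# enable it for future boots (and start it now):
#   systemctl enable --now default-route.service
```

This only papers over the problem, of course; the real question is why the machines stopped getting a default route in the first place.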
