Wait for system deployments and jobs on init #1948
Summary of Changes
Hello @vsalomaki, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the robustness of the Kubernetes cluster initialization process. By integrating explicit wait commands for all system deployments and jobs, it ensures that foundational components are fully operational before proceeding with further configurations. This change directly mitigates potential installation failures caused by race conditions with slow-starting services, leading to a more stable and reliable cluster setup.
Code Review
This pull request introduces wait steps to ensure system deployments and jobs are fully initialized before proceeding with cluster setup. This is a valuable addition for improving the robustness of the initialization process. My review includes a suggestion to increase the timeout for these waits to better handle slower environments and prevent potential race conditions, making the process more reliable.
7c352b1 to 5c54fb7
/gemini review
Code Review
This pull request introduces waits for system deployments and jobs to complete during cluster initialization, which is a great improvement for the stability and reliability of the setup process. The provided logs clearly demonstrate the effectiveness of this change. I have one suggestion to make the wait condition even more robust by also considering DaemonSet resources, which are used for critical components like the CNI.
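The DaemonSet suggestion could be sketched roughly as follows. This is an illustration, not code from the PR: the function name `wait_for_daemonsets` and the `ROLLOUT_TIMEOUT` variable (with its 300s default) are assumptions made here. It uses `kubectl rollout status`, which blocks until a DaemonSet's rollout has finished.

```shell
# Sketch only: waits for every DaemonSet rollout; not taken from the PR.
set -eu

# Hypothetical default; raise it for slower environments.
ROLLOUT_TIMEOUT="${ROLLOUT_TIMEOUT:-300s}"

wait_for_daemonsets() {
  # Collect "namespace name" pairs, one per line, for all DaemonSets.
  ds_list=$(kubectl get daemonsets --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}')

  # Wait for each rollout in turn; a failed or timed-out rollout
  # makes the loop (and therefore the function) exit non-zero.
  printf '%s\n' "$ds_list" | while read -r ns name; do
    [ -n "$name" ] || continue
    kubectl rollout status "daemonset/${name}" -n "$ns" \
      --timeout="${ROLLOUT_TIMEOUT}" </dev/null
  done
}
```

Calling `wait_for_daemonsets` after the deployment and job waits would cover components such as the CNI that run as DaemonSets.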
5c54fb7 to 8991a83
ef7026e into mysticaltech:fix/staging-review-2026-01-11
Description
Adds wait steps that wait for system deployments and jobs on init. Some of the system Helm charts, for example cert-manager and Longhorn, are slow to start completely, and subsequent manifest installations may fail if these expected charts are not yet fully installed.
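A minimal sketch of what such wait steps might look like with `kubectl wait`. The exact commands and timeout used in the PR are not shown in this extract, so the function name `wait_for_system_workloads` and the `WAIT_TIMEOUT` variable (with its 300s default) are assumptions for illustration only.

```shell
# Sketch only: the PR's actual commands and timeout may differ.
set -eu

# Hypothetical default; raise it for slower environments.
WAIT_TIMEOUT="${WAIT_TIMEOUT:-300s}"

wait_for_system_workloads() {
  # Block until every Deployment in every namespace reports Available.
  kubectl wait --for=condition=Available deployment --all \
    --all-namespaces --timeout="${WAIT_TIMEOUT}"

  # Block until every Job in every namespace reports Complete.
  kubectl wait --for=condition=Complete job --all \
    --all-namespaces --timeout="${WAIT_TIMEOUT}"
}
```

Running `wait_for_system_workloads` before applying further manifests would make the init step fail fast with a timeout rather than fail later on a half-installed dependency.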
To inspect, I ran a `kubectl get pods,deployments,jobs -A` command right after this line: https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner/blob/ee1b6badba4223ff6dc7f619e52aee7f822db48c/init.tf#L488
Before waits
The output below shows that certain deployments and jobs were still in various stages of initialization. (The list of pods is included only for reference; other resource types were excluded from this extract.)
After wait
Waiting for the deployments and jobs as described in this PR and then running the same `kubectl get all -A` right after those waits yields: