fix(t8s-cluster/workload-cluster): add missing tolerations #1933

Merged: cwrau merged 1 commit into main from fix/t8s-cluster/hcp-correct-cilium-toleration on Jan 26, 2026

Conversation

@cwrau (Member) commented Jan 22, 2026

Summary by CodeRabbit

  • Improvements
    • Enhanced cluster networking component resilience by improving node scheduling tolerance handling during node initialization and readiness state transitions.

Copilot AI review requested due to automatic review settings January 22, 2026 17:47
@cwrau cwrau enabled auto-merge January 22, 2026 17:47
@gemini-code-assist
Summary of Changes

Hello @cwrau, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness of the Cilium CNI deployment in the t8s-cluster workload cluster by adding essential tolerations. These additions ensure that Cilium pods can be successfully scheduled on nodes that may temporarily carry specific taints, preventing potential networking issues during node initialization or readiness transitions.

Highlights

  • Cilium CNI Tolerations: Added two new tolerations, node.cloudprovider.kubernetes.io/uninitialized and node.kubernetes.io/not-ready, to the Cilium CNI DaemonSet configuration within the workload cluster template.


coderabbitai bot commented Jan 22, 2026

Walkthrough

Added two tolerations to the hosted control plane Cilium configuration: one for uninitialized cloud provider nodes and another for not-ready nodes. These tolerations enable pod scheduling on nodes with corresponding taints.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Cilium CNI Tolerations: `charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml` | Added two new tolerations under hosted control plane: `node.cloudprovider.kubernetes.io/uninitialized` (NoSchedule, value: "true") and `node.kubernetes.io/not-ready` (NoSchedule) |
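
As a rough illustration of the two entries summarized above, the tolerations would look something like the following in standard Kubernetes toleration syntax. The keys, the effects, and the value "true" come from the summary; the `operator` fields are assumptions, since the diff itself is not reproduced in this conversation.

```yaml
# Sketch only: the operator values are assumed, not taken from the PR diff.
tolerations:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Equal    # assumed; a value is only compared when the operator is Equal
    value: "true"
    effect: NoSchedule
  - key: node.kubernetes.io/not-ready
    operator: Exists   # assumed; the summary lists no value for this entry
    effect: NoSchedule
```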

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested labels

t8s-cluster

Suggested reviewers

  • marvinWolff
  • teutonet-bot
  • tasches

Poem

🐰 Cilium now bends to node misfortune's way,
Two tolerations bloom where taints held sway,
Uninitialized clouds and not-ready states—
No longer barriers at scheduling's gates! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding missing tolerations to the workload cluster configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Copilot AI left a comment

Pull request overview

This PR adds missing tolerations to the Cilium CNI operator configuration for hosted control plane workload clusters. The changes ensure that the Cilium operator can be scheduled on nodes that are uninitialized by the cloud provider or not yet ready, which is critical for cluster bootstrapping.

Changes:

  • Added two new tolerations to the Cilium operator: one for node.cloudprovider.kubernetes.io/uninitialized and one for node.kubernetes.io/not-ready

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds tolerations for Cilium to handle uninitialized and not-ready nodes in a hosted control plane setup. The intent is correct and necessary for cluster stability. However, there are a couple of issues with the implementation. First, the tolerations for the Cilium operator are placed incorrectly in the Helm values, which will cause them to be ignored. This is a critical issue that needs to be fixed. Second, other essential Cilium components like the agent, Hubble Relay, and Hubble UI also require these tolerations to function correctly during cluster bootstrap and node lifecycle events, but they are missing. I've provided detailed comments on how to address both of these points.
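
For orientation, the upstream Cilium Helm chart exposes a separate tolerations list per component. The following is a minimal sketch of where each list lives in the chart values, assuming the upstream key names (`tolerations` for the agent, `operator.tolerations`, `hubble.relay.tolerations`, `hubble.ui.tolerations`). How this repository's template nests those values is not shown in this conversation, and the entries themselves (including the `operator` fields) are illustrative.

```yaml
# Sketch of upstream Cilium chart values; not the contents of cni-cilium.yaml.
# The YAML anchor only avoids repeating the same list four times.
tolerations: &bootstrapTolerations      # Cilium agent DaemonSet
  - key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Equal
    value: "true"
    effect: NoSchedule
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoSchedule
operator:
  tolerations: *bootstrapTolerations    # Cilium operator Deployment
hubble:
  relay:
    tolerations: *bootstrapTolerations  # Hubble Relay
  ui:
    tolerations: *bootstrapTolerations  # Hubble UI
```

Helm replaces lists rather than merging them, so a list set at one of these keys overrides whatever default the chart ships for that component.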

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@charts/t8s-cluster/templates/workload-cluster/cni-cilium.yaml`:
- Around line 103-104: The toleration for the taint key
"node.kubernetes.io/not-ready" is using effect "NoSchedule" but should use
"NoExecute"; update the toleration entry for key node.kubernetes.io/not-ready in
the Cilium workload template (the toleration block where key:
node.kubernetes.io/not-ready is defined) to set effect: NoExecute so pods that
don't tolerate that taint will be evicted as expected; keep any existing
tolerationSeconds or operator fields intact when making this change.
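
The change requested above would amount to something like the following sketch. The `operator` field is an assumption. A toleration only matches taints with the same effect, while a toleration with no `effect` matches every effect for its key, which is shown as a commented-out alternative that is not part of the PR.

```yaml
tolerations:
  # As suggested in the review comment: match the NoExecute variant of the taint.
  - key: node.kubernetes.io/not-ready
    operator: Exists   # assumed
    effect: NoExecute
  # Alternative (not from the PR): omit effect to match all effects for this key.
  # - key: node.kubernetes.io/not-ready
  #   operator: Exists
```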

@cwrau cwrau added this pull request to the merge queue Jan 26, 2026
Merged via the queue into main with commit 8718c0d Jan 26, 2026
39 checks passed
@cwrau cwrau deleted the fix/t8s-cluster/hcp-correct-cilium-toleration branch January 26, 2026 08:24
github-merge-queue bot pushed a commit that referenced this pull request Feb 25, 2026
🤖 I have created a release *beep* *boop*
---


## [9.6.0](t8s-cluster-v9.5.2...t8s-cluster-v9.6.0) (2026-02-25)


### Features

* **t8s-cluster/management-cluster:** add `compute-plane` role label to
nodes
([#1953](#1953))
([f5897e3](f5897e3))
* **t8s-cluster/management-cluster:** add quotas field to cluster
([#1998](#1998))
([3123514](3123514))
* **t8s-cluster/management-cluster:** ignore local storage for
autoscaler
([#1973](#1973))
([d4abff8](d4abff8))


### Bug Fixes

* **t8s-cluster/management-cluster:** add missing securityGroupRule for
cilium hubble
([#1971](#1971))
([f36f231](f36f231))
* **t8s-cluster/management-cluster:** adjust test helmRepositories
([#1964](#1964))
([66be444](66be444))
* **t8s-cluster/management-cluster:** remove hardcoded field
([#1984](#1984))
([227af97](227af97))
* **t8s-cluster/workload-cluster:** add missing tolerations
([#1933](#1933))
([8718c0d](8718c0d))
* **t8s-cluster/workload-cluster:** correctly set extraArgs value
([#2003](#2003))
([6e558f0](6e558f0))
* **t8s-cluster/workload-cluster:** migrate extraArgs type to string
([#1985](#1985))
([79d6df5](79d6df5))


### Miscellaneous Chores

* **t8s-cluster/dependencies:** update common docker tag to v1.8.0
([#1940](#1940))
([cdf387f](cdf387f))
* **t8s-cluster/dependencies:** update helm release cilium to v1.19.0
([#1969](#1969))
([e95a9a5](e95a9a5))
* **t8s-cluster/dependencies:** update helm release cluster-autoscaler
to v9.55.0
([#1963](#1963))
([659a529](659a529))
* **t8s-cluster/dependencies:** update helm release openstack-cinder-csi
to v2.35.0
([#1970](#1970))
([741d5b2](741d5b2))
* **t8s-cluster/dependencies:** update helm release
openstack-cloud-controller-manager to v2.35.0
([#1941](#1941))
([a580142](a580142))
* **t8s-cluster/dependencies:** update registry.k8s.io/etcd docker tag
to v3.6.8
([#1996](#1996))
([aa7d054](aa7d054))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
4 participants