fix: add tolerations to Cilium Operator for cloud provider initialization taint #1882

Merged: mysticaltech merged 1 commit into master from fix/issue-1879-cilium-operator-tolerations on Aug 28, 2025

@mysticaltech
Owner

Summary

  • Adds toleration for node.cloudprovider.kubernetes.io/uninitialized taint to Cilium Operator
  • Ensures operator can schedule during cluster initialization when nodes are temporarily tainted
  • Simple, safe, and backward-compatible fix

Issue

Fixes #1879

Problem

During initial cluster creation, the Cilium Operator pod remains unscheduled because nodes carry the node.cloudprovider.kubernetes.io/uninitialized taint until the cloud controller manager has initialized them. The taint keeps ordinary pods off a node until cloud provider initialization completes, but the Cilium Operator needs to run during this bootstrap period.
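
For illustration, an uninitialized node would carry the taint roughly as in the fragment below. The node name is hypothetical, and the value "true" is an assumption based on what upstream Kubernetes typically sets; the PR itself does not state the value.

```yaml
# Sketch of a Node object before cloud provider initialization completes.
apiVersion: v1
kind: Node
metadata:
  name: example-node            # hypothetical node name
spec:
  taints:
    - key: node.cloudprovider.kubernetes.io/uninitialized
      value: "true"             # assumed value, not stated in this PR
      effect: NoSchedule        # pods without a matching toleration are not scheduled here
```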

Solution

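Added a default toleration to the Cilium Operator through Helm values in the default cilium_values configuration. In the snippet below, the top-level operator key selects the Cilium Operator component of the chart, while operator: Exists inside the toleration is the Kubernetes match operator: it matches the taint regardless of its value, so scheduling does not depend on how a given environment sets it.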

```yaml
operator:
  tolerations:
    - key: node.cloudprovider.kubernetes.io/uninitialized
      operator: Exists
      effect: NoSchedule
```
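
For contrast, a narrower toleration would pin the taint's value with operator: Equal. This sketch is not part of the PR, and the value "true" is an assumption. A toleration written this way would stop matching if the taint were set with any other value, which is the case Exists avoids.

```yaml
# Hypothetical alternative, shown only for comparison with the PR's choice.
operator:
  tolerations:
    - key: node.cloudprovider.kubernetes.io/uninitialized
      operator: Equal           # matches only when key, value, and effect all agree
      value: "true"             # assumed taint value
      effect: NoSchedule
```

With operator: Exists the value field is left out entirely, so there is nothing to keep in sync with the taint.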

Why This Approach

  • Safe default: Tolerating a taint that may not exist is harmless and only affects scheduling during early bootstrap
  • Backward compatible: Doesn't change behavior on clusters without this taint; users can still override via custom cilium_values
  • Simple: Uses Cilium Helm chart's existing operator.tolerations pattern without introducing special-case logic
  • Minimal: Avoids adding unnecessary variables or complexity

Test Plan

  • Reviewed code changes
  • Ran terraform fmt to ensure proper formatting
  • Ran terraform plan to verify no breaking changes
  • Test deployment on new cluster to verify operator schedules correctly
  • Confirm no impact on existing clusters (toleration is harmless if taint doesn't exist)

Notes

  • This change only affects the default Cilium configuration
  • Users with custom cilium_values are unaffected unless they adopt this toleration
  • The toleration is specific to the cloud provider initialization phase and has no long-term effects
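
Users who maintain their own cilium_values could adopt the toleration by adding the same block to their values, along these lines. The replicas setting is only a placeholder for whatever custom settings are already present; it is not taken from this PR.

```yaml
# Hypothetical custom Cilium Helm values; only operator.tolerations comes from this PR.
operator:
  replicas: 1                   # placeholder for existing custom settings
  tolerations:
    - key: node.cloudprovider.kubernetes.io/uninitialized
      operator: Exists
      effect: NoSchedule
```

Helm replaces list values rather than merging them, so any other tolerations the operator needs would have to appear in the same list.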

🤖 Generated with Claude Code

…tion taint

Adds toleration for node.cloudprovider.kubernetes.io/uninitialized taint to the Cilium Operator
deployment via Helm values. This ensures the operator can schedule during initial cluster creation
when nodes are temporarily tainted by the cloud controller manager initialization process.

The taint prevents normal pods from scheduling until the cloud provider completes initialization,
but the operator needs to run during this period. Adding this toleration is safe and backward
compatible - it only affects scheduling during bootstrap and doesn't force scheduling on tainted
nodes if others are available.

Fixes #1879

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings August 28, 2025 12:42
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Summary of Changes

Hello @mysticaltech, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical scheduling issue for the Cilium Operator during Kubernetes cluster initialization. Previously, the operator would remain unscheduled due to the node.cloudprovider.kubernetes.io/uninitialized taint applied by cloud provider controllers. The core change involves adding a default toleration for this specific taint to the Cilium Operator's Helm values, allowing it to bypass the temporary taint and schedule successfully during the bootstrap phase. This fix is designed to be safe, backward-compatible, and minimally intrusive, ensuring the operator can run when needed without affecting existing clusters or custom configurations.

Highlights

  • Cilium Operator Scheduling Fix: Added a default toleration for the node.cloudprovider.kubernetes.io/uninitialized taint to the Cilium Operator's Helm values.
  • Cluster Initialization Compatibility: This change ensures the Cilium Operator can schedule and function correctly during the early stages of Kubernetes cluster initialization, even when nodes are temporarily tainted by the cloud provider.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a scheduling issue with the Cilium Operator during cluster initialization by adding tolerations for the cloud provider initialization taint. The fix ensures the operator can schedule on nodes that are temporarily tainted during the bootstrap process.

  • Adds toleration for node.cloudprovider.kubernetes.io/uninitialized taint to Cilium Operator
  • Uses the existing Helm chart pattern with operator.tolerations configuration
  • Provides a safe, backward-compatible default that doesn't affect existing deployments

Comment thread locals.tf
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a toleration for the node.cloudprovider.kubernetes.io/uninitialized taint to the default Cilium Helm values. This is a well-thought-out fix that ensures the Cilium Operator can be scheduled during the initial cluster bootstrap phase, which is crucial for network readiness. The implementation is correct, using the standard operator.tolerations configuration for the Cilium chart. The change is safe, backward-compatible, and minimal. I have no further suggestions as the change is excellent.

@mysticaltech merged commit 7bfcf09 into master on Aug 28, 2025
3 checks passed
@mysticaltech deleted the fix/issue-1879-cilium-operator-tolerations branch on August 28, 2025 13:00
Development

Successfully merging this pull request may close these issues.

[Bug]: Cilium Operator missing tolerations for cloud provider initialization taint
