Fix targets registering twice on load balancers #1929
Conversation
Summary of Changes

Hello @mrjoe7, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses an issue where load balancer targets were being registered redundantly, by refining the label selection logic for the cluster load balancer target.
Code Review
This pull request aims to fix an issue where load balancer targets are registered twice. The change modifies the label selector for the cluster load balancer target to only include control-plane nodes, preventing conflicts with the Hetzner Cloud Controller Manager (CCM) which manages agent node targets. While this is a good step, the fix is incomplete for cases where scheduling on control-plane nodes is enabled. I've suggested a more robust solution to conditionally add control-plane nodes as targets, which should fully resolve the duplication issue in all configurations.
```diff
       [
-        # Generic label merge from control plane and agent namespaces with "or",
-        # resulting in: role in (control_plane_node,agent_node)
-        for key in keys(merge(local.labels_control_plane_node, local.labels_agent_node)) :
-        "${key} in (${
-          join(",", compact([
-            for labels in [local.labels_control_plane_node, local.labels_agent_node] :
-            try(labels[key], "")
-          ]))
-        })"
+        for key, value in local.labels_control_plane_node :
+        "${key}=${value}"
       ]
```
This change correctly removes agent nodes from the Terraform-managed targets, fixing the 'registering twice' issue when CCM is active. However, it always adds control-plane nodes.
If var.allow_scheduling_on_control_plane is set to true, ingress controller pods can be scheduled on control-plane nodes. In this case, the Hetzner CCM will add them as targets, and this Terraform resource will also add them, re-introducing the 'registering twice' problem for control-plane nodes.
To make the fix more robust, we should only add control-plane nodes via Terraform when they are not expected to be managed by CCM. This happens when scheduling on them is disabled (the default behavior).
This suggestion makes the addition of control-plane nodes conditional on local.allow_scheduling_on_control_plane.
```hcl
(
  local.allow_scheduling_on_control_plane
  ? []
  : [
    for key, value in local.labels_control_plane_node :
    "${key}=${value}"
  ]
)
```
Control plane nodes are explicitly excluded from the load balancer here, so they won’t be registered twice.
@mrjoe7 I see, but you did not address the specific concern. Sometimes we want the control planes to be included as targets when var.allow_scheduling_on_control_plane is true.
@mysticaltech I was thinking about this and my question is: Should we even keep using the LB Label Selector?
To explain: membership in the LB Cloud Server group is controlled by node.kubernetes.io/exclude-from-external-load-balancers=true in the default_control_plane_labels. We then combine this with the Label Selector mechanism, effectively using two methods to achieve the same result, which leads to the LB registering nodes twice. Maybe we should rely solely on node labels to control the Hetzner LB targets?
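For illustration, relying solely on node labels might look roughly like this (a sketch only; the exclusion label and the default_control_plane_labels name come from this discussion, everything else here is an assumption, not the project's actual code):

```hcl
# Sketch: drop the LB label_selector entirely and let the Kubernetes node
# label drive membership. Control-plane nodes opt out of CCM registration:
default_control_plane_labels = [
  "node.kubernetes.io/exclude-from-external-load-balancers=true",
]
# Agent nodes carry no exclusion label, so the Hetzner CCM registers them
# as load balancer targets on its own.
```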
You have a point. I need to go back to how those label selectors came about in the codebase in the first place to really get to the bottom of it; this can be done by exploring the git history (with something like Claude Code). I will give it a shot ASAP, but don't hesitate to do the same.
@mysticaltech have you had any success figuring out why it was designed this way?
@codex review please, and let me know if backward compatible or not.
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
   label_selector = join(",", concat(
     [for k, v in local.labels : "${k}=${v}"],
     [
-      # Generic label merge from control plane and agent namespaces with "or",
-      # resulting in: role in (control_plane_node,agent_node)
-      for key in keys(merge(local.labels_control_plane_node, local.labels_agent_node)) :
-      "${key} in (${
-        join(",", compact([
-          for labels in [local.labels_control_plane_node, local.labels_agent_node] :
-          try(labels[key], "")
-        ]))
-      })"
+      for key, value in local.labels_control_plane_node :
+      "${key}=${value}"
     ]
```
Cluster load balancer no longer targets agent nodes
The new label selector now appends only local.labels_control_plane_node, yielding role=control_plane_node and filtering out all agent nodes. Previously the selector matched both control plane and agent roles so the external load balancer attached to every worker node running ingress. After this change, multi-node clusters will register only control-plane servers and ingress traffic will bypass agents, which is a breaking change relative to prior behaviour.
Yes, this is a change in behaviour, but I don't see any issue with it, as Kubernetes will automatically register nodes in the load balancer. I have been testing this with my cluster for a month without any issues.
But speaking of "no issues"… I tried changing the load balancer's network settings and noticed it dropped all the targets from the LB. The targets matched by a label were added back automatically, but the ones registered by the Kubernetes CCM were not. I waited maybe a minute before manually adding the missing targets myself. Maybe they would have eventually been re-added automatically, but that would have caused an unnecessary outage and I didn't want to wait.
So maybe we should keep the label selector, but also add node.kubernetes.io/exclude-from-external-load-balancers=true to the worker nodes?
What do you think @mysticaltech ?
Can't merge this until it's all fixed, please.
Force-pushed from 78cca11 to d45cc7b
/gemini review
Code Review
This pull request addresses an issue where load balancer targets were being registered twice. The fix involves adding the node.kubernetes.io/exclude-from-external-load-balancers=true label to agent nodes when a Hetzner Cloud Load Balancer is in use. This prevents the Cloud Controller Manager from automatically adding the nodes, which resolves the double registration problem. The implementation is correct, but I've pointed out a confusing inline comment that should be updated for better clarity and future maintainability.
Force-pushed from d45cc7b to 47fc4ae
/gemini review
Code Review
This pull request aims to fix an issue with load balancer targets being registered twice by conditionally adding an exclusion label to agent nodes. However, the implemented logic appears to be inverted. It would exclude agent nodes from being load balancer targets precisely when they are needed for ingress traffic, which would likely break the feature. I've left a critical review comment with a suggested code change to correct this logic. I also recommend renaming a confusingly named local variable for better code clarity in a future change.
/gemini review
Code Review
This pull request addresses an issue where load balancer targets were being registered twice. The change modifies default_agent_labels in locals.tf to conditionally add the node.kubernetes.io/exclude-from-external-load-balancers=true label to agent nodes. This label is added when the cluster is configured to use an external Hetzner Load Balancer (i.e., when local.has_external_load_balancer is false).
My analysis suggests this is likely a workaround to prevent k3s's built-in servicelb from interfering with the Hetzner Cloud Controller Manager (CCM), which would explain the "registering twice" symptom. By adding this label, servicelb (if not fully disabled for some reason) would be instructed to ignore these agent nodes, leaving only the CCM to manage them as load balancer targets.
The implementation appears correct and the logic is consistent across different cluster configurations (external LB, klipper/servicelb, single-node). The code change is well-contained and should effectively resolve the described problem.
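Based on that description, the change in locals.tf presumably looks something like the following (a hedged sketch reconstructed from the review text above, not the actual diff; the exact shape of default_agent_labels is an assumption):

```hcl
# Sketch: when a Hetzner Cloud LB is in use (local.has_external_load_balancer
# is false), label agent nodes so the CCM does not register them as LB
# targets a second time.
default_agent_labels = local.has_external_load_balancer ? [] : [
  "node.kubernetes.io/exclude-from-external-load-balancers=true",
]
```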

Fix for #1900