
release: promote staging to master for v3 #2147

Open
mysticaltech wants to merge 412 commits into master from staging

Conversation

@mysticaltech (Owner) commented on Feb 17, 2026

v3.0.0 Release PR: staging -> master

Executive Summary

This PR promotes the v3 line from staging to master.

It is a major release candidate, not an incremental feature merge. This release train includes the original ideas_v3 work, the v3 migration contract, Terraform/OpenTofu validation hardening, first-class Tailscale node transport, clearer topology guidance, Cilium Gateway API support, embedded registry mirror support, and final release smoke gates.

No release tag is created by this PR.

Why This Is A Major Version

v3 contains deliberate breaking changes that require explicit operator intent:

  • Removed/renamed public inputs and consolidated older customization surfaces.
  • Terraform/OpenTofu minimums and hcloud provider minimums were raised (see the version-constraint sketch after this list).
  • Default behavior moved toward Leap Micro, per-nodepool networking, stricter validation, and larger-cluster topology modeling.
  • Networking, endpoint, nodepool, autoscaler, NAT/private paths, and addon orchestration have materially changed.
  • Tailscale node transport and multinetwork scale paths introduce new supported topologies that must be configured intentionally.
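
For illustration, a minimal version-constraint block consistent with the raised minimums. The floor values below are taken from the commenter's working kube.tf later in this thread, not from the v3 changelog, so treat them as assumptions and confirm against the release notes:

terraform {
  required_version = ">= 1.10.1" # assumed floor, copied from the kube.tf below
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.51.0" # assumed floor, copied from the kube.tf below
    }
  }
}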

Current v3 Differentiators

  • Leap Micro-first kube-hetzner clusters with k3s and RKE2 support.
  • Terraform and OpenTofu module contracts with extensive cross-variable validation.
  • Tailscale as a supported secure node transport for single-network hardening and 100+ node multinetwork scale.
  • Topology chooser and 10k-node reference examples that respect Hetzner Network and placement-group limits.
  • Cilium Gateway API as an opt-in first-class path.
  • Embedded k3s/RKE2 registry mirror as an opt-in large-cluster pull-pressure reducer.
  • Deterministic endpoint outputs for kubeconfig, node join path, transport mode, and Tailnet MagicDNS hostnames.

Core Changes By Theme

1) OS, Distribution, and Bootstrap

  • Leap Micro support and transactional-update persistence hardened.
  • RKE2 promoted as a first-class distribution path (a distribution-selection sketch follows this list).
  • SELinux defaults and migration inversion documented and validated.
  • Host bootstrap, config updates, kured, and system-upgrade paths made distribution-aware.
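
As a rough sketch of choosing the distribution path, using only variable names that already appear in the commenter's kube.tf later in this thread (kubernetes_distribution_type, install_rke2_version, initial_k3s_channel); the values are illustrative placeholders, not v3 defaults:

module "kube-hetzner" {
  # ... source, providers, token, nodepools ...

  # Select the distribution path; "rke2" opts into the first-class RKE2 path.
  kubernetes_distribution_type = "rke2"

  # Pin the version/channel explicitly for reproducible bootstraps.
  install_rke2_version = "v1.34.6+rke2r3" # placeholder, taken from the kube.tf below
  initial_k3s_channel  = "v1.34"          # only consulted on the k3s path
}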

2) Network, Endpoint, and Transport Topology

  • network_subnet_mode = "per_nodepool" is the v3 default for new clusters (see the subnet-mode sketch after this list).
  • Existing/shared subnet behavior remains available for migration compatibility.
  • Tailscale node transport added for secure node connectivity and supported multinetwork scale.
  • Cilium public-overlay multinetwork remains documented as experimental rather than a production scale claim.
  • Endpoint behavior is documented for direct public, public LB, private LB/NAT, explicit endpoints, and Tailscale MagicDNS.
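
A minimal sketch of the subnet-mode knobs, using the variable names from this PR description and the commenter's kube.tf below (network_subnet_mode, network_ipv4_cidr, subnet_amount); the CIDR and subnet count are illustrative, not sizing guidance:

module "kube-hetzner" {
  # ...

  # New v3 clusters default to one subnet per nodepool.
  network_subnet_mode = "per_nodepool"
  # Clusters migrating from v2 can keep the shared-subnet behavior instead:
  # network_subnet_mode = "legacy"

  # Overall Network CIDR and how many subnets to carve out of it.
  network_ipv4_cidr = "10.0.0.0/16"
  subnet_amount     = 256
}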

3) Nodepools, Autoscaler, and Placement Groups

  • Nodepool schemas expanded with per-node overrides, public network controls, extra networks, volumes, labels, taints, and map/count safety (an example nodepool follows this list).
  • Autoscaler paths support per-network/Tailscale-aware rendering where configured.
  • Placement groups are sharded/validated around Hetzner's spread-group limits.
  • 100+ node and 10k-node reference examples encode Hetzner Network attachment limits instead of pretending a single Network scales forever.
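
An illustrative agent nodepool built only from fields that appear in the commenter's kube.tf later in this thread (name, server_type, location, labels, taints, count, attached_volumes); the other per-node override fields mentioned above are omitted here rather than guessed:

agent_nodepools = [
  {
    name        = "storage"
    server_type = "ccx23"
    location    = "nbg1"
    labels      = ["node.kubernetes.io/role=longhorn-storage"]
    taints      = []
    count       = 3
    # Per-nodepool attached volumes, e.g. Longhorn disks.
    attached_volumes = [
      {
        size       = 200
        mount_path = "/var/longhorn"
        filesystem = "ext4" # ext4 or xfs
      }
    ]
  },
]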

4) Addons and Kubernetes APIs

  • Cilium Gateway API support added via cilium_gateway_api_enabled (sketched after this list).
  • Gateway API standard CRDs are installed automatically when Cilium Gateway API or Traefik Gateway provider mode is enabled.
  • Embedded registry mirror support added for k3s/RKE2 with safe merge behavior over user registries_config.
  • CCM/CSI/ingress/cert-manager/user-kustomization rendering tightened for v3 behavior.
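
A hedged sketch of the opt-in Cilium Gateway API path, combining the flag named in this PR (cilium_gateway_api_enabled) with the CNI and kube-proxy settings already present in the commenter's kube.tf below; the embedded registry mirror toggle is not named in this description, so it is not shown:

module "kube-hetzner" {
  # ...

  # Gateway API via Cilium expects the Cilium CNI with kube-proxy disabled.
  cni_plugin                 = "cilium"
  disable_kube_proxy         = true
  cilium_gateway_api_enabled = true
  # The standard Gateway API CRDs are installed automatically when this is enabled.
}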

5) Migration and Operator Experience

  • MIGRATION.md, docs/v2-to-v3-migration.md, README, kube.tf.example, examples, and skills updated for v3.
  • scripts/v2_to_v3_migration_assistant.py provides guided v2-to-v3 checks.
  • docs/v3-topology-recommendations.md is the topology chooser for new clusters.
  • Kube-hetzner skills now know the v3 migration, testing, topology, Tailscale, Gateway API, registry mirror, and release gates.

Important Supported/Unsupported Boundaries

  • Supported secure scale path: Tailscale node transport with per-network nodepools and route advertisement where needed.
  • Supported Gateway API path: Cilium with kube-proxy disabled and cilium_gateway_api_enabled = true.
  • Supported registry mirror path: opt-in embedded mirror for trusted-node clusters.
  • Not a Talos pivot: Talos remains a different project shape.
  • Not a public-network/IP-query-server scale story: v3 does not claim production-grade 100+ node scale via a public CNI overlay.
  • Cilium public overlay remains experimental until live-proven beyond static planning.

Validation Evidence For Latest Push

Latest pushed commit: 0c03f4b (chore: finalize v3 topology support boundaries).

Local gates run from /Volumes/MysticalTech/Code/kube-hetzner:

  • Latest doc/boundary commit rechecked with terraform fmt -check -recursive, terraform validate, temp-copy tofu init -backend=false && tofu validate, uv run scripts/validate_v3_final_polish_examples.py, uv run scripts/validate_tailscale_large_scale_examples.py, and git diff --check.
  • terraform fmt -recursive
  • terraform-docs markdown . > docs/terraform.md
  • no live null_resource / hashicorp/null provider usage remains
  • terraform init -backend=false -no-color
  • terraform validate -no-color
  • temp-copy tofu init -backend=false -no-color && tofu validate -no-color
  • git diff --check
  • uv run scripts/validate_tailscale_large_scale_examples.py
  • uv run scripts/validate_v3_final_polish_examples.py
  • uv run scripts/smoke_v3_plan_matrix.py

Example parse/validate gates:

  • kube.tf.example with local module source substitution
  • examples/argocd/main.tf with local module source substitution
  • examples/cilium-gateway-api/main.tf with local module source substitution
  • examples/tailscale-node-transport/main.tf with local module source substitution

Disposable plan matrix coverage:

  • default k3s + Cilium
  • Cilium Gateway API valid
  • Cilium Gateway API invalid with Flannel
  • Cilium Gateway API invalid with kube-proxy enabled
  • embedded registry mirror valid for k3s
  • embedded registry mirror valid for RKE2
  • embedded registry mirror invalid duplicate registries
  • embedded registry mirror invalid empty registry set
  • Tailscale + embedded registry + external network valid with private route advertisement
  • Tailscale + embedded registry + external network invalid without private route advertisement

Test workspace smoke:

  • /Volumes/MysticalTech/Code/kube-test
  • terraform init -upgrade -no-color
  • terraform plan -refresh=false -lock=false -input=false -no-color -detailed-exitcode
  • Result: valid create-only plan, 43 to add, 0 to change, 0 to destroy
  • Note: used a temporary local override to pin addon versions during this smoke because unauthenticated GitHub release API quota was exhausted; the override was removed after the plan.

Reviewer Guide

Suggested high-signal review order:

  1. variables.tf
  2. validation-locals.tf
  3. locals.tf
  4. tailscale.tf
  5. init.tf
  6. control_planes.tf
  7. agents.tf
  8. autoscaler-agents.tf
  9. data.tf
  10. output.tf
  11. README.md
  12. kube.tf.example
  13. MIGRATION.md
  14. docs/v3-topology-recommendations.md
  15. examples/tailscale-node-transport/
  16. examples/cilium-gateway-api/
  17. .claude/skills/*/SKILL.md

Release Intent

Merge this PR only when v3 is ready to become the master-line release candidate. Tagging/publishing remains a separate maintainer action after final review.

@tiran133 (Contributor)

Hi,
I was playing around with this. What I notice is that all control plane and agent nodes end up in the same subnet, even though there is an agent subnet and a control plane subnet.
I changed the CIDR just for a test, but it's the same with the default 10.0.0.0/8:

network_ipv4_cidr = "10.0.0.0/16"
subnet_amount     = 256

Here is my kube.tf

locals {
  # You have the choice of setting your Hetzner API token here or define the TF_VAR_hcloud_token env
  # within your shell, such as: export TF_VAR_hcloud_token=xxxxxxxxxxx. Or you can use .tfvars-files.
  # If you choose to define it in the shell, this can be left as is.

  # Your Hetzner token can be found in your Project > Security > API Token (Read & Write is required).
  hcloud_token = ""

  # Credentials for the Hetzner Robot webservice
  robot_user     = ""
  robot_password = ""

  etcd-s3-endpoint        = "fsn1.your-objectstorage.com"
  etcd-s3-access-key      = ""
  etcd-s3-secret-key      = ""
  etcd-s3-bucket          = "backups-01"
  etcd-s3-region          = "fsn1"
  etcd-s3-folder          = "k3s-etcd-snapshots"

  longhorn_volume_size            = 200
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
  robot_user     = var.robot_user != "" ? var.robot_user : local.robot_user
  robot_password = var.robot_password != "" ? var.robot_password : local.robot_password

  kubernetes_distribution_type = "rke2"
  # network_subnet_mode = "per_nodepool"
  network_subnet_mode = "legacy"

  network_ipv4_cidr = "10.0.0.0/16"
  subnet_amount     = 256

  # source = "kube-hetzner/kube-hetzner/hcloud"
  source = "../terraform-hcloud-kube-hetzner"

 
  ssh_public_key = file("~/.ssh/id_ed25519.pub")
  ssh_private_key = null # Use agent
  network_region = "eu-central" # change to `us-east` if location is ash
  control_plane_nodepools = [
    {
      name        = "control-plane-nbg1",
      server_type = "cpx32",
      location    = "nbg1",
      labels      = [
        "node.kubernetes.io/role=egress",
      ],
      taints      = [],
      count       = 1
      swap_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      zram_size   = "2G" # remember to add the suffix, examples: 512M, 1G
    },
    {
      name        = "control-plane-fsn1",
      server_type = "cpx32",
      location    = "fsn1",
      labels      = [
        "node.kubernetes.io/role=egress",
      ],
      taints      = [],
      count       = 1
      swap_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      zram_size   = "2G" # remember to add the suffix, examples: 512M, 1G
    },
    {
      name        = "control-plane-hel1",
      server_type = "cpx32",
      location    = "hel1",
      labels      = [
        "node.kubernetes.io/role=egress",
      ],
      taints      = [],
      count       = 1
      swap_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      zram_size   = "2G" # remember to add the suffix, examples: 512M, 1G
    }
  ]

  agent_nodepools = [
    {
      name        = "agent-medium",
      server_type = "ccx23",
      location    = "nbg1",
      labels      = [
        "node.kubernetes.io/role=worker",
        "node.longhorn.io/create-default-disk=config",
        "storage-type=fast"
      ],
      taints      = [],
      count       = 3
      swap_size   = "2G" # remember to add the suffix, examples: 512M, 1G
      zram_size   = "2G" # remember to add the suffix, examples: 512M, 1G
    },
    {
      name        = "storage",
      server_type = "ccx23",
      location    = "nbg1",
      labels      = [
        "storage-type=capacity",
        "node.kubernetes.io/role=longhorn-storage",
        "node.longhorn.io/create-default-disk=config",
      ],
      taints      = [],
      count       = 3
      attached_volumes = [
        {
          size       = local.longhorn_volume_size
          mount_path = "/var/longhorn"
          filesystem = "ext4"  # ext4 or xfs
        }
      ]
    },
  ]

  control_planes_custom_config = {
   etcd-expose-metrics = true,
   kube-controller-manager-arg = "bind-address=0.0.0.0",
   # kube-proxy-arg ="metrics-bind-address=0.0.0.0",
   kube-scheduler-arg = "bind-address=0.0.0.0",
  }
  enable_wireguard = true
  load_balancer_type     = "lb11"
  load_balancer_location = "nbg1"
  etcd_s3_backup = {
    etcd-s3-endpoint        = local.etcd-s3-endpoint
    etcd-s3-access-key      = local.etcd-s3-access-key
    etcd-s3-secret-key      = local.etcd-s3-secret-key
    etcd-s3-bucket          = local.etcd-s3-bucket
    etcd-s3-region          = local.etcd-s3-region
    etcd-s3-folder          = local.etcd-s3-folder
  }

  csi_driver_smb_version = "v1.20.1"
  hetzner_ccm_use_helm = true
  ingress_controller = "none"
  system_upgrade_use_drain = true

  install_rke2_version = "v1.34.6+rke2r3"
  initial_k3s_channel = "v1.34"
  cluster_name = "rke2-cluster"
  firewall_kube_api_source   = ["2.2.2.2","1.1.1.1"] # Dummy
  firewall_ssh_source        = ["2.2.2.2","1.1.1.1"]
  cni_plugin = "cilium"

  cilium_merge_values = <<EOT
lbIPAM:
  enabled: false
EOT


  cilium_routing_mode = "native"
  cilium_egress_gateway_enabled = true
  cilium_hubble_enabled = true
  cilium_hubble_metrics_enabled = [
    "policy:sourceContext=app|workload-name|pod|reserved-identity;destinationContext=app|workload-name|pod|dns|reserved-identity;labelsContext=source_namespace,destination_namespace"
  ]
  cilium_loadbalancer_acceleration_mode = "native"
  disable_kube_proxy = true
  enable_cert_manager = false
  dns_servers = [
    "1.1.1.1",
    "9.9.9.9",
    "2606:4700:4700::1111"
  ]
}

provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}

terraform {
  required_version = ">= 1.10.1"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.51.0"
    }
    deepmerge = {
      source  = "isometry/deepmerge"
      version = "= 1.2.1"  # or whatever version worked before
    }
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

variable "hcloud_token" {
  sensitive = true
  default   = ""
}

variable "robot_user" {
  sensitive = true
  default   = ""
}

variable "robot_password" {
  sensitive = true
  default   = ""
}

output "k3s_token" {
  value     = module.kube-hetzner.k3s_token
  sensitive = true
}

I also tried

network_subnet_mode = "per_nodepool"


Or am I missing something?

I mean, the cluster works fine, I guess; I just don't understand the different subnets.

If I'm missing something, can you explain?

