Prevent unnecessary kubernetes client imports in workers #56692

kaxil · 2025-10-16T02:07:22Z

Workers no longer import the full kubernetes client library (~32-42 MB) when performing routine operations like secret masking and DAG serialization. The kubernetes client is only imported when actually processing kubernetes objects.
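The idea behind "only import when actually processing kubernetes objects" can be illustrated with a small check: a kubernetes object can only exist in a process that has already imported the `kubernetes` package, so when that package is absent from `sys.modules` the answer is "not a k8s object" with no import at all. This is a minimal sketch, not the code merged in this PR; `is_kubernetes_object` and `serialize_value` are hypothetical helpers, and `ApiClient.sanitize_for_serialization` is used only as an example of work that genuinely needs the client.

```python
import sys
from typing import Any


def is_kubernetes_object(obj: Any) -> bool:
    """Return True if ``obj`` is an instance of a kubernetes client class.

    If the ``kubernetes`` package has not been imported in this process,
    no kubernetes object can exist yet, so we answer False without paying
    the import cost.
    """
    if "kubernetes" not in sys.modules:
        return False
    # Inspect the defining module instead of calling isinstance(), which
    # would require importing the kubernetes model classes.
    return type(obj).__module__.split(".", 1)[0] == "kubernetes"


def serialize_value(obj: Any) -> Any:
    """Hypothetical serializer: only kubernetes objects take the k8s path."""
    if is_kubernetes_object(obj):
        # Deferred import: reached only when a kubernetes object is present,
        # at which point the package is already loaded anyway.
        from kubernetes.client import ApiClient

        return ApiClient().sanitize_for_serialization(obj)
    return obj
```

Note that this `sys.modules` shortcut is only valid for inspecting live objects. Rebuilding objects from serialized data is a different situation, because nothing has imported the classes yet.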

With the default 32 LocalExecutor workers, this could reduce memory usage by approximately 1 GB in deployments where not all workloads use k8s.
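The ~1 GB figure is the per-process import cost multiplied by the worker count. A back-of-the-envelope check, assuming each of the 32 worker processes would otherwise import the client on its own and that the 32-42 MB range quoted above holds per process:

```python
# Rough estimate only: assumes every worker imports the kubernetes client
# independently and that none of that memory is shared between processes.
WORKERS = 32
PER_WORKER_MB = (32, 42)  # range quoted in the description


def estimated_savings_mb(workers: int, per_worker_mb: float) -> float:
    """Total memory not spent on the import across all workers, in MB."""
    return workers * per_worker_mb


low, high = (estimated_savings_mb(WORKERS, mb) for mb in PER_WORKER_MB)
print(f"{low:.0f}-{high:.0f} MB (~{low / 1024:.2f}-{high / 1024:.2f} GB)")
```

The lower bound is exactly 1024 MB, which is where "approximately 1 GB" comes from.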

Part of #56641 (Kudos to @wjddn279 for the investigation)

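```py
import sys
import tracemalloc

# Start from an interpreter that has not loaded kubernetes yet
assert 'kubernetes' not in sys.modules

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# The import being measured
from kubernetes.client import V1EnvVar  # noqa: F401

snapshot_after = tracemalloc.take_snapshot()

# Group allocation differences by full traceback and show the largest ones
top_stats = snapshot_after.compare_to(snapshot_before, 'traceback')
print("[ Top 10 differences ]")
for stat in top_stats[:10]:
    print(stat)

# Net memory allocated by the import, summed over all tracebacks
total = sum(stat.size_diff for stat in top_stats)
print(f"\nTotal memory increase: {total / 1024 / 1024:.2f} MB")
```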

Output: Total memory increase: 41.62 MB
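The snapshot above measures what the import costs. To keep such an import from creeping back in, one option is a check that starts a fresh interpreter and asserts the module was not loaded. A minimal sketch; the helper name `leaks_import` is hypothetical, and a subprocess is used because the current interpreter may already have kubernetes loaded, which would make an in-process check meaningless.

```python
import subprocess
import sys


def leaks_import(setup_code: str, forbidden: str) -> bool:
    """Return True if running ``setup_code`` in a fresh interpreter loads ``forbidden``."""
    # Append a probe that reports whether the forbidden module ended up loaded.
    probe = f"{setup_code}\nimport sys\nprint({forbidden!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # surface errors in setup_code instead of reporting False
    )
    # The probe's output is the last line, even if setup_code printed something.
    return result.stdout.splitlines()[-1] == "True"
```

For example, `leaks_import("import airflow.serialization.serialized_objects", "kubernetes")` should return `False` if that module no longer pulls in the client at import time (the module path is illustrative).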



Follow-up: #56733 (Skip PodGenerator import for deserialization when no k8s installed)

#56692 introduced an optimization for PodGenerator imports, but it made deserializing a Pod fail when no k8s classes had been loaded. That is not an optimization but a failure: nothing actually prevents us from importing the k8s classes, and we **have to** import them if we want to deserialize a serialized Pod.

Co-authored-by: Kaxil Naik <[email protected]>

(cherry picked from commit 17037e6)
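The follow-up's point is that a "skip if not already loaded" shortcut is safe when inspecting live objects but wrong when rebuilding them from serialized data: the payload is plain data, so nothing has necessarily imported the class yet. A minimal sketch of an import-on-demand resolver, assuming a hypothetical `{"__type": ..., "__data": ...}` payload format (this is not Airflow's actual serialization format); a stdlib class stands in for `V1Pod` so the sketch runs without kubernetes installed.

```python
import importlib
from typing import Any


def resolve_class(path: str) -> type:
    """Resolve a dotted path such as 'kubernetes.client.V1Pod' to a class.

    importlib.import_module() returns the cached module if it is already in
    sys.modules and imports it otherwise, so resolution never fails just
    because nothing has loaded the module yet.
    """
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def deserialize(payload: dict[str, Any]) -> Any:
    """Rebuild an object from a hypothetical {"__type", "__data"} payload."""
    cls = resolve_class(payload["__type"])
    return cls(**payload["__data"])
```

A caller that must tolerate a missing kubernetes installation could catch `ModuleNotFoundError` from `resolve_class`; this sketch does not show how #56733 handles that case.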

kaxil requested review from ashb and bolkedebruin as code owners October 16, 2025 02:07

boring-cyborg bot added the area:serialization label Oct 16, 2025

kaxil added this to the Airflow 3.1.1 milestone Oct 16, 2025

kaxil requested a review from potiuk October 16, 2025 02:07

potiuk approved these changes Oct 16, 2025


wjddn279 mentioned this pull request Oct 16, 2025

Fix memory leak in remote logging connection cache #56695

Merged

kaxil merged commit 4926999 into apache:main Oct 16, 2025
62 checks passed

kaxil deleted the skip-k8s-client-import branch October 16, 2025 10:40

snreddygopu pushed a commit to Teradata/airflow that referenced this pull request Oct 16, 2025

Prevent unnecessary kubernetes client imports in workers (apache#56692)

8172231

potiuk mentioned this pull request Oct 16, 2025

Skip PodGenerator import for deserialization when no k8s installed #56733

Merged

abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request Oct 17, 2025

Prevent unnecessary kubernetes client imports in workers (apache#56692)

a0ed499

abdulrahman305 bot pushed a commit to abdulrahman305/airflow that referenced this pull request Oct 19, 2025

Prevent unnecessary kubernetes client imports in workers (apache#56692)

7b60547

kaxil added a commit that referenced this pull request Oct 21, 2025

Prevent unnecessary kubernetes client imports in workers (#56692)

27ed449

(cherry picked from commit 4926999)

TyrellHaywood pushed a commit to TyrellHaywood/airflow that referenced this pull request Oct 22, 2025

Prevent unnecessary kubernetes client imports in workers (apache#56692)

7b416ed
