Kubeflow SDK is a unified Python SDK that streamlines how AI practitioners interact with Kubeflow projects. It provides simple, consistent APIs across the Kubeflow ecosystem, enabling users to focus on building ML applications rather than managing complex infrastructure.
- Unified Experience: Single SDK to interact with multiple Kubeflow projects through consistent Python APIs
- Simplified AI Workflows: Abstract away Kubernetes complexity, allowing AI practitioners to work in familiar Python environments
- Seamless Integration: Designed to work together with all Kubeflow projects for end-to-end ML pipelines
- Local Development: First-class support for local development, requiring only a `pip` installation
Install the SDK:

```shell
pip install -U kubeflow
```

Then define and run a distributed TrainJob:

```python
from kubeflow.trainer import TrainerClient, CustomTrainer

def get_torch_dist():
    import os
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="gloo")
    print("PyTorch Distributed Environment")
    print(f"WORLD_SIZE: {dist.get_world_size()}")
    print(f"RANK: {dist.get_rank()}")
    print(f"LOCAL_RANK: {os.environ['LOCAL_RANK']}")

# Create the TrainJob
job_id = TrainerClient().train(
    runtime=TrainerClient().get_runtime("torch-distributed"),
    trainer=CustomTrainer(
        func=get_torch_dist,
        num_nodes=3,
        resources_per_node={
            "cpu": 2,
        },
    ),
)

# Wait for the TrainJob to complete
TrainerClient().wait_for_job_status(job_id)

# Print the TrainJob logs
print("\n".join(TrainerClient().get_job_logs(name=job_id)))
```

The Kubeflow Trainer client supports local development without needing a Kubernetes cluster.
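Inside the training function, coordination details come from environment variables that the runtime injects into every node (`WORLD_SIZE`, `RANK`, `LOCAL_RANK`). A torch-free sketch of reading them, with single-process defaults so it also runs outside a TrainJob:

```python
import os

def report_dist_env():
    """Read the distributed-training variables the runtime injects per node.

    The defaults stand in for a plain single-process run.
    """
    return {
        "WORLD_SIZE": int(os.environ.get("WORLD_SIZE", "1")),
        "RANK": int(os.environ.get("RANK", "0")),
        "LOCAL_RANK": int(os.environ.get("LOCAL_RANK", "0")),
    }

print(report_dist_env())
```

With `num_nodes=3` as above, each of the three nodes would see the same `WORLD_SIZE` but a distinct `RANK`.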
- KubernetesBackend (default) - Production training on Kubernetes
- ContainerBackend - Local development with Docker/Podman isolation
- LocalProcessBackend - Quick prototyping with Python subprocesses
 
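All three backends sit behind the same client API; which one runs the job is decided purely by the config object passed to `TrainerClient`. A minimal pure-Python sketch of that dispatch pattern (the stub classes below are illustrative stand-ins, not the SDK's real definitions):

```python
from dataclasses import dataclass

# Stub configs standing in for the SDK's backend config classes (hypothetical).
@dataclass
class KubernetesBackendConfig:
    namespace: str = "default"

@dataclass
class ContainerBackendConfig:
    engine: str = "docker"  # or "podman"

class TrainerClientSketch:
    """Picks a backend based solely on the config object it receives."""

    def __init__(self, backend_config=None):
        # No config means the default Kubernetes backend.
        self.backend_config = backend_config or KubernetesBackendConfig()

    def backend_name(self):
        return type(self.backend_config).__name__.replace("BackendConfig", "")

print(TrainerClientSketch().backend_name())                          # Kubernetes
print(TrainerClientSketch(ContainerBackendConfig()).backend_name())  # Container
```

The real SDK follows the same shape: training code stays unchanged while the backend config swaps the execution environment, as the Quick Start below shows.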
Quick Start:

Install container support: `pip install kubeflow[docker]` or `pip install kubeflow[podman]`

```python
from kubeflow.trainer import TrainerClient, ContainerBackendConfig, CustomTrainer

# Switch to local container execution
client = TrainerClient(backend_config=ContainerBackendConfig())

# Your training runs locally in isolated containers
job_id = client.train(trainer=CustomTrainer(func=train_fn))
```

| Project | Status | Version Support | Description | 
|---|---|---|---|
| Kubeflow Trainer | ✅ Available | v2.0.0+ | Train and fine-tune AI models with various frameworks | 
| Kubeflow Katib | 🚧 Planned | TBD | Hyperparameter optimization | 
| Kubeflow Pipelines | 🚧 Planned | TBD | Build, run, and track AI workflows | 
| Kubeflow Model Registry | 🚧 Planned | TBD | Manage model artifacts, versions and ML artifacts metadata | 
| Kubeflow Spark Operator | 🚧 Planned | TBD | Manage Spark applications for data processing and feature engineering | 
- Slack: Join our #kubeflow-ml-experience Slack channel
- Meetings: Attend the Kubeflow SDK and ML Experience bi-weekly meetings
- GitHub: Discussions, issues and contributions at kubeflow/sdk
 
Kubeflow SDK is a community project and is still under active development. We welcome contributions! Please see our CONTRIBUTING Guide for details.
- Design Document: Kubeflow SDK design proposal
- Component Guides: Individual component documentation
- DeepWiki: AI-powered repository documentation
 
We couldn't have done it without these incredible people: