Topology-aware, gang-friendly GPU scheduler for Kubernetes with atomic device leasing, claim-based APIs, and automatic environment injection.
- Smart GPU Allocation: Atomic device locking prevents double-booking
- Topology Awareness: Prefers GPUs with fast interconnects (NVLink)
- Declarative API: Define GPU needs with `GpuClaim` resources
- Automatic Setup: Webhook injects `CUDA_VISIBLE_DEVICES` automatically
- Gang Scheduling: Multi-pod coordination (planned)
# 1. Install
helm install gpu-scheduler charts/gpu-scheduler -n gpu-scheduler
# 2. Create a GPU claim
kubectl apply -f - <<EOF
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: single-gpu
spec:
  devices:
    count: 1
    exclusivity: Exclusive
EOF
# 3. Run a pod
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  namespace: gpu-scheduler
  annotations:
    gpu.scheduling/claim: single-gpu
spec:
  schedulerName: gpu-scheduler
  restartPolicy: Never
  containers:
  - name: cuda-test
    image: nvidia/cuda:12.4.1-runtime-ubuntu22.04
    command: ["bash","-lc","echo CVD=$CUDA_VISIBLE_DEVICES; nvidia-smi -L && nvidia-smi --query-gpu=index,name,memory.total --format=csv"]
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF
# 4. Check results
kubectl get gclaim -n gpu-scheduler
NAME         AGE
single-gpu   26m
kubectl get gns -n gpu-scheduler -o wide
NAME        AGE
aaron-lgt   20m
kubectl logs gpu-test -n gpu-scheduler
CVD=0
GPU 0: NVIDIA GeForce RTX 5080 (UUID: GPU-7fb26e77-c956-a86e-2e85-a51d8abbb7c5)
index, name, memory.total [MiB]
0, NVIDIA GeForce RTX 5080, 16303 MiB

- Kubernetes: Built and validated against 1.33.x (active support until mid‑2026). Smoke tests still run against 1.32/1.31, but older clusters are best-effort only.
- Go toolchain: Requires Go 1.24.x to compile all components. Earlier releases are unsupported because the Kubernetes 1.33 libraries demand Go 1.24 or newer.
- Architecture - How the system works under the hood
- Usage Guide - Installation, examples, and troubleshooting
- Webhook Certificates - TLS certificate setup for admission webhook
- RBAC Configuration - ServiceAccount and permissions setup
- Development Guide - Building, testing, and contributing
- API Reference - Complete API documentation
The system has three components working together:
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  Scheduler  │      │   Webhook   │      │    Agent    │
│  (Plugin)   │      │  (Mutator)  │      │ (DaemonSet) │
└──────┬──────┘      └──────┬──────┘      └──────┬──────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
                            │
                  Kubernetes API Server
- Scheduler: Allocates GPUs using atomic lease-based locking
- Webhook: Injects the `CUDA_VISIBLE_DEVICES` environment variable (see the sketch after this list)
- Agent: Reports GPU inventory and topology from each node
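
For illustration only: assuming the scheduler records the allocated device indices somewhere the webhook can read them, the mutation amounts to a JSON patch that appends the variable to each container. The `buildEnvPatch` helper and `jsonPatchOp` type below are hypothetical names, not the project's actual code.

```go
package gpuwebhook

import (
	"encoding/json"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// jsonPatchOp is a single RFC 6902 operation, the format mutating webhooks
// return in their AdmissionResponse patch.
type jsonPatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// buildEnvPatch produces a patch that sets CUDA_VISIBLE_DEVICES to the
// comma-joined allocated indices in every container of the pod.
func buildEnvPatch(pod *corev1.Pod, allocated []int) ([]byte, error) {
	idx := make([]string, len(allocated))
	for i, d := range allocated {
		idx[i] = fmt.Sprintf("%d", d)
	}
	env := corev1.EnvVar{Name: "CUDA_VISIBLE_DEVICES", Value: strings.Join(idx, ",")}

	var ops []jsonPatchOp
	for i, c := range pod.Spec.Containers {
		if len(c.Env) == 0 {
			// No env list yet: create it with our single variable.
			ops = append(ops, jsonPatchOp{Op: "add", Path: fmt.Sprintf("/spec/containers/%d/env", i), Value: []corev1.EnvVar{env}})
		} else {
			// Append to the existing env list ("-" targets the end of a JSON array).
			ops = append(ops, jsonPatchOp{Op: "add", Path: fmt.Sprintf("/spec/containers/%d/env/-", i), Value: env})
		}
	}
	return json.Marshal(ops)
}
```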
Key Innovation: Uses Kubernetes Coordination Leases for atomic GPU allocation - preventing race conditions and double-booking.
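
A minimal sketch of that idea, not the actual plugin code: try to `Create` one Lease per physical GPU and treat an AlreadyExists error as "another pod holds this device". The lease naming scheme and the `gpu-scheduler` namespace below are assumptions.

```go
package gpuscheduler

import (
	"context"
	"fmt"

	coordv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryLockGPU attempts to create a Lease named after the node and GPU index.
// Create is atomic on the API server: if another scheduling cycle already
// holds the Lease, we get AlreadyExists instead of double-booking the device.
func tryLockGPU(ctx context.Context, cs kubernetes.Interface, node string, gpuIndex int, holder string) (bool, error) {
	leaseName := fmt.Sprintf("gpu-%s-%d", node, gpuIndex) // hypothetical naming scheme
	lease := &coordv1.Lease{
		ObjectMeta: metav1.ObjectMeta{Name: leaseName, Namespace: "gpu-scheduler"},
		Spec:       coordv1.LeaseSpec{HolderIdentity: &holder},
	}
	_, err := cs.CoordinationV1().Leases("gpu-scheduler").Create(ctx, lease, metav1.CreateOptions{})
	switch {
	case apierrors.IsAlreadyExists(err):
		return false, nil // another pod already leased this device
	case err != nil:
		return false, err
	}
	return true, nil
}
```

Because `Create` either succeeds or fails as a whole on the API server, two scheduling cycles racing for the same GPU can never both win.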
See Architecture for detailed explanation.
# Build scheduler
make docker
# Build webhook
make docker-webhook
# Build agent
make docker-agent

See Development Guide for local development setup.
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: single-gpu
spec:
  devices:
    count: 1
    exclusivity: Exclusive
---
apiVersion: gpu.scheduling/v1
kind: GpuClaim
metadata:
  name: training-gpus
spec:
  devices:
    count: 4
    policy: contiguous
    exclusivity: Exclusive
  topology:
    mode: Preferred
    minBandwidthGBps: 600

More examples in Usage Guide.
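
Topology scoring is still on the roadmap (see the status list below), but the intent of `minBandwidthGBps` can be illustrated with a hypothetical scoring function that rates a candidate GPU group by its slowest pairwise link, assuming the agent publishes a per-node bandwidth matrix.

```go
package gpuscheduler

import "math"

// maxScore mirrors the kube-scheduler convention of scoring nodes 0-100.
const maxScore = 100

// scoreGroup rates a candidate set of GPU indices on one node by the slowest
// pairwise link in the group, so a fully NVLinked set beats one that has to
// cross PCIe. Groups below the claim's minBandwidthGBps score 0: with
// topology mode Preferred they stay schedulable, just deprioritized.
func scoreGroup(candidate []int, bandwidthGBps [][]float64, minBandwidthGBps float64) int64 {
	worst := math.Inf(1)
	for i := 0; i < len(candidate); i++ {
		for j := i + 1; j < len(candidate); j++ {
			if bw := bandwidthGBps[candidate[i]][candidate[j]]; bw < worst {
				worst = bw
			}
		}
	}
	if math.IsInf(worst, 1) {
		return maxScore // zero or one GPU: no interconnect to penalize
	}
	if worst < minBandwidthGBps {
		return 0
	}
	// Meeting the minimum scores 50; twice the minimum or better scores 100.
	score := 50 + 50*(worst-minBandwidthGBps)/minBandwidthGBps
	if score > maxScore {
		score = maxScore
	}
	return int64(score)
}
```

Scoring by the weakest link keeps a 4-GPU group from being rated on its best NVLink pair while one member sits across PCIe.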
Implemented:
- ✅ GpuClaim and GpuNodeStatus CRDs
- ✅ Scheduler plugin with lease-based allocation
- ✅ Webhook for CUDA_VISIBLE_DEVICES injection
- ✅ Agent skeleton (reports placeholder data)
- ✅ Helm chart for deployment
TODO:
- ⏳ NVML integration in agent (see the sketch after this list)
- ⏳ Topology scoring in scheduler
- ⏳ Node selector enforcement
- ⏳ Gang scheduling support
- ⏳ Automatic lease cleanup
- ⏳ Fabric-aware scheduling (NVLink, InfiniBand/GPUDirect, cuFile). See docs/todo.md for the evolving plan.
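
For the NVML item above, here is a rough sketch of how the agent could gather real inventory with the NVIDIA go-nvml bindings; the `GPUInfo` shape is hypothetical and not the project's `GpuNodeStatus` schema.

```go
package main

import (
	"fmt"
	"log"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

// GPUInfo is an illustrative per-device record the agent could publish.
type GPUInfo struct {
	Index     int
	Name      string
	UUID      string
	MemoryMiB uint64
}

func main() {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		log.Fatalf("nvml init failed: %v", nvml.ErrorString(ret))
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		log.Fatalf("device count failed: %v", nvml.ErrorString(ret))
	}

	// Walk every device and record name, UUID, and total memory.
	inventory := make([]GPUInfo, 0, count)
	for i := 0; i < count; i++ {
		dev, _ := nvml.DeviceGetHandleByIndex(i)
		name, _ := dev.GetName()
		uuid, _ := dev.GetUUID()
		mem, _ := dev.GetMemoryInfo()
		inventory = append(inventory, GPUInfo{
			Index:     i,
			Name:      name,
			UUID:      uuid,
			MemoryMiB: mem.Total / (1024 * 1024),
		})
	}
	fmt.Printf("discovered %d GPUs: %+v\n", count, inventory)
}
```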
See Development Guide.
This project is licensed under the MIT License - see the LICENSE file for details.
