WIP feat: add slurm cluster autodiscovery #11

Open
gregorweiss wants to merge 7 commits into main from worktree-slurm-cluster-autodiscovery

Collaborator

gregorweiss commented May 18, 2026

Adds mdfactory/performance/cluster.py, which queries sinfo and sacctmgr to discover cluster resources automatically. This is the foundation for the planned HPC features (benchmarking, node packing, GPU sharing).

What it does

  • discover_cluster() → returns partitions, node specs (CPUs, memory, GPUs), accounts, and QOS policies as frozen dataclasses
  • select_partition(cluster, needs_gpu=True, min_cpus=64) → picks the best partition for given requirements
  • discover_cluster() returns None on non-SLURM machines
  • Results are cached per session (topology doesn't change mid-run)
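
For readers who want a concrete picture of the behaviour listed above, here is a minimal sketch. It is not the code in cluster.py. The class fields, the helper names `parse_sinfo` and `_count_gpus`, the `sinfo` format string, and the "tightest fit" rule in `select_partition` are all assumptions made for illustration. It covers only `sinfo`; accounts and QOS from `sacctmgr` are left out.

```python
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional

# Assumed sinfo format: partition|CPUs per node|memory (MB)|GRES|node count
SINFO_FORMAT = "%P|%c|%m|%G|%D"


@dataclass(frozen=True)
class Partition:
    # Hypothetical fields; the real dataclasses may differ.
    name: str
    cpus_per_node: int
    memory_mb: int
    gpus_per_node: int
    nodes: int
    is_default: bool = False


@dataclass(frozen=True)
class ClusterInfo:
    partitions: tuple[Partition, ...]


def _count_gpus(gres: str) -> int:
    """Sum GPU counts from a GRES string such as 'gpu:a100:4' or '(null)'."""
    total = 0
    for entry in gres.split(","):
        fields = entry.split("(")[0].split(":")  # drop '(S:0-1)' style suffixes
        if fields[0] == "gpu" and fields[-1].isdigit():
            total += int(fields[-1])
    return total


def parse_sinfo(text: str) -> ClusterInfo:
    """Parse sinfo output in SINFO_FORMAT, one record per line.

    Simplification: a partition with mixed node types may appear on
    several lines; each line becomes its own Partition entry here.
    """
    partitions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, cpus, mem, gres, nodes = line.strip().split("|")
        partitions.append(
            Partition(
                name=name.rstrip("*"),  # sinfo marks the default partition with '*'
                cpus_per_node=int(cpus.rstrip("+")),  # '+' marks a minimum value
                memory_mb=int(mem.rstrip("+")),
                gpus_per_node=_count_gpus(gres),
                nodes=int(nodes),
                is_default=name.endswith("*"),
            )
        )
    return ClusterInfo(partitions=tuple(partitions))


@lru_cache(maxsize=1)  # one query per process: the "cached per session" behaviour
def discover_cluster() -> Optional[ClusterInfo]:
    """Return cluster information, or None when SLURM is unavailable."""
    if shutil.which("sinfo") is None:
        return None
    try:
        result = subprocess.run(
            ["sinfo", "-h", "-o", SINFO_FORMAT],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_sinfo(result.stdout)


def select_partition(
    cluster: ClusterInfo, needs_gpu: bool = False, min_cpus: int = 1
) -> Optional[Partition]:
    """Assumed rule: pick the tightest fit (fewest GPUs, then fewest CPUs)."""
    candidates = [
        p for p in cluster.partitions
        if p.cpus_per_node >= min_cpus and (p.gpus_per_node > 0 or not needs_gpu)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.gpus_per_node, p.cpus_per_node))
```

Keeping the parser separate from the subprocess call means it can be exercised with canned text, for example `parse_sinfo("gpu|64|256000|gpu:a100:4|8")`, without a cluster. Frozen dataclasses are hashable and immutable, which makes the cached result safe to share between callers.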

Files

  • mdfactory/performance/__init__.py — new package
  • mdfactory/performance/cluster.py — autodiscovery implementation
  • mdfactory/tests/test_cluster.py — unit tests with mocked SLURM output

Test plan

  • pytest mdfactory/tests/test_cluster.py -v (mocked, no cluster needed)
  • Verify discover_cluster() returns None on laptop
  • Verify populated ClusterInfo on a real SLURM node
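
The mocked tests in the first item could look roughly like the sketch below. It is not taken from test_cluster.py. `list_partitions` is a hypothetical stand-in for the function under test, and the approach of patching `shutil.which` and `subprocess.run` is an assumption about how the SLURM commands are invoked.

```python
import shutil
import subprocess
from unittest import mock


def list_partitions():
    """Hypothetical stand-in: partition names, or None when SLURM is absent."""
    if shutil.which("sinfo") is None:
        return None
    out = subprocess.run(
        ["sinfo", "-h", "-o", "%P"], capture_output=True, text=True, check=True
    ).stdout
    return [line.strip().rstrip("*") for line in out.splitlines() if line.strip()]


def test_returns_none_without_slurm():
    # Simulate a laptop: sinfo is not on PATH.
    with mock.patch("shutil.which", return_value=None):
        assert list_partitions() is None


def test_parses_mocked_sinfo_output():
    # Simulate a cluster: sinfo exists and prints two partitions.
    fake = subprocess.CompletedProcess(
        args=["sinfo"], returncode=0, stdout="cpu*\ngpu\n", stderr=""
    )
    with mock.patch("shutil.which", return_value="/usr/bin/sinfo"), \
         mock.patch("subprocess.run", return_value=fake) as run:
        assert list_partitions() == ["cpu", "gpu"]
    run.assert_called_once()
```

If the real function caches its result, the cache would need to be cleared between tests so one mocked outcome does not leak into the next.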

gregorweiss self-assigned this May 18, 2026
gregorweiss added the enhancement label May 18, 2026

CLAassistant commented May 18, 2026

CLA assistant check
All committers have signed the CLA.

gregorweiss linked an issue May 18, 2026 that may be closed by this pull request
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Implement SLURM cluster autodiscovery

2 participants